How the Wellspoken Index is calculated

The Wellspoken Index is a 1000-point measurement of spoken communication. Each response a person records moves through a multi-stage pipeline that combines deterministic speech analysis with LLM evaluation. This page is the canonical reference for how the score is computed.

The six dimensions

Every recording is scored across six dimensions. The weights are fixed and reflect what experienced communication coaches prioritize when they evaluate a speaker. Structure carries the most weight because clear organization is the single biggest determinant of whether a listener follows you.

  • Structure

    250 pts · LLM evaluation (transcript)
    Sub-metric | Max | What it measures
    Logical Sequence | 50 | Did ideas build in a sensible order
    Transitions | 50 | Are ideas connected to each other
    Signposting | 50 | Markers like 'first,' 'the main point is'
    Opening Quality | 50 | Did the response start with a clear point
    Closing Quality | 50 | Did the response land on a conclusion
  • Conciseness

    200 pts · LLM evaluation (transcript)
    Sub-metric | Max | What it measures
    Word Choice | 100 | Using the right word for the moment
    Word Economy | 100 | Saying it in as few words as needed
  • Confidence

    150 pts · Multimodal LLM (audio + text when available)
    Sub-metric | Max | What it measures
    Hedging Frequency | 50 | Use of 'I think,' 'sort of,' 'maybe,' 'I guess'
    Uptalk | 50 | Statements ending with rising intonation like questions
    Assertiveness | 50 | Ratio of direct statements to qualified ones
  • Pronunciation

    150 pts · Azure deterministic scoring + LLM feedback
    Sub-metric | Max | What it measures
    Pronunciation Clarity | 150 | Overall clarity and correctness, scaled from Azure's word-level accuracy
  • Filler Rate

    150 pts · Deterministic formula + LLM feedback
    Sub-metric | Max | What it measures
    Filler Frequency | 150 | Filler words (um, uh, like, you know) per minute, exponentially decayed
  • Pace

    100 pts · Deterministic formula + LLM feedback
    Sub-metric | Max | What it measures
    Words Per Minute | 50 | Speaking rate relative to a professional target
    Pause Timing | 50 | Whether pauses break thought or add weight
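
The point allocation above can be written down as a small lookup table, which makes it easy to check that the sub-metrics add up to each dimension's maximum and that the dimensions add up to 1000. The following is a minimal sketch, not the production code: the names `RUBRIC`, `dimension_max`, and `index_total` are hypothetical, and the choice to clamp each sub-score to its maximum and to count a missing sub-score as zero is an assumption made for illustration.

```python
# Sub-metric maxima copied from the six tables above.
RUBRIC = {
    "Structure": {
        "Logical Sequence": 50,
        "Transitions": 50,
        "Signposting": 50,
        "Opening Quality": 50,
        "Closing Quality": 50,
    },
    "Conciseness": {"Word Choice": 100, "Word Economy": 100},
    "Confidence": {"Hedging Frequency": 50, "Uptalk": 50, "Assertiveness": 50},
    "Pronunciation": {"Pronunciation Clarity": 150},
    "Filler Rate": {"Filler Frequency": 150},
    "Pace": {"Words Per Minute": 50, "Pause Timing": 50},
}


def dimension_max(dimension):
    """Return the maximum points available for one dimension."""
    return sum(RUBRIC[dimension].values())


def index_total(sub_scores):
    """Sum sub-metric scores into a single Index value.

    Assumption (not from the methodology): each sub-score is clamped
    to [0, max] and a missing sub-score counts as 0.
    """
    total = 0.0
    for dimension, sub_metrics in RUBRIC.items():
        for name, cap in sub_metrics.items():
            raw = sub_scores.get(dimension, {}).get(name, 0)
            total += min(max(raw, 0), cap)
    return total
```

A perfect score on every sub-metric yields 1000, which matches the scale described at the top of this page.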

Why this design

Different dimensions need different tools. Pronunciation accuracy is a well-defined acoustic problem with mature speech APIs that can measure it objectively. Structure and conciseness are judgment calls about meaning, which require an LLM with strong reading comprehension. The Wellspoken Index assigns each dimension to the tool that does the job best.

  • Deterministic where possible. Pronunciation, filler rate, and pace use math on measurable signals. The numbers are repeatable across runs and reviewers.
  • LLM where it must be. Structure and conciseness need a model that can understand the content. Each LLM call follows a versioned rubric so the scoring stays consistent over time.
  • Multimodal for delivery. Confidence uses audio when available so the score reflects tone and intonation, not just word choice. Text-only fallback is documented and bounded.
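
To illustrate what "math on measurable signals" can look like, here is a sketch of an exponentially decayed filler score. The 150-point maximum and the per-minute basis come from the Filler Rate table; the decay constant and the function name `filler_score` are hypothetical, since this page does not publish the actual formula or its parameters.

```python
import math

FILLER_MAX = 150  # maximum points, from the Filler Rate table

# Hypothetical decay constant: the real value is not published here.
ASSUMED_DECAY = 0.15


def filler_score(filler_count, duration_seconds, decay=ASSUMED_DECAY):
    """Score filler usage with exponential decay on fillers per minute.

    Zero fillers earns the full 150 points; each additional filler
    per minute multiplies the score by exp(-decay).
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    fillers_per_minute = filler_count / (duration_seconds / 60)
    return FILLER_MAX * math.exp(-decay * fillers_per_minute)
```

Because the function depends only on a count and a duration, the same recording produces the same score on every run, which is the repeatability property described above.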

Pipeline

Each recording passes through three cloud services, depending on the dimension:

  1. Azure Speech SDK performs word-level pronunciation scoring and returns phoneme accuracy.
  2. Google Vertex AI / Gemini runs multimodal evaluation, with the audio track sent inline for the Confidence dimension.
  3. Google Cloud Storage holds source media when files exceed the inline transport limit.

Seven LLM calls run in parallel where possible. The system normalizes raw outputs into the dimension scores and aggregates them into the final 1000-point Index.
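
Running independent evaluations in parallel can be sketched with Python's `asyncio`. This is an illustration only: `evaluate_dimension` is a hypothetical stand-in that returns a placeholder score instead of calling a model, and the list of calls is supplied by the caller because this page does not enumerate the seven calls.

```python
import asyncio


async def evaluate_dimension(name, transcript):
    """Hypothetical stand-in for one LLM call.

    A real implementation would send the transcript (and audio, where
    relevant) to a model; this stub returns a placeholder score of 0.0.
    """
    await asyncio.sleep(0)  # yield control, as a network call would
    return name, 0.0


async def score_recording(transcript, call_names):
    """Run all evaluations concurrently and collect them by name."""
    results = await asyncio.gather(
        *(evaluate_dimension(name, transcript) for name in call_names)
    )
    return dict(results)
```

Calling `asyncio.run(score_recording(transcript, ["Structure", "Conciseness"]))` returns a dictionary keyed by call name, ready to be normalized and summed.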

Reproducibility and fallbacks

Prompts are versioned. Rubrics are stored alongside the code. When a primary model fails or returns an unparseable response, the system falls through a documented fallback chain rather than silently producing a degraded score. Every Wellspoken Index value is traceable to the prompt version, model, and input audio that produced it.
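
A fallback chain of this kind can be sketched as a loop over models that stops at the first parseable response and records which model and prompt version produced it. Everything here is an assumption for illustration: the names `ScoredResult`, `AllModelsFailed`, and `score_with_fallback`, the JSON response shape with a `score` field, and the decision to raise when every model fails.

```python
import json
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class ScoredResult:
    score: float
    model: str           # which model in the chain produced the score
    prompt_version: str  # which prompt version was used


class AllModelsFailed(RuntimeError):
    """Raised when no model in the chain returns a usable response."""


def score_with_fallback(
    models: Sequence[Tuple[str, Callable[[str], str]]],
    prompt: str,
    prompt_version: str,
) -> ScoredResult:
    """Try each (name, call) pair in order until one parses cleanly."""
    errors = []
    for name, call in models:
        try:
            payload = json.loads(call(prompt))
            return ScoredResult(float(payload["score"]), name, prompt_version)
        except (KeyError, TypeError, ValueError, RuntimeError) as exc:
            # json.JSONDecodeError is a ValueError, so unparseable
            # responses land here and the next model is tried.
            errors.append(f"{name}: {exc!r}")
    raise AllModelsFailed("; ".join(errors))
```

Raising an explicit error when the chain is exhausted, rather than returning a default number, is one way to avoid silently producing a degraded score.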

Citation

If you reference the Wellspoken Index in research or journalism, cite it as "the Wellspoken Index, a 1000-point scoring system for spoken communication developed by Wellspoken Labs Inc." and link to this methodology page.

Press inquiries: press@wellspoken.me.

FAQs

  • How is the Wellspoken Index calculated?

    Each recording is scored across six dimensions on a 1000-point scale. Structure (250 points) and Conciseness (200 points) use LLM evaluation of the transcript. Confidence (150 points) uses a multimodal LLM with audio when available. Pronunciation (150 points) uses Azure's deterministic word-level speech scoring. Filler Rate (150 points) and Pace (100 points) use deterministic formulas with LLM-generated feedback.

  • Why is Structure weighted the highest?

    Clear organization is the single biggest determinant of whether a listener follows you. A well-paced speaker with a disorganized message loses the audience faster than a slightly clumsy speaker with a tight structure. Structure earns 250 of the 1000 points to reflect that effect on the audience.

  • Are the scores reproducible?

    Yes for the deterministic dimensions (pronunciation, filler rate, pace), and within bounded variation for the LLM dimensions. Prompts and rubrics are versioned and stored alongside the code, so any Wellspoken Index value can be traced to the prompt version, model, and input audio that produced it.
