How the Wellspoken Index is calculated

The Wellspoken Index is a 1000-point measurement of spoken communication. Each response a person records moves through a multi-stage pipeline that combines deterministic speech analysis with LLM evaluation. This page is the canonical reference for how the score is computed.

The six dimensions

Every recording is scored across six dimensions. The weights are fixed and reflect what experienced communication coaches prioritize when they evaluate a speaker. Structure carries the most weight because clear organization is the single biggest determinant of whether a listener follows you.

Structure

250 pts · LLM evaluation (transcript)

Sub-metric	Max	What it measures
Logical Sequence	50	Did ideas build in a sensible order
Transitions	50	Are ideas connected to each other
Signposting	50	Markers like 'first,' 'the main point is'
Opening Quality	50	Did the response start with a clear point
Closing Quality	50	Did the response land on a conclusion

Conciseness

200 pts · LLM evaluation (transcript)

Sub-metric	Max	What it measures
Word Choice	100	Using the right word for the moment
Word Economy	100	Saying it in as few words as needed

Confidence

150 pts · Multimodal LLM (audio + text when available)

Sub-metric	Max	What it measures
Hedging Frequency	50	Use of 'I think,' 'sort of,' 'maybe,' 'I guess'
Uptalk	50	Statements ending with rising intonation like questions
Assertiveness	50	Ratio of direct statements to qualified ones

Pronunciation

150 pts · Azure deterministic scoring + LLM feedback

Sub-metric	Max	What it measures
Pronunciation Clarity	150	Overall clarity and correctness, scaled from Azure's word-level accuracy

Filler Rate

150 pts · Deterministic formula + LLM feedback

Sub-metric	Max	What it measures
Filler Frequency	150	Filler words (um, uh, like, you know) per minute, exponentially decayed

Pace

100 pts · Deterministic formula + LLM feedback

Sub-metric	Max	What it measures
Words Per Minute	50	Speaking rate relative to a professional target
Pause Timing	50	Whether pauses break thought or add weight
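
The point allocations in the tables above can be written down as a small data structure. The sketch below is illustrative only: the `RUBRIC` name, the dictionary layout, and the helper functions are assumptions, not the production schema. The point values are copied from the tables. It checks that the sub-metric maxima add up to each dimension's weight and that the six weights total 1000.

```python
# Illustrative layout (assumed, not the production schema).
# Point values are copied from the tables on this page.
RUBRIC = {
    "Structure": {
        "Logical Sequence": 50,
        "Transitions": 50,
        "Signposting": 50,
        "Opening Quality": 50,
        "Closing Quality": 50,
    },
    "Conciseness": {"Word Choice": 100, "Word Economy": 100},
    "Confidence": {"Hedging Frequency": 50, "Uptalk": 50, "Assertiveness": 50},
    "Pronunciation": {"Pronunciation Clarity": 150},
    "Filler Rate": {"Filler Frequency": 150},
    "Pace": {"Words Per Minute": 50, "Pause Timing": 50},
}


def dimension_max(dimension: str) -> int:
    """Maximum points for a dimension: the sum of its sub-metric maxima."""
    return sum(RUBRIC[dimension].values())


def index_max() -> int:
    """Maximum possible Index value across all six dimensions."""
    return sum(dimension_max(name) for name in RUBRIC)


if __name__ == "__main__":
    for name in RUBRIC:
        print(f"{name}: {dimension_max(name)}")
    print("Total:", index_max())
```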

Why this design

Different dimensions need different tools. Pronunciation accuracy is a well-defined acoustic problem with mature speech APIs that can measure it objectively. Structure and conciseness are judgment calls about meaning, which require an LLM with strong reading comprehension. The Wellspoken Index assigns each dimension to the tool that does the job best.

Deterministic where possible. Pronunciation, filler rate, and pace use math on measurable signals. The numbers are repeatable across runs and reviewers.
LLM where it must be. Structure and conciseness need a model that can understand the content. Each LLM call follows a versioned rubric so the scoring stays consistent over time.
Multimodal for delivery. Confidence uses audio when available so the score reflects tone and intonation, not just word choice. Text-only fallback is documented and bounded.
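
To make "math on measurable signals" concrete, here is a minimal sketch of two deterministic scorers. It is not the production formula. The decay constant `DECAY_PER_FILLER`, the function names, and the choice to average word-level accuracies are all assumptions made for illustration; the page only states that filler frequency is "exponentially decayed" and that pronunciation is "scaled from Azure's word-level accuracy". The sketch also assumes accuracies arrive on a 0 to 100 scale.

```python
import math

FILLER_MAX = 150          # from the Filler Rate table
PRONUNCIATION_MAX = 150   # from the Pronunciation table
DECAY_PER_FILLER = 0.15   # HYPOTHETICAL decay constant, not a published value


def filler_score(filler_count: int, duration_seconds: float,
                 decay: float = DECAY_PER_FILLER) -> float:
    """Decay the maximum score exponentially with fillers per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    fillers_per_minute = filler_count / (duration_seconds / 60.0)
    return FILLER_MAX * math.exp(-decay * fillers_per_minute)


def pronunciation_score(word_accuracies: list[float]) -> float:
    """Scale the mean word-level accuracy (assumed 0-100) to 0-150 points."""
    if not word_accuracies:
        return 0.0
    mean_accuracy = sum(word_accuracies) / len(word_accuracies)
    clamped = max(0.0, min(100.0, mean_accuracy))  # guard against bad input
    return PRONUNCIATION_MAX * clamped / 100.0


if __name__ == "__main__":
    # Same inputs always give the same outputs, which is the point of
    # keeping these dimensions deterministic.
    print(filler_score(filler_count=3, duration_seconds=90.0))
    print(pronunciation_score([80.0, 90.0]))
```

Because neither function calls a model, running it twice on the same recording yields identical numbers; only the written feedback attached to the score comes from an LLM.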

Pipeline

Each recording passes through three cloud services, depending on the dimension:

Azure Speech SDK performs word-level pronunciation scoring and returns phoneme accuracy.
Google Vertex AI / Gemini runs multimodal evaluation, with the audio track sent inline for the Confidence dimension.
Google Cloud Storage holds source media when files exceed the inline transport limit.

Seven LLM calls run in parallel where possible. The system normalizes raw outputs into the dimension scores and aggregates them into the final 1000-point Index.
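
The parallel-then-aggregate step might look roughly like the sketch below. It is an assumption-laden illustration, not the production pipeline: `score_dimension` is a hypothetical stand-in that returns a fixed dummy value instead of calling a model, and representing each raw output as a fraction between 0 and 1 is an assumed normalization. Only the dimension weights come from this page.

```python
import asyncio

# Dimension weights from this page.
DIMENSION_MAX = {
    "Structure": 250,
    "Conciseness": 200,
    "Confidence": 150,
    "Pronunciation": 150,
    "Filler Rate": 150,
    "Pace": 100,
}


async def score_dimension(name: str) -> tuple[str, float]:
    """HYPOTHETICAL stand-in for one scoring call.

    Returns the dimension name and a fraction of its maximum in [0, 1].
    """
    await asyncio.sleep(0)  # placeholder for real network latency
    return name, 0.5        # fixed dummy value so the sketch is runnable


def aggregate(fractions: dict[str, float]) -> int:
    """Normalize each fraction to its dimension's points and sum them."""
    total = 0.0
    for name, max_points in DIMENSION_MAX.items():
        fraction = max(0.0, min(1.0, fractions[name]))  # clamp to [0, 1]
        total += fraction * max_points
    return round(total)


async def score_recording() -> int:
    """Run all dimension calls concurrently, then aggregate the results."""
    results = await asyncio.gather(
        *(score_dimension(name) for name in DIMENSION_MAX)
    )
    return aggregate(dict(results))


if __name__ == "__main__":
    print(asyncio.run(score_recording()))
```

`asyncio.gather` returns results in the order the calls were submitted, so the mapping from dimension to score stays stable even when calls finish out of order.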

Reproducibility and fallbacks

Prompts are versioned. Rubrics are stored alongside the code. When a primary model fails or returns an unparseable response, the system falls through a documented fallback chain rather than silently producing a degraded score. Every Wellspoken Index value is traceable to the prompt version, model, and input audio that produced it.
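
A fallback chain with traceability could be sketched as follows. Everything named here is hypothetical: `ScoreRecord`, `score_with_fallback`, the JSON shape `{"score": ...}`, and the decision to raise an error when every model fails are assumptions chosen to illustrate the behavior described above, not the actual implementation.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ScoreRecord:
    """A score plus the provenance needed to trace it (assumed fields)."""
    score: float
    model: str
    prompt_version: str
    audio_sha256: str


class AllModelsFailed(RuntimeError):
    """Raised when no model in the chain returns a parseable score."""


def score_with_fallback(
    models: list[tuple[str, Callable[[], str]]],
    prompt_version: str,
    audio_sha256: str,
) -> ScoreRecord:
    """Try each model in order; record which one produced the score."""
    errors = []
    for model_name, call in models:
        try:
            parsed = json.loads(call())      # unparseable output raises here
            score = float(parsed["score"])   # missing or bad field raises here
        except (ValueError, KeyError, TypeError, RuntimeError) as exc:
            errors.append(f"{model_name}: {exc!r}")
            continue                         # fall through to the next model
        return ScoreRecord(score, model_name, prompt_version, audio_sha256)
    # Fail loudly instead of returning a silently degraded score.
    raise AllModelsFailed("; ".join(errors))


if __name__ == "__main__":
    chain = [
        ("primary", lambda: "not json"),           # simulated bad response
        ("backup", lambda: '{"score": 120}'),      # simulated good response
    ]
    print(score_with_fallback(chain, "v1", "abc123"))
```

Storing the model name next to the score means a value produced by a fallback model is distinguishable from one produced by the primary.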

Citation

If you reference the Wellspoken Index in research or journalism, cite it as "the Wellspoken Index, a 1000-point scoring system for spoken communication developed by Wellspoken Labs Inc." and link to this methodology page.

Press inquiries: press@wellspoken.me.

FAQs

How is the Wellspoken Index calculated?
Each recording is scored across six dimensions on a 1000-point scale. Structure (250 points) and Conciseness (200 points) use LLM evaluation of the transcript. Confidence (150 points) uses a multimodal LLM with audio when available. Pronunciation (150 points) uses Azure's deterministic word-level speech scoring. Filler Rate (150 points) and Pace (100 points) use deterministic formulas with LLM-generated feedback.
Why is Structure weighted the highest?
Clear organization is the single biggest determinant of whether a listener follows you. A well-paced speaker with a disorganized message loses the audience faster than a slightly clumsy speaker with a tight structure. Structure carries 250 of the 1000 points to reflect that effect on the audience.
Are the scores reproducible?
Yes for the deterministic dimensions (pronunciation, filler rate, pace). Scores on the LLM dimensions can vary between runs, but the variation is bounded. Prompts and rubrics are versioned and stored alongside the code, so any Wellspoken Index value can be traced to the prompt version, model, and input audio that produced it.