The Stanford number, in one paragraph

In 2016, a team at Stanford (Sherry Ruan, Jacob Wobbrock, Kenny Liou, Andrew Ng and James Landay) ran a controlled experiment comparing speech recognition with the iOS keyboard for short-message text entry, in both English and Mandarin. In English, speech was 3.0x faster than typing, with 20.4% fewer errors. In Mandarin, it was 2.8x faster, with 63.4% fewer errors. (The paper is on arXiv, and Stanford published a write-up.)

That's where the "3x faster" headline comes from. Roughly: average smartphone typing on a QWERTY keyboard runs around 38–50 words per minute, while people speak at around 150 words per minute, a pace decent speech recognition can keep up with. The math checks out.

  • ~40 wpm: average mobile typing
  • ~150 wpm: conversational speech
  • 3.0x: speech vs typing, English (Ruan et al. 2016)
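A quick check of that arithmetic, using only the rounded rates quoted in this post. This is a back-of-the-envelope sketch, not a reanalysis of the study, which measured entry rates directly; the `speedup` helper is just illustrative division.

```python
def speedup(speech_wpm: float, typing_wpm: float) -> float:
    """How many times faster speaking is than typing, by raw words per minute."""
    return speech_wpm / typing_wpm


SPEECH_WPM = 150  # conversational speech rate quoted above

# Ends of the quoted 38-50 wpm range for average smartphone typing.
for typing_wpm in (38, 50):
    ratio = speedup(SPEECH_WPM, typing_wpm)
    print(f"{SPEECH_WPM} / {typing_wpm} wpm = {ratio:.1f}x")
```

Against a 50 wpm typist the ratio is exactly 3.0x; against a 38 wpm typist it is closer to 4x, so the headline number sits at the conservative end of the range.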

The catch nobody mentions

The Stanford study measured something specific: entry rate, meaning how quickly you get from starting a message to having the final, correct text on screen. That includes corrections. So far, a fair comparison.

What it does not include is anything that happens between finishing the text and sending it. For a casual text message, that's usually nothing: people rarely re-read a text before hitting send. The same goes for a Slack DM, a quick comment, or a WhatsApp message.

For that kind of writing, the 3x number is your real-world number. Speech wins outright.

But the moment you're dictating something you're actually going to send to a client, a hiring manager, your boss, or your CRM, you re-read it. You always re-read it. And that re-reading takes time the Stanford study didn't measure, because the study wasn't measuring proofreading-grade output.

What proofreading actually costs

A rough estimate: most people read silently at around 250–300 words per minute. Add a pass for "does this say what I meant?", another quick pass for typos and homophones the model got wrong, and you're at maybe 30–60 seconds of read-back per minute of dictated content.

So the honest math for a one-minute dictation in the real world:

  • Speaking it: 60 seconds.
  • Re-reading and fixing it: 30–60 seconds.
  • Total: 90–120 seconds for ~150 words.

That's 75–100 words per minute end-to-end, not 150. That's still roughly 2x faster than typing at 40 wpm on mobile, or about even with a fast laptop typist at 70–80 wpm.
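The same end-to-end math as a small script, so you can plug in your own numbers. It assumes the post's figures (150 words spoken in 60 seconds, 30–60 seconds of read-back) and, for the laptop comparison, a hypothetical 75 wpm typist in the middle of the 70–80 wpm range; `effective_wpm` is an illustrative helper, not part of any tool.

```python
def effective_wpm(words: int, speak_s: float, review_s: float) -> float:
    """End-to-end words per minute once read-back and fixing time is counted."""
    return words / ((speak_s + review_s) / 60)


WORDS = 150       # roughly one minute of dictation at ~150 wpm
SPEAK_S = 60      # time spent speaking
MOBILE_WPM = 40   # average mobile typing, as quoted above
LAPTOP_WPM = 75   # assumption: a fast laptop typist

# The post's 30-60 second read-back estimate per minute of dictation.
for review_s in (30, 60):
    wpm = effective_wpm(WORDS, SPEAK_S, review_s)
    print(
        f"review {review_s} s -> {wpm:.0f} wpm, "
        f"{wpm / MOBILE_WPM:.1f}x mobile, {wpm / LAPTOP_WPM:.1f}x laptop"
    )
```

This prints 100 wpm (2.5x mobile, 1.3x laptop) for a 30-second review and 75 wpm (1.9x mobile, 1.0x laptop) for a 60-second one, which is where the "roughly 2x on mobile, about even on a laptop" conclusion comes from.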

So: not 3x in practice. Closer to 1.5–2x for anything you're going to be careful about. Still a win, but a different-sized win than the headline.

Where the speed gap is biggest

The wins compound when typing is unusually painful. A few specific cases where dictation pulls ahead of even fast keyboard users:

Long-form messages you don't want to draft twice

A 200-word client email that you've been rehearsing in your head: dictate it once, fix two words, send. Total time: maybe 90 seconds. The same email typed takes about 3 minutes, on top of the rehearsal you've already paid for.
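A rough sanity check of those two times, using the speaking rate from earlier in the post. The 10 seconds of fixing and the 70 wpm typing speed (the low end of the "fast laptop typist" range) are assumptions for illustration, not measurements.

```python
SPEECH_WPM = 150  # speaking rate used earlier in this post
TYPING_WPM = 70   # assumption: fast laptop typist, low end of 70-80 wpm
FIX_S = 10        # assumption: ~10 seconds to spot and fix two words


def seconds_for(words: int, wpm: float) -> float:
    """Seconds needed to produce `words` at a steady `wpm` rate."""
    return words * 60 / wpm


WORDS = 200
dictated_s = seconds_for(WORDS, SPEECH_WPM) + FIX_S
typed_s = seconds_for(WORDS, TYPING_WPM)
print(f"dictated: {dictated_s:.0f} s, typed: {typed_s:.0f} s ({typed_s / 60:.1f} min)")
```

Under these assumptions dictation comes out at 90 seconds and typing at just under 3 minutes, consistent with the figures above.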

Mobile

This is what the Stanford study tested, and it's the case where dictation crushes typing hardest. Tiny touchscreen keyboard, autocorrect making weird guesses, you're standing on a train. Nothing about that is going to outrun your mouth.

Anything in a language where typing is hard

Dutch users typing on a US-layout keyboard. Anyone dealing with characters that need modifier keys (umlauts, accents, em-dashes). Speech sidesteps the keyboard's limitations entirely. You say "café" and the model writes "café", not a "cafe" you have to fix later.

Dictation into AI chatbots

This isn't about speed at all; it's about quality, and we wrote about it in a separate post. Short version: people give ChatGPT much richer prompts when they speak them, and richer prompts produce much better answers. The 3x speed is a side benefit.

Where it doesn't help

Coding. Spreadsheet formulas. Editing existing text. Any case where you're navigating a structure, not producing prose. Speech is for generating new text. The keyboard is still the right tool for everything else.

Also: open-plan offices where you can't talk freely, libraries, cafes during the silent hour. Dictation needs at least normal speaking volume to work well, and you'll be self-conscious about it for the first week regardless.

The honest sales pitch

For dictation tools (including the one we make), the right pitch isn't "3x faster than typing". That overpromises. The right pitch is:

  • Faster on mobile. Always.
  • Faster on desktop for anything longer than two sentences, after you account for re-reading.
  • Less painful, even when the speed advantage is small. Typing for 2 hours straight is exhausting in a way 2 hours of talking isn't.
  • Different output — more natural, more like how you'd explain something to a colleague rather than how you'd write it down.

The first three are about speed and effort. The fourth is probably the most important and the least talked about. We'll come back to it.