Speech to Text, the Next Step

Microsoft has developed what it feels is the next generation of computing: the tablet PC. We’ve seen Bill Gates strongly advocate the tablet as the true “next generation” platform for computing. But I disagree, and here’s why.

Handwriting schmandwriting

I still argue that handwriting recognition on a tablet isn’t close enough to writing on paper. I’ve used an HP Pavilion tx series tablet, and what I found was an intuitive handwriting recognition application. I think anything that can translate my handwriting into text is worth appreciating, but it still doesn’t replace the feel and use of paper, mainly because all it does is turn my writing into text. Why not just type, then? I’m faster at it anyway.

I know that with a Wacom tablet (PC/Mac) you can keep your actual handwriting, but again the experience takes some getting used to and is still confined by the limitations of its own application. I know there are people who live by their Wacom tablets as if they were extensions of themselves, for lectures and daily navigation, but I think if we break down the need, we can create a more intuitive system of input.

What we’re looking for

  • Get the same feel and ability as writing with paper
  • Archive previous analog/written material digitally
  • Go paperless

That’s where the fundamental error is. We’re looking to replace paper when sometimes all we need is just paper. The problem with most companies trying to innovate is that they either stick to mainstream ideas to ensure profit or try to reinvent the wheel, with no success.

If you want the feel and ability of paper, you use paper. If you want to archive your previous work, you scan and tag it. If you want to go paperless, you use as many web resources as possible, whether it’s Google Docs, Remember the Milk or others, and you switch from analog systems to digital ones.

Why change a good thing? There is still a lot of inherent value in using analog systems. Whether it’s note taking or task management, as GTD (Getting Things Done) experts may feel, getting things out of your head by writing them down just works. Sometimes there’s beauty in chaos.

What we should be looking for

  • Ability to quickly transcribe information
  • Ability to quickly sort, tag, or peruse vast archives
  • Ability to quickly manipulate and edit material
  • Flexible data portability and sharing

Just as I type faster than I write, I speak faster than I type. And the only thing I can do faster than speak is think. It would seem natural, then, that the next computing platform would be voice and gesture based. I see Apple developing both far more heavily than handwriting recognition.

Let’s run with the lecture example for a bit. Instead of taking notes by physically writing on either paper or a tablet, why not sit back and relax? Have the lecture recorded and simultaneously transcribed to text for you to scan through later. Maybe one day it’ll recognize the speaker and auto-tag the recording for the right lecture, or just tag the keywords it recognizes. Now combine that with a simple, gesture-driven application for quick editing and searching. Add your own thoughts to whatever transcribed piece you’re going through and move on, or simply listen to the whole thing again.

Current development

Technology is getting far more adept at adaptive speech learning. Everything from voicemail to everyday gadgets is getting voice commands.

Did you know that if you enable Speech in OS X you can ask it to tell you a joke? The speech bubble works well for basic tasks: opening an application, saving a document, etc. I imagine it one day being as useful and efficient as Quicksilver currently is. For a list of current commands, check out the Speakable Items folder, accessible from your Speech preference pane.

Other speech to text resources:

  • Jott – Call a number and leave yourself a voice message. Jott will transcribe it and email it to you or a contact of your choice. They are working to add features like adding items from Amazon, etc.
  • GOOG-411 – Free 411, get maps, directions and more.
  • Voice commands in cars – Navigation systems can come with some form of voice command ability now. I know in higher-end cars, you can control most functions by voice alone (e.g. raise the temperature, lower the volume, etc.)
  • Voicemail transcriptions – Similar to Jott, I know some companies are relying on services that record voicemails and transcribe them into text.
  • Doctors’ dictations – Avoid the hassle of handwriting everything and simply use speech to text on a case-by-case basis.

Why this works well for portability

While the idea of voice can get us pretty far, there are still some places where you can’t rely on voice to text. Your local library is probably the best example. If you rely on input methods for notes or written material, and you don’t want to use paper, well, I don’t really have any ideas. I’m assuming that as we become more digital, we’ll be able to manage our books just as we manage our wikis, again moving away from the analog or physical mentality and being able to archive vast amounts of material digitally.

Voice to text works great for text messages in the car, so why not let it become more widespread? Computing for us (Gen Y) is exponentially easier. We type faster, we learn quicker, and we adapt fast. I see Apple implementing a series of voice and gesture capabilities across its product lines, and we can already see the early signs of that in the iPhone and the new MacBook Pro trackpad.

It’s all in the voice

We can learn to compute even quicker with more intuitive systems that revolve around our most basic abilities: the things we learn from birth, the ways our bodies naturally function. We learn to walk, we learn to talk, and then as we become further educated we write. Paper works just fine, so don’t create a solution to a problem we don’t have. Instead, harness what we already know and do every day: speak.