The promise of a truly voice-activated world: How real is it?

While voice activation and speech recognition are slowly taking control of more of the world around us, using them usually requires a physical trigger or button press, which seems to undercut the power of voice control. But that limitation should fade as more sophisticated technology allows devices to recognize more voice commands without a physical prompt.

That’s the promise behind the release of Sensory’s TrulyHandsfree Voice Control 2.0 software Tuesday. The latest speech detection technology builds upon the 1.0 version, which allowed users to trigger a device with a small set of words without a prompt. Version 2.0 recognizes and responds to dozens of keywords and longer phrases, which it can pick out in the course of a regular, noisy conversation. The update improves accuracy by 60 percent and picks out words better in loud environments. The result is a much more natural interface with a device that can listen for a wide array of triggers.

TrulyHandsfree is already at work in car Bluetooth kits from BlueAnt and Kensington and in phones like the Samsung Galaxy S II. And it’s finding its way into sophisticated toys like Mattel’s (s mat) Fijit Friends, which can respond to words spoken to them.

A voice-activated world

But that’s just the start. Sensory’s CEO Todd Mozer said he’s been contacted in the last month by four TV companies that are interested in using TrulyHandsfree as a front end for voice controls. He said any number of devices, from coffee machines and stereos to home automation systems and automobiles, could use the technology. That is already starting to happen slowly, but with more sophisticated and accurate software, we should see it integrated into a lot more products.

“Almost any new product can use this TrulyHandsfree approach, it’s so accurate,” Mozer said.

Removing buttons

Mozer said TrulyHandsfree is transforming Sensory’s speech technology business. Since launching two years ago, the product was quickly integrated into one-third of all Sensory’s voice-enabled products. This year, two-thirds of all products using Sensory’s technology use TrulyHandsfree.

When combined with smarter speech recognition technology used by companies like Nuance (s nuan), Vlingo, Google (s goog) and Microsoft (s msft), it could really help bring speech technology into the mainstream. We’re seeing some of that promise now with devices like the Xbox Kinect, which can respond to some voice commands. But with a front end that can respond to a wide array of triggers and work in a lot of different environments, we could see users start to feel comfortable with speech as an input.

Mainstream catalyst

That’s one of the things that still holds me up at times. I like Android Voice Actions, but even just the act of pressing the microphone icon to start an action is an extra step that sometimes deters me. Google recently said 25 percent of searches on Android devices are conducted by voice. Imagine how much that would go up if people could just talk to their phones without waking them up. A solution has to be accurate though, not firing on the wrong words. Mozer said TrulyHandsfree 2.0 registers a 10 percent false reject rate and a false acceptance rate of one word every three to five hours, and it can register words from 20 feet away. It’s not perfect, but it is getting very good.
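To put those error rates in everyday terms, here is a rough back-of-the-envelope sketch in Python. The rates are the ones Mozer quoted. The 24 hours of continuous listening and the 50 spoken commands per day are my own illustrative assumptions, and the function names are hypothetical, not anything from Sensory's software.

```python
def expected_false_accepts(hours_listening: float, hours_per_false_accept: float) -> float:
    """Expected number of spurious triggers while the device is listening.

    hours_per_false_accept is the average gap between false accepts
    (Mozer quoted one every three to five hours).
    """
    return hours_listening / hours_per_false_accept


def expected_missed_commands(commands_spoken: int, false_reject_rate: float) -> float:
    """Expected number of real commands the device fails to act on."""
    return commands_spoken * false_reject_rate


if __name__ == "__main__":
    HOURS_LISTENING = 24      # assumption: the device listens all day
    COMMANDS_PER_DAY = 50     # assumption: an arbitrary, fairly heavy user
    FALSE_REJECT_RATE = 0.10  # quoted: 10 percent false reject rate

    # Quoted range: one false accept every three to five hours.
    worst_case = expected_false_accepts(HOURS_LISTENING, 3)
    best_case = expected_false_accepts(HOURS_LISTENING, 5)
    missed = expected_missed_commands(COMMANDS_PER_DAY, FALSE_REJECT_RATE)

    print(f"False triggers per day: {best_case:.1f} to {worst_case:.1f}")
    print(f"Missed commands per day: {missed:.1f}")
```

Under those assumptions, an always-listening device would fire by mistake roughly five to eight times a day and miss about five of 50 commands. That fits the "not perfect, but getting very good" picture.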

I think speech technology has a ways to go before we’re talking to computers like HAL or the system on Star Trek’s Enterprise. But it’s coming together. We still need computers to handle the harder task of understanding our speech and our intent, but that’s coming around, as we’re seeing with projects like IBM’s Watson (s ibm). We increasingly have so much data stored in the cloud and on our devices that it makes sense to also tap tools like voice to get at it. Achieving that future will take a lot of progress, including the work being done by Sensory. I, for one, can’t wait to do less button pressing and more voice activating.