In the business world, the voice is a powerful thing. In meeting rooms, offices and conference calls, it’s how ideas are generated, mandates given and gauntlets thrown down. Yet, somehow, the record of all these discussions doesn’t quite do them justice: messy handwritten (and probably incomplete) notes, typed meeting minutes that don’t distinguish idle chatter from meaningful business or, worse, no record at all. Thanks to advances taking place in computing and machine learning, that’s all about to change.
Take, for example, a startup called Gridspace that wants to make meetings more productive by outsourcing note-taking to a machine. It’s a challenging problem to solve — any solution must provide a seamless experience, as well as be accurate — but the company is trying to do it right. It has built a product that bundles smart hardware and applications with several flavors of speech recognition, voice recognition and natural language processing.
The most noticeable piece of the puzzle is the hardware — a simple, small recording device called the Memo M1 that sits on a desk or table. It’s always on, although its ambient light and motion sensors let it kick in only when someone is actually in the room. It has radio sensors to help determine who’s in the room based on their mobile phone fingerprints, although voice recognition helps make this more accurate, as does planning the meeting in advance with the Memo app and listing the participants.
The Memo service works with conference lines, as well (it can be set up to automatically call participants) and there’s a mobile app available for recording conversations on the road.
After a meeting is done, Memo will email everyone the highlights of the meeting and provide them an opportunity to go through and comment on or flag certain parts. The next day they’ll receive a fuller digest, complete with that post-facto information. At any time, participants can listen to the highlights of the meeting, which presumably are important points or action items, or they can hear the whole thing. They can search for specific parts by word or person.
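Searching a recorded meeting by word or by person suggests a transcript stored as speaker-attributed, time-stamped segments. The sketch below illustrates that idea with a hypothetical `Segment` structure and `search` helper — the names and fields are illustrative assumptions, not Gridspace’s actual data model.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One speaker-attributed chunk of a meeting transcript."""
    speaker: str
    start_sec: float
    text: str

def search(segments, word=None, speaker=None):
    """Return segments matching an optional word and/or speaker."""
    results = []
    for seg in segments:
        if speaker is not None and seg.speaker != speaker:
            continue
        if word is not None and word.lower() not in seg.text.lower():
            continue
        results.append(seg)
    return results

# Toy transcript with made-up speakers and lines:
transcript = [
    Segment("Alice", 0.0, "Let's review the launch plan."),
    Segment("Bob", 12.5, "The budget needs sign-off first."),
    Segment("Alice", 30.0, "I'll flag the budget item as an action."),
]

hits = search(transcript, word="budget")
```

Time-stamping each segment is what would let a participant jump from a search hit straight to the audio of that moment.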
Gridspace CTO Anthony Scodary described the user experience design as being focused on minimizing changes to how we go about our days in the office. Set up to its fullest potential, Memo users don’t have to press a button, configure anything in an app, or even speak a command to take advantage of the service. “It’s really just [about] designing interfaces … that make something that you don’t have to change your natural behaviors much,” he said.
Getting it right means getting NLP right
As seamless as the experience might be, though, it’s Gridspace’s work on natural language processing and speech recognition that could make or break the company. All the automation and search capabilities in the world don’t mean much if a system designed to capture meetings can’t understand what’s happening or what’s being said. And after all, as Scodary acknowledged, “The end goal [of Memo] is to generate what is essentially the highlight reel of a meeting.”
Memo has several methods for deeming what might be important, ranging from certain keywords being spoken (e.g., “This is important.”) to someone manually pressing a button on the M1 device to flag it as important. Even changes in volume or lots of people talking over each other might indicate a key part of the conversation.
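The cues described above — trigger phrases, manual flags, loudness, cross-talk — can be imagined as inputs to a simple scoring function. This is a minimal sketch of that idea; every keyword, threshold and weight here is an illustrative guess, not Gridspace’s actual model.

```python
def importance_score(segment_text, volume_db, overlap_ratio, flagged=False):
    """Combine simple conversational cues into one importance score.

    All cue weights and thresholds are hypothetical placeholders.
    """
    KEYWORDS = ("this is important", "action item", "deadline")
    score = 0.0
    text = segment_text.lower()
    if any(k in text for k in KEYWORDS):
        score += 1.0   # a trigger phrase was spoken
    if volume_db > 70:
        score += 0.5   # unusually loud speech
    if overlap_ratio > 0.3:
        score += 0.5   # lots of people talking over each other
    if flagged:
        score += 2.0   # someone pressed the button on the device
    return score
```

Segments scoring above some cutoff would then make the “highlight reel”; the manual flag dominates because it is the least ambiguous signal.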
However, as with many machine learning systems today, it’s the input of humans that will help train Memo to be as accurate as it can be, Scodary explained. The more that people go through afterward and verify the system was correct, or flag important parts it missed, the smarter it gets. When someone “inputs unambiguously that something is important,” he said, Memo analyzes the context around those sections and readjusts the weights in its algorithms accordingly.
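Readjusting weights from unambiguous user feedback can be pictured as a simple gradient-style update: when a user confirms or corrects a highlight, nudge each cue’s weight in proportion to how strongly that cue fired. This is a deliberately simple sketch of the general technique, not Gridspace’s method, and the cue names are made up.

```python
def update_weights(weights, features, label, lr=0.1):
    """One feedback step: move cue weights toward the user's judgment.

    weights, features: dicts mapping cue name -> value.
    label: 1.0 if the user marked the segment important, 0.0 if not.
    """
    prediction = sum(weights[k] * features[k] for k in weights)
    error = label - prediction
    # Cues that fired on this segment get the biggest adjustment:
    return {k: weights[k] + lr * error * features[k] for k in weights}

w = {"keyword": 0.5, "loudness": 0.2, "overlap": 0.2}
feats = {"keyword": 1.0, "loudness": 0.0, "overlap": 1.0}
# The user flagged this segment as important (label = 1.0),
# so the cues present in it are strengthened:
w = update_weights(w, feats, label=1.0)
```

Over many such corrections, cues that reliably co-occur with user-confirmed highlights accumulate weight, which is the “smarter it gets” behavior described above.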
Out of the boardroom and into the hallway
If Gridspace, which is still in the process of closed pilot projects and taking reservations for its M1 devices and mobile app, can pull this off, it could have promise even beyond the conference room. Scodary envisions a future where people have Memo devices sitting on their desks, ready to capture an impromptu brainstorming session or maybe just a short chat about the all-hands meeting earlier in the day.
“We’re very interested in those three-minute meetings between your other meetings,” Scodary said. (And don’t worry: there’s a mute button if you’re going to complain about the boss, and Scodary said the company is working on features for voice commands to strike previous comments and to delete parts of a meeting that has already happened.)
Frankly, this vision is the kind of thing one can see a company like Microsoft or Google chasing, too, as they strive to own productivity by owning the crossroads of collaboration, communication and devices. This type of technology could find its way into an already sensor-packed smartphone, tablet, desktop or even wearable — Intel recently showed off a new mobile processor designed with voice recognition in mind — and integrate with existing office suites and meeting applications.
Their teams of artificial intelligence researchers — who have already made speech recognition commonplace on smartphones and gaming systems, and who are advancing the state of the art in language understanding — could help make such a system faster, more accurate and even predictive.
At home or in the office, our voices could soon be inputs to our computers just as important as our keystrokes. Once we figure out how to avoid putting our collective foot in our mouth, we’ll probably be thankful for it.