We need computer sight to make smart homes smart

Today, home automation works via remote controls or smartphones, but most people agree that there are intermediate steps before we get to voice- or gesture-controlled homes. Within the decade, every smart home should have an always-on device sending information about who is in a room and where they are in relation to other people and objects in the room.

After visiting the frog offices in Austin, Texas, on Thursday to talk with Mark Rolston, frog’s chief creative officer, ahead of our chat next week at our Roadmap conference on Nov. 6, I was convinced that always-on computer vision will be an essential element in getting us to a different style of computing. Computer vision is just a tool; the actual application will be the ability to interact with computers in a variety of places in your home where and when it makes sense for you.


For example, you might be standing in your kitchen and decide to order takeout. You say, “Let’s order takeout,” and on the kitchen counter a list of local Yelp recommendations will appear. Your spouse might see the list and want to order Chinese, so the list branches out to show only Chinese options. You might want Italian, so you can ask for that. Then you two can confer or invite the kids to take a vote. If you move to the living room to ask the kids, the projection will appear on the TV or the coffee table.

From there you can say, “Let’s order Romeo’s” and see a menu or call up your last order and just get that. In this scenario no one pulled out a smartphone. No one opened a laptop or pulled out a tablet. And the components to make this possible are mostly here today.

Those aren't Halloween masks, it's how the computer sees the frog designers' faces to see if they are looking at it.


Computer sight is a necessary ingredient

That’s what I went to the frog offices to see: how to build such a vision. The frog team calls its setup Room-e, and it’s basically a Microsoft Kinect and a projector with a computer running some amazing software. The Kinect provides both the video camera and an array mic that also contributes to the computer’s “sight.” Using the setup I could point to a light and it would turn on. That’s much nicer than using an app or getting up to flip a switch. (We’ll also have the designer of the Xbox One, Microsoft’s Carl Ledbetter, at Roadmap.)

Computer vision isn’t like human vision; a computer can “see” using disruptions in sound waves, extrapolating via footfalls or even disruptions in wireless networks. So while an always-on camera might discomfit people, it doesn’t necessarily need to be a video feed of what’s happening in the home. Early versions of such implementations, however, probably will be.

Frog’s current version uses the original Kinect camera for the Xbox 360, but the higher resolution of the newer Xbox One Kinect will allow the team to build software that recognizes more subtle gestures and facial expressions, so interacting with your lights won’t require an Emeril-like level of enthusiasm.

Rolston showed me a model in which the cameras and/or projectors are built into a light bulb, one way this might eventually be implemented. Another option is a mic and a camera embedded in a light switch (pictured left).

What about voice?

Of course, vision isn’t the only user interface. Voice will play a crucial role in how we interact with the smart home, from calling for takeout to saying “Goodbye” and turning off the lights. The key will be having both, so you can interact naturally with your environment without feeling like Jean-Luc Picard giving commands to the USS Enterprise.

Voice is further along in terms of the hardware: check out the recently announced thermostat from Honeywell that you can control via voice (Anthony P. Uttley, VP and GM at Honeywell, will be showing it off at Roadmap during our conversation) or this demonstration of real-time translation from Microsoft. However, computer scientists still need to work on helping computers understand not just the words themselves, but the context of those words. That’s where deep learning research comes in.

In the meantime, Rolston says having a human on the other end of these commands might bridge the gap. He compares it to the OnStar service, where both computers and people help with translation and with providing the service.

Projection and putting it all together


The key element of the Room-e vision is that the computing doesn’t happen on computers and that it can follow a user based on their orientation and needs. While the frog guys glossed over the software and technical difficulties of actually building out apps and services that can take advantage of a multi-device experience, they think projectors are the way to escape the tyranny of the screen.

Rolston points out that projecting images allows people to use computing in different ways, like putting a recipe for a meal on the countertop and letting someone page through it there, where his or her messy hands won’t get a screen dirty. An essential element of this strategy is that people can use their existing stuff in their home, as opposed to a specialty table. The image I saw (above) was projected on a white table and frog hopes to develop the technology to work on wood and other surfaces.

To me, the demonstration was real enough that I went home frustrated with my current apps and automation, wishing for a computing experience that was less about the computer or even a specific device. While there will be privacy concerns associated with computer vision, the application of this technology in the demo was such that I felt that same frisson of “magic” that Apple has reliably produced with its products.

Maybe Cupertino should focus on commercializing computer vision and projectors for the masses.
