A startup wants to quantify video content using computer vision

Computer vision has seen some major advances over the past couple of years, and a New York-based startup called Dextro wants to take the field to a new level by making it easier to quantify what the computers are seeing. Founded in 2012 by a pair of Ivy League graduates, the company is building an object-recognition platform that it says excels on busy images and lets users query their videos using an API a la other unstructured datasets.

The idea behind Dextro, according to co-founder David Luan, is to evolve computer vision services beyond tagging and into something more useful. He characterizes the difference between Dextro and most other computer vision startups (MetaMind, AlchemyAPI and Clarifai, for example) in terms of categorization versus statistics. Tagging photos automatically is great for image search and bringing order to stockpiles of unlabeled pictures, “but we found that most of the value and most of the interest … is when people know what they’re trying to get out of it,” he said.

Dextro has created an API that lets users query their images and, now, videos for specific categories of objects and receive results as JSON records. This way, they can analyze, visualize or otherwise use that data just like they might do with records containing usage metrics for websites or mobile apps. People might want to ask, for example, how many of their images contain certain objects, at what time within a video certain objects tend to appear, or what themes are the most present among their content libraries.

“You have a question about your data,” he said, “let’s help you answer it.”

I used Dextro's video demo to search a YouTube video (about installing a toilet) for toilets, beds and pistols.

I used Dextro’s video demo to search a YouTube video (about installing a toilet) for toilets, beds and pistols.

Aside from the ability to query image and video data, Dextro is trying to differentiate itself by training its vision models to detect objects and themes within chaotic scenes (not nicely focused, single-subject, or what Luan calls “iconic,” shots) and by analyzing videos as they are. “There’s so much information about your video that you lose by chopping it up into frames,” Luan said.

Turns out there really is a bed in it, too.

Turns out there really is a bed in it, too.

He’s quick to note is that although Dextro uses deep learning as part of its secret sauce, it’s not a deep learning company.

In fact, focusing on a narrow set of technologies or use cases is just the opposite of what he and co-founder Sanchit Arora hope the company will become. Luan already tried that in 2011 when he left Yale, accepted a Thiel Fellowship (he completed his bachelor’s degree at Yale in 2013), and took a first stab at the company as a computer vision and manipulation platform for robots. The name Dextro is a play on “dextrous manipulation.”

Although he and Arora both have lots of experience in robotics, Luan said the present incarnation of Dextro –which has raised $1.56 million in seed funding from a group of investors that includes Yale, Two Sigma Ventures and KBS+ Ventures — aims to be a general-purpose platform. Robots could eventually be a great form factor for the type of platform the company is building, but that market isn’t big enough just yet and there’s so much video being generated elsewhere.

David Luan (second from left) speaking at a Yale event.

David Luan (second from left) speaking at a Yale event.

And like most machine learning systems, the more that Dextro’s system sees, the smarter it gets. Luan thinks computer vision platforms will ultimately be a winner-take-all space, with the company analyzing the most and best content having the most-accurate models. “We want to power all the cameras and visual datasets out there,” he said.

That’s a lofty, and perhaps unrealistic, goal, but it’s indicative of the excitement surrounding the fields that companies like Dextro are playing in. One of the themes of our upcoming Structure Data conference is the convergence of artificial intelligence, robotics, analytics, and business that’s happening right now and changing how people think about their data. As computers get better at reading and analyzing data such as pictures, video and text, the onus falls on innovative users to figure out how take advantage of it.