Pinterest, Yahoo, Dropbox and the (kind of) quiet content-as-data revolution

If you’re keeping count, Pinterest’s acquisition of Visual Graph on Monday is at least the seventh of its kind by popular web companies in the last 15 or so months. The “kind” I’m referring to are machine learning startups, specifically those focused on analyzing the content of images and text. Yahoo, Dropbox, Facebook and Google have previously made moves of their own, because the next wave in big data for web properties is the content their users are producing.

A few acquisitions stand out off the top of my head, although I’m sure there are more that have flown under the radar:

  • Google acquired DNNresearch in March 2013. The creation of University of Toronto professor Geoff Hinton and his lab, the company focused on deep learning (the “DNN” stands for deep neural networks) to improve the state of the art in image recognition.
  • Dropbox acquired Anchovi Labs in September 2012. The company was a very young startup at the time, focused on software that let users easily train it on types of objects so it could recognize them as it analyzed new images.
  • Yahoo acquired IQ Engines, LookFlow and SkyPhrase between August and December 2013. IQ Engines and LookFlow were focused on image recognition — primarily, it seems, helping users sort through their smartphone photos and online albums — while SkyPhrase focused on letting people search for data using natural language.
  • Facebook hired deep learning expert Yann LeCun from New York University to head up the company’s new artificial intelligence lab. LeCun and his NYU group (which includes Rob Fergus, who’s also headed to Facebook) specialized in deep learning for computer vision and image recognition.

And then, on Monday, Pinterest announced it had acquired Visual Graph, the purpose of which appeared to be building a network graph of images based on an understanding of the elements they contain. Visual Graph co-founders Kevin Jing and David Liu (both former Google computer vision engineers) wrote on the company’s website about the acquisition, “We are excited for the opportunity to combine machine vision with human vision and curation, and to build a visual discovery experience that is both aesthetically appealing and immensely useful for people everywhere.”

Pinterest declined my request to speak with either Jing or Liu about their work.

An example of Visual Graph’s ability to recognize faces. It caught 20 of 25. Source: Visual Graph

Actually, with the exception of Google, the companies making these acquisitions haven’t been too forthcoming about how they’ll use the software and the brains they now possess in order to improve their services. Even LeCun, who gave an interview to Wired after joining Facebook, didn’t have much to say other than that there’s a lot to learn from the countless images, chats and other text-based interactions that hit Facebook’s servers every day.

However, they don’t have to do much talking because Google (and even Microsoft) is already doing it for them. Whether it’s via unsupervised deep learning or some other machine learning technique, content data, not just behavior data, is the new battleground for learning about users and improving products along the way.

Searching your Google+ photos by subject even though you didn’t tag them? Voice search on Xbox? Better word prediction while text messaging? Better translation apps? You can thank machine learning.

And even though early applications of these techniques have been focused on new products or features, there’s an analytics aspect to all of this, as well. Just think about what a company could learn (or assume) about users by recognizing what’s happening in their photos and videos, or what their updates or posts are really about.

It’s pretty easy to see why companies like Facebook, Dropbox, Yahoo and Pinterest — with their untold petabytes’ worth of photos, blogs, articles, wall posts and instant messages — would want advanced methods for analyzing the content that users generate, even if they won’t telegraph forthcoming products or features by talking too much about it.

It’s worth noting, though, that none of this is easy. Here’s a video from our Structure 2013 conference in which Google Fellow Jeff Dean talks about the capabilities of deep learning and the difficulty of building systems that can handle it.


The focus of our upcoming Structure Data conference, in fact, is very much the same idea (and some of these techniques) driving these acquisitions. New types of data, not just numbers, are becoming commodities that smart companies are using to build new types of products and capabilities, and to gain new types of insights about their customers. We’ll have experts in fields ranging from automotive design to macroeconomics talking about how they’re doing better things by taking advantage of the latent data that surrounds them.

Maybe it all comes down to a new spin on an old maxim: A picture might be worth a thousand words, but pictures aren’t worth anything if you never see most of them because you have billions to sort through. The same goes for those trillions of words, for that matter. Better call in the machines for help.