GraphLab thinks its new software can democratize machine learning

GraphLab, a Seattle-based startup that launched in 2013 to develop an open source project of the same name, will release its first commercial software, called GraphLab Create, next week. Unlike the open source software, which focuses on graph analysis, Create is designed for data stored in graphs or tables, and can be used to easily run any number of popular machine learning tasks.
Carlos Guestrin, the company’s co-founder and CEO, said the goal of Create is to help savvy engineers or data scientists take their machine learning projects from idea to production. It includes a handful of modules for building popular types of workloads, including recommendation engines, graph analysis, clustering and regression.
We have previously covered some of the disillusionment with current machine learning libraries — many of which are open source — with regard to speed and ease of use. Even where those factors are improving, there has still been a dearth of out-of-the-box tools to create even popular machine learning applications such as recommendation engines.
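To make concrete what a recommendation engine involves under the hood, here is a minimal sketch in plain Python of an item co-occurrence recommender. It is not GraphLab Create's API or algorithm; the function names (build_cooccurrence, recommend) and the sample listening data are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(interactions):
    """Count how often each pair of items is consumed by the same user.

    interactions: iterable of (user, item) pairs (hypothetical sample format).
    Returns (items_by_user, counts), where counts[a][b] is the number of
    users who consumed both a and b.
    """
    items_by_user = defaultdict(set)
    for user, item in interactions:
        items_by_user[user].add(item)

    counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return items_by_user, counts


def recommend(user, items_by_user, counts, k=3):
    """Rank unseen items by total co-occurrence with the user's items."""
    seen = items_by_user.get(user, set())
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts.get(item, {}).items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Made-up listening history, purely for illustration.
plays = [
    ("ann", "jazz"), ("ann", "blues"),
    ("bob", "jazz"), ("bob", "blues"), ("bob", "rock"),
    ("cat", "blues"), ("cat", "rock"),
    ("dan", "jazz"),
]
items_by_user, counts = build_cooccurrence(plays)
print(recommend("dan", items_by_user, counts))
```

Even this toy version leaves out scaling, evaluation and deployment, which is the kind of work an out-of-the-box tool is meant to take off the user's hands.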

Carlos Guestrin. Source: Carnegie Mellon University

One of Create’s main benefits is simplicity. Users write, test and deploy their jobs in Python, and the jobs execute in GraphLab’s C++ engine for speed. Jobs can execute on a laptop or across a cluster of servers running Hadoop (with YARN), and built-in management tooling lets users monitor running jobs.
Given the current state of affairs, though, I asked Guestrin whether it’s really possible for a software product to democratize machine learning the way he hopes Create can. “That’s a yet-to-be-answered question, because nobody has yet done it,” he said. So far, he acknowledged, most machine learning research has focused on one-off systems and “my curve is better than your curve” demonstrations. He thinks GraphLab Create can reach 80 to 90 percent of use cases because the focus from the beginning was on usability and robustness.
There are other commercial machine learning products on the market, including Skytree, but Guestrin said the big difference between them and GraphLab is in the barrier to actually using the product. GraphLab handles the full lifecycle from data engineering through production, and generally doesn’t require a specialized, deep engagement before it’s deployed.

An example of an SFrame tabular data structure in Create.

Guestrin said GraphLab already has a handful of big-name users for Create, including Pandora (where it’s running music recommendation algorithms) and Zillow. Some other well-known companies with significant movie and news recommendation algorithms have also expressed interest, he said.
Future versions of Create will include capabilities for even more advanced machine learning workloads, including deep learning and ensemble models, Guestrin said. Guestrin, who’s also a professor at the University of Washington, was one of the researchers who worked on a recent project nicknamed “Learn Everything About Anything,” in which a model taught itself different aspects of broad topics by analyzing data available online via Google Books Ngrams and image databases such as Google Images and Flickr.
Much as iOS did for mobile applications, Guestrin wants GraphLab Create to let anybody with the inclination and a modicum of coding skills start building machine learning applications without worrying about all the hairy details around optimization, deployment and other historically time-consuming processes. “I think we can do it,” he said, “and we’re hoping to be the ones to do it.”