Google is funding “an artificial intelligence for data science”

Google announced Tuesday that it is funding a project called Automatic Statistician, which bills itself as “an artificial intelligence for data science.” The project, which comes out of the University of Cambridge and is still in its early stages, aims to automate the selection, building and explanation of machine learning models.

In a nutshell, Automatic Statistician works by looking at a dataset and then determining which type of model would be best for analyzing it as well as which features, or variables, are the strongest. After the model runs, Automatic Statistician will return a text report explaining its findings in plain English — or as close as you can get when dealing with statistics.

A snippet of an Automatic Statistician report on unemployment data.

The project’s homepage quotes Google research scientist Kevin Murphy, who also wrote the blog post announcing Google’s funding for it, explaining the promise of Automatic Statistician like this:

“The first problem is that current Machine Learning (ML) methods still require considerable human expertise in devising appropriate features and models. The second problem is that the output of current methods, while accurate, is often hard to understand, which makes it hard to trust. The ‘automatic statistician’ project from Cambridge aims to address both problems, by using Bayesian model selection strategies to automatically choose good models / features, and to interpret the resulting fit in easy-to-understand ways, in terms of human readable, automatically generated reports.”
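To make the idea of automatic model selection plus a generated report a little more concrete, here is a minimal Python sketch. It is not the project's code or its actual method: it assumes a toy set of candidate models (polynomial trends of increasing degree), uses the Bayesian information criterion (BIC) as a rough stand-in for Bayesian model selection, and runs on synthetic data invented for the illustration. The function names `bic_for_polynomial` and `select_and_report` are hypothetical.

```python
import numpy as np


def bic_for_polynomial(x, y, degree):
    """BIC of a least-squares polynomial fit with Gaussian noise (lower is better)."""
    n = len(x)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    # Maximum-likelihood estimate of the noise variance (floored to avoid log(0)).
    sigma2 = max(float(np.mean(residuals ** 2)), 1e-12)
    log_likelihood = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = degree + 2  # polynomial coefficients plus the noise variance
    return k * np.log(n) - 2 * log_likelihood


def select_and_report(x, y, max_degree=4):
    """Score each candidate model, pick the best, and describe it in a sentence."""
    names = {0: "constant", 1: "linear", 2: "quadratic", 3: "cubic", 4: "quartic"}
    scores = {d: bic_for_polynomial(x, y, d) for d in range(max_degree + 1)}
    best = min(scores, key=scores.get)
    label = names.get(best, "degree-" + str(best))
    report = (
        f"Of {len(scores)} candidate models, a {label} trend "
        f"explains the data best (BIC {scores[best]:.1f})."
    )
    return best, scores, report


# Synthetic data (an assumption for illustration): a noisy straight line.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 3.0 + 2.0 * x + rng.normal(scale=1.0, size=x.size)

best_degree, scores, report = select_and_report(x, y)
print(report)
```

The penalty term `k * np.log(n)` is what keeps the procedure from always picking the most flexible model: a more complex candidate only wins if it improves the fit by more than the cost of its extra parameters.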

However, Automatic Statistician isn’t the first attempt to deliver this type of service; there have, in fact, been multiple commercial attempts at doing similar things. The most accurate comparison might be to a now-defunct tool by machine learning startup Skytree called Skytree Adviser, which also automatically selected models and generated text reports of its findings. Startups including BeyondCore, Nutonian and even Ayasdi are all promising varying degrees of this functionality, as well.

As sexy as it is to talk about automating the data scientist job, though, it’s a bit early to suggest any software will eliminate the need for such employees any time soon. Even if projects like Automatic Statistician or commercial tools can make it possible for relative laypersons to run machine learning models and uncover patterns, that’s just a step or two down what’s often a much longer path of turning insights into real value or, possibly, products.