Google has open sourced a tool for inferring cause from correlations

On Tuesday, Google announced a new open source tool that can help data analysts decide whether a change to a product or policy caused a measurable shift in some outcome, or whether that shift would have happened anyway. The tool, called CausalImpact, is a package for the R statistical computing software, and Google details it in a blog post.

According to blog post author Kay H. Brodersen, Google created the tool and uses it primarily to quantify the effectiveness of AdWords campaigns. However, he noted, the same method could be used to gauge everything from whether adding a new feature caused an increase in app downloads to questions involving events in medical, social or political science.

Here’s how Brodersen describes CausalImpact in the blog, first at a high level and then in some more detail. The blog post has a deeper explanation of the package, as well as instructions for installing it via GitHub:

“In practice, estimating a causal effect accurately is hard, especially when a randomised experiment is not available. One approach we’ve been developing at Google is based on Bayesian structural time-series models. We use these models to construct a synthetic control — what would have happened to our outcome metric in the absence of the intervention. This approach makes it possible to estimate the causal effect that can be attributed to the intervention, as well as its evolution over time. …
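The two quantities Brodersen mentions, the effect attributable to the intervention and its evolution over time, can be written down compactly. In the notation below (ours, not the blog post's), $y_t$ is the observed outcome, $\hat{y}_t$ is the synthetic control's prediction of what would have happened without the intervention, and $t_0$ is the first time point after the intervention:

```latex
\phi_t = y_t - \hat{y}_t, \qquad t \ge t_0 \quad \text{(pointwise effect)}

\Phi_t = \sum_{s = t_0}^{t} \phi_s = \sum_{s = t_0}^{t} \left( y_s - \hat{y}_s \right) \quad \text{(cumulative effect)}
```

Because a Bayesian model yields a posterior distribution over $\hat{y}_t$ rather than a single number, both $\phi_t$ and $\Phi_t$ come with uncertainty intervals instead of being point values.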

“The CausalImpact R package implements a Bayesian approach to estimating the causal effect of a designed intervention on a time series. Given a response time series (e.g., clicks) and a set of control time series (e.g., clicks in non-affected markets, clicks on other sites, or Google Trends data), the package constructs a Bayesian structural time-series model with a built-in spike-and-slab prior for automatic variable selection. This model is then used to predict the counterfactual, i.e., how the response metric would have evolved after the intervention if the intervention had not occurred.”
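To make the counterfactual idea concrete, here is a deliberately simplified sketch in Python. It is not CausalImpact and it is not Bayesian: it swaps in ordinary least squares, regressing the response on the control series over the pre-intervention period and projecting that fit forward as the "what would have happened" baseline. There is no spike-and-slab variable selection, no trend or seasonal state, and no uncertainty interval. The function name `counterfactual_effect` and the simulated data, including the 10-unit lift, are hypothetical.

```python
import numpy as np


def counterfactual_effect(y, X, n_pre):
    """Simplified synthetic-control estimate (OLS stand-in, not CausalImpact's BSTS model).

    y: response series, shape (T,)
    X: control series, shape (T, k); assumed NOT affected by the intervention
    n_pre: number of pre-intervention time points
    Returns the predicted counterfactual, pointwise effect and cumulative effect
    for the post-intervention period.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    # Intercept column lets the fit absorb a constant offset between y and X.
    design = np.column_stack([np.ones(len(y)), X])
    # Fit y ~ controls on the pre-period only, so the intervention cannot leak in.
    beta, *_ = np.linalg.lstsq(design[:n_pre], y[:n_pre], rcond=None)
    # Counterfactual: what the fitted relationship predicts after the intervention.
    y_hat = design[n_pre:] @ beta
    pointwise = y[n_pre:] - y_hat      # observed minus predicted, per time point
    cumulative = np.cumsum(pointwise)  # running total of the effect over time
    return y_hat, pointwise, cumulative


# Hypothetical example: one control series (e.g. clicks in an unaffected market).
rng = np.random.default_rng(0)
T, n_pre = 100, 70
control = 100 + np.cumsum(rng.normal(0, 1, T))
y = 1.2 * control + rng.normal(0, 1, T)
y[n_pre:] += 10  # simulated lift of 10 units after the intervention

y_hat, pointwise, cumulative = counterfactual_effect(y, control[:, None], n_pre)
print(f"average effect:    {pointwise.mean():.1f}")
print(f"cumulative effect: {cumulative[-1]:.1f}")
```

The sketch relies on the same assumption as the real package: the control series must not themselves be affected by the intervention. If they are, the baseline can absorb part of the effect and distort the estimate.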


Credit: Google

The differences between causation and correlation, and the importance of not conflating the two just because you're now dealing with big data, have been explained ad nauseam. All of those concerns still hold, especially when data is used to solve a problem or to inform policy that could meaningfully harm individuals, but this type of tool is still potentially very useful. Strong causal inference could serve as a jumping-off point for a deeper study of cause and effect, and for applications such as advertising, marketing or site/app design it might be good enough.

At any rate, as companies like Google keep touting the importance of data-driven decision-making, it’s good to see them help out the cause by releasing some tools that will make it easier for folks to do just that.