Here’s more evidence that sports is a goldmine for machine learning

If you really like sports and you’re really skilled at data analysis or machine learning, you might want to make that your profession.

On Thursday, private equity firm Vista announced it has acquired a natural-language processing startup called Automated Insights and will make it a subsidiary of STATS, a sports data company that Vista also owns. It’s just the latest example of how much money there is to be made when you combine sports, data and algorithms.

The most popular story about Automated Insights is that its machine-learning algorithms are behind the Associated Press’s remarkably successful automated corporate-earnings stories, but there’s much more to the business than that. The company claims its algorithms have a place in all sorts of areas where users might want to interact with information in natural language — fitness apps, health care, business intelligence and, of course, sports.

In fact, someone from Automated Insights recently told me that fantasy sports is a potential cash cow for the company. Because its algorithms can analyze data and the outcomes of individual matchups, it can deliver everything from in-game trash-talk to post-game summaries. The better the algorithms are at mimicking natural language (i.e., not just regurgitating stats with some static nouns and verbs around them), the more engaging the user experience — and the more money the fantasy sports platform, and Automated Insights as a partner, make. Automated Insights already provides some of this experience for Yahoo Sports.
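The gap between "regurgitating stats" and language that reads naturally can be illustrated with a toy data-to-text sketch, in which the wording depends on what the numbers mean rather than just filling slots. This is not Automated Insights' method; the `Matchup` fields, team names and margin thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Matchup:
    """One fantasy matchup result (hypothetical fields for illustration)."""
    winner: str
    loser: str
    winner_points: float
    loser_points: float


def pick_verb(margin: float) -> str:
    """Pick a verb whose intensity tracks the margin of victory (invented thresholds)."""
    if margin < 3:
        return "edged"
    if margin < 20:
        return "beat"
    return "routed"


def recap(m: Matchup) -> str:
    """Render a one-sentence recap whose wording depends on the data."""
    margin = m.winner_points - m.loser_points
    return (f"{m.winner} {pick_verb(margin)} {m.loser}, "
            f"{m.winner_points:.1f} to {m.loser_points:.1f}.")


print(recap(Matchup("Team Alpha", "Team Beta", 112.4, 110.9)))
# → Team Alpha edged Team Beta, 112.4 to 110.9.
```

A production system would vary far more than one verb (sentence structure, which stats to mention, tone), but the principle is the same: the data drives word choice instead of appearing between fixed nouns and verbs.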


So it’s not surprising that STATS would acquire Automated Insights. STATS provides a lot of data products to broadcasters and folks selling mobile and web applications, ranging from analysis to graphics to its SportVU player-tracking system. At our Structure Data conference next month in New York, STATS Executive Vice President of Pro Analytics Bill Squadron will be on stage along with ESPN’s vice president of data platforms, Krish Dasgupta, to discuss how the two companies are working together to sate an ever-growing sports-fan thirst for data. (We’ll also have experts in machine learning and deep learning from places such as Facebook, Yahoo and Spotify discussing the state of the art in building machines that understand language, images and even music.)

And Automated Insights isn’t even STATS’s first acquisition this week. On Tuesday, the company announced it had acquired The Sports Network, a sports news and data provider. In September, STATS acquired Bloomberg Sports.

More broadly, though, the intersection of sports and data is becoming a big space with the potential to be huge. Every year around this time, people in the United States start going crazy over the NCAA collegiate men’s basketball tournament (aka March Madness) and spend billions of dollars betting on it in office pools and at sports books. And every year for the past several, we have been seeing more and more predictive models and other tools for helping people predict who’ll win and lose each game.


Statistician superstar Nate Silver might be best known for his ability to predict elections, but he has been applying his trade to sports including baseball and the NCAA tournament for years, too. It’s no wonder ESPN bought him and his FiveThirtyEight blog and turned it into a full-on news outlet that includes a heavy emphasis on sports data.

The National Football League might present the biggest opportunity to cash in on sports data. Aside from the ability to predict games and player performance (gambling on the NFL — including fantasy football — is a huge business), we now see individuals making their livings with football-analysis blogs that turn into consulting gigs. There’s a growing movement to tackle the challenge of predicting play calling by applying machine learning algorithms to in-game data.

Even media companies are getting into the act. The New York Times dedicates resources to analyzing every fourth down in every NFL game and telling the world whether the coach should have punted, kicked a field goal or gone for it. In 2013, Yahoo bought a startup called SkyPhrase (although it folded the personnel into Yahoo Labs) that developed a way to deliver statistics in response to natural language queries. The NFL was one of its first test cases.

A breakdown of what happens on fourth down.
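Fourth-down analyses like the one described above are commonly framed as an expected-points comparison: weight each possible outcome of a call by its probability and pick the call with the highest total. The sketch below shows only that arithmetic. It is not the Times' or Brian Burke's actual model, and every probability and expected-points value in it is invented for a hypothetical 4th-and-2 at the opponent's 30-yard line.

```python
def expected_value(outcomes):
    """Sum of probability * expected points over mutually exclusive outcomes."""
    return sum(p * ep for p, ep in outcomes)


def best_fourth_down_call(options):
    """Return the call with the highest expected points, plus all values."""
    values = {call: expected_value(outcomes) for call, outcomes in options.items()}
    return max(values, key=values.get), values


# Invented numbers: (probability, expected points for the offense afterward).
options = {
    "go for it": [(0.60, 2.5), (0.40, -1.5)],   # convert / fail
    "field goal": [(0.50, 2.4), (0.50, -2.0)],  # make / miss
    "punt": [(1.00, -0.5)],
}
call, values = best_fourth_down_call(options)
print(call)  # → go for it
```

With these made-up inputs, going for it is worth 0.6 × 2.5 + 0.4 × (−1.5) = 0.9 expected points versus 0.2 for the field goal and −0.5 for the punt; a real bot estimates those inputs from historical play-by-play data.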

Injuries are also a big deal, and there is no shortage of thought, or financial investment, going into new ways of measuring and analyzing what’s happening with players’ bodies so teams can better diagnose and prevent injuries. Sensors and cameras located near the field or even on players’ uniforms, combined with new data analysis methods, provide a great opportunity for unlocking some real insights into player safety.

All of this probably only skims the surface of what’s being done with sports data today and what companies, teams and researchers are working on for tomorrow. So while analyzing sports data might not save the world, it might make you rich. If you’re into that sort of thing.

A model that can predict the unpredictable New England Patriots

It’s said that familiarity breeds contempt in personal relationships. In the NFL, it might also breed predictability. Although the New England Patriots and their coach Bill Belichick are often called unpredictable, it turns out that machine learning models are actually pretty good at guessing what they’ll do.

Alex Tellez, who works for machine learning startup H2O, built a model he says can predict with about 75 percent accuracy whether the Patriots will run the ball or pass it on any given play. He used 13 years of data — all available on — that includes 194 games and 14,547 plays. He considered a dozen variables for each play, including things such as time, score and opposing team.
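The article doesn't say which algorithm Tellez's H2O model used. As a minimal illustration of the general idea (a binary run-or-pass classifier scored by accuracy), here is a from-scratch logistic regression trained on invented plays. The two features (yards to go and score differential), the rule that generates the labels, and the data itself are all hypothetical; this says nothing about how the Patriots actually call plays.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_toy_plays(n: int, seed: int):
    """Invented plays: features scaled to about [-1, 1]; label 1 = pass, 0 = run."""
    rng = random.Random(seed)
    plays = []
    for _ in range(n):
        yards_to_go = rng.randint(1, 15)
        score_diff = rng.randint(-21, 21)  # positive = offense is leading
        # Made-up rule: long yardage and trailing favor a pass, plus noise.
        signal = 0.4 * (yards_to_go - 8) - 0.1 * score_diff + rng.uniform(-1, 1)
        features = ((yards_to_go - 8) / 7, score_diff / 21)
        plays.append((features, 1 if signal > 0 else 0))
    return plays


def train_logistic(plays, lr=0.5, epochs=300):
    """Full-batch gradient descent on the logistic (log) loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(plays)
    for _ in range(epochs):
        gw = [0.0, 0.0]
        gb = 0.0
        for (x1, x2), y in plays:
            err = sigmoid(w[0] * x1 + w[1] * x2 + b) - y  # dLoss/dz for one play
            gw[0] += err * x1
            gw[1] += err * x2
            gb += err
        w[0] -= lr * gw[0] / n
        w[1] -= lr * gw[1] / n
        b -= lr * gb / n
    return w, b


def accuracy(plays, w, b):
    """Fraction of plays where the predicted class (p >= 0.5 means pass) is right."""
    hits = sum((sigmoid(w[0] * x1 + w[1] * x2 + b) >= 0.5) == (y == 1)
               for (x1, x2), y in plays)
    return hits / len(plays)


train_plays = make_toy_plays(800, seed=0)
test_plays = make_toy_plays(200, seed=1)  # held-out plays
w, b = train_logistic(train_plays)
print(f"held-out accuracy on toy data: {accuracy(test_plays, w, b):.2f}")
```

Evaluating on held-out plays matters: a figure like "75 percent accuracy" is only meaningful if it is measured on plays the model did not see during training.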

Tellez thinks it might be possible to build a model that predicts plays with even more accuracy. He noted, while slyly touting his company’s software, that this one was created with just a few clicks using the H2O platform. Spending more time and tweaking some of the features might improve accuracy, and he suggested that feeding data into a recurrent neural network (which would have some ability to remember results from one play to the next) might help account for the emergence of players like running back LeGarrette Blount, who can skew play-calling in the short term.
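What makes a recurrent network different is a hidden state that is updated on every step and fed back in on the next one, so recent plays can influence the next prediction. The sketch below is only the forward pass of a minimal Elman-style RNN with hand-picked, untrained weights. The single input (whether a play was a run) and the weight values are hypothetical; a real model would learn its weights from data, typically via backpropagation through time.

```python
import math


def rnn_forward(inputs, w_xh, w_hh, b_h, w_hy, b_y):
    """Elman RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h),
    y_t = sigmoid(w_hy . h_t + b_y). Returns y_t for every step."""
    hidden_size = len(b_h)
    h = [0.0] * hidden_size  # memory carried from play to play
    outputs = []
    for x in inputs:
        new_h = []
        for i in range(hidden_size):
            z = b_h[i]
            z += sum(w_xh[i][j] * x[j] for j in range(len(x)))
            z += sum(w_hh[i][k] * h[k] for k in range(hidden_size))
            new_h.append(math.tanh(z))
        h = new_h
        logit = sum(w_hy[i] * h[i] for i in range(hidden_size)) + b_y
        outputs.append(1.0 / (1.0 + math.exp(-logit)))
    return outputs


# Hypothetical, hand-picked weights; input per play is [1.0] for a run, [0.0] for a pass.
W_XH, W_HH, B_H, W_HY, B_Y = [[1.5]], [[0.8]], [-0.75], [2.0], 0.0

run_streak = rnn_forward([[1.0]] * 4, W_XH, W_HH, B_H, W_HY, B_Y)
pass_streak = rnn_forward([[0.0]] * 4, W_XH, W_HH, B_H, W_HY, B_Y)
print([round(p, 2) for p in run_streak])   # rises as the run streak continues
print([round(p, 2) for p in pass_streak])  # falls as the pass streak continues
```

Because `W_HH` feeds the previous hidden state back in, the predicted run probability keeps rising during a run streak; that short-term memory is the property Tellez suggested could capture a suddenly hot running back.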


Tellez’s model works, in part, because of how long Belichick and quarterback Tom Brady have been together — 15 seasons now. That’s a lot of time to amass data about what types of plays the team will call in any given situation, with at least two constant — and important — variables in the coach and the quarterback.

“Realistically, Bill Belichick and Tom Brady, those are like the only dynamic duo,” explained Tellez. “You couldn’t do it with the Raiders,” he added, alluding to that team’s revolving door of coaches and quarterbacks.

Or even the New England Patriots’ Super Bowl competitor, the Seattle Seahawks, who are working with a fifth-year coach and third-year quarterback.

Last year, I wrote about Brian Burke, the founder of Advanced NFL Analytics and the guy whose models power the New York Times 4th Down Bot. “The number of variables, it explodes geometrically,” he said about the challenges of predicting football plays.


Still, even if predicting the likelihood of a run or a pass remains an unsolvable challenge for most of the NFL, the proliferation of data has already changed, and likely will continue to change, the face of football — and sports overall — in some very significant ways. Some obvious ones are the advanced metrics used by Major League Baseball teams to rate players beyond just their batting averages or earned-run averages, the now-trite “moneyball” method of building rosters, and the remarkable success of expert statisticians such as Burke and FiveThirtyEight’s Nate Silver.

At our Structure Data conference in March, data executives from ESPN and real-time player-tracking specialist STATS will discuss how access to so much data is changing the fan experience, as well, and even the on-court decision-making in sports such as professional basketball.

Depending on whether anyone can build accurate-enough models, Tellez actually suggested we could see live sports broadcasts include predictions of the next play similar to how ESPN predicts outcomes in its World Series of Poker broadcasts. While his Patriots model took about 30 seconds to run, live-broadcast models would have the benefit of being able to pre-load data for the specific game situation and only run against that data, he said.
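Tellez didn't detail how pre-loading would work. One plausible reading, sketched here under that assumption, is to do the expensive aggregation before kickoff so that the live step is just a lookup. The situation buckets, the `build_table` and `predict_live` helpers, and the four-play history are all hypothetical; a real system would more likely pre-compute a trained model's outputs than raw pass rates.

```python
from collections import defaultdict


def situation_key(down, yards_to_go, score_diff):
    """Bucket a game situation so similar plays share one table entry."""
    if yards_to_go <= 3:
        distance = "short"
    elif yards_to_go <= 7:
        distance = "medium"
    else:
        distance = "long"
    if score_diff < 0:
        score = "trailing"
    elif score_diff == 0:
        score = "tied"
    else:
        score = "leading"
    return (down, distance, score)


def build_table(history):
    """Offline step: rows are (down, yards_to_go, score_diff, was_pass)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [passes, plays]
    for down, yards_to_go, score_diff, was_pass in history:
        entry = counts[situation_key(down, yards_to_go, score_diff)]
        entry[0] += int(was_pass)
        entry[1] += 1
    return {key: passes / plays for key, (passes, plays) in counts.items()}


def predict_live(table, down, yards_to_go, score_diff, default=0.5):
    """Live step: a dictionary lookup, with no model run during the broadcast."""
    return table.get(situation_key(down, yards_to_go, score_diff), default)


history = [(3, 8, -3, True), (3, 9, -10, True), (3, 10, -1, False), (1, 10, 0, False)]
table = build_table(history)
print(predict_live(table, 3, 12, -7))  # → 0.6666666666666666
```

The trade-off is flexibility for speed: anything that depends on the exact live situation has to be captured by the buckets chosen ahead of time.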

Richard Sherman

He also has another idea for applying advanced data analysis to the NFL — predicting how draft prospects would perform at the NFL combine. That’s where draft prospects go to show off to NFL scouts how big, fast and strong they are. However, not all prospects participate in all the events, which can give teams an incomplete view of their athletic prowess.

Tellez built a special type of neural network, called a self-organizing map, to analyze all combine performance for cornerbacks, specifically, and then fill in the blanks when players opt to skip a particular exercise. Think about it like Google’s Auto-Fill feature, which predicts missing values in spreadsheets. He says he discovered that good 40-yard dash, shuttle run and 3-cone times tend to correlate with high draft picks and future success, so being able to predict those times even if a prospect doesn’t do them could be valuable.
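A self-organizing map can fill in a skipped drill roughly like this: train a grid of nodes on prospects with complete results, find the node that best matches a new prospect on the drills he did run, and read the missing drill off that node. The sketch below is a generic SOM-imputation illustration, not Tellez's model. The grid size, learning schedule and synthetic drill times (generated from an invented hidden "athleticism" factor) are all hypothetical.

```python
import math
import random


def _sq_dist(a, b, dims):
    """Squared Euclidean distance over the given dimensions only."""
    return sum((a[d] - b[d]) ** 2 for d in dims)


def train_som(rows, grid=4, epochs=100, seed=0):
    """Train a grid x grid self-organizing map on complete rows of floats."""
    rng = random.Random(seed)
    all_dims = range(len(rows[0]))
    nodes = [list(rng.choice(rows)) for _ in range(grid * grid)]  # init from data
    coords = [(i // grid, i % grid) for i in range(grid * grid)]
    order = list(rows)
    for t in range(epochs):
        frac = t / epochs
        lr = 0.5 * (1 - frac) + 0.01            # learning rate decays toward 0.01
        radius = (grid / 2) * (1 - frac) + 0.5  # neighborhood shrinks over time
        rng.shuffle(order)
        for x in order:
            bmu = min(range(len(nodes)), key=lambda i: _sq_dist(nodes[i], x, all_dims))
            bx, by = coords[bmu]
            for i, (gx, gy) in enumerate(coords):
                influence = math.exp(-((gx - bx) ** 2 + (gy - by) ** 2) / (2 * radius ** 2))
                for d in all_dims:  # pull node toward x, more strongly near the BMU
                    nodes[i][d] += lr * influence * (x[d] - nodes[i][d])
    return nodes


def impute(nodes, row):
    """Fill None entries from the best-matching node, matched on observed entries only."""
    observed = [d for d, v in enumerate(row) if v is not None]
    bmu = min(nodes, key=lambda n: _sq_dist(n, row, observed))
    return [bmu[d] if v is None else v for d, v in enumerate(row)]


# Synthetic cornerback drill times in seconds: [40-yard dash, shuttle, 3-cone].
rng = random.Random(1)
drills = []
for _ in range(100):
    s = rng.random()  # invented hidden "athleticism" factor
    drills.append([4.35 + 0.25 * s + rng.gauss(0, 0.02),
                   3.95 + 0.30 * s + rng.gauss(0, 0.02),
                   6.70 + 0.40 * s + rng.gauss(0, 0.03)])

som = train_som(drills)
print(impute(som, [4.40, 4.02, None]))  # prospect skipped the 3-cone drill
```

Because the map learns how the drills co-vary, a prospect with a fast 40 gets matched to a node whose 3-cone time is also fast; the estimate is only as good as that learned relationship.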

Of course, Tellez noted, stats don’t always tell us the truth. His model, as well as NFL scouts, predicted Seattle Seahawks cornerback Richard Sherman as a mid-round draft pick. The Seahawks drafted him in the fifth round. He’s now considered one of the league’s most dominant cornerbacks and most recognizable players.

Wilson is promising a smart basketball that knows when you make a shot

Sporting goods company Wilson is working with a Finnish startup called SportIQ to create a basketball that uses sensors and artificial intelligence to determine how far the ball traveled and whether the shot was made. It’s not the first application of sensors and algorithms to sports gear — we already have them in football helmets, soccer balls and basketball nets, for example — but the Wilson basketball is unusual in that it seems to target individual consumers, meaning anyone with the ball, a hoop and a web connection can start quantifying their game.

An MLB team is apparently doing in-game graph analysis

A Major League Baseball team is reportedly the proud owner of a Cray Urika graph-processing appliance that helps the team make in-game decisions by analyzing lots and lots of data. It might be a first, but it’s where sports are headed.