Here’s more evidence that sports is a goldmine for machine learning

If you really like sports and you’re really skilled at data analysis or machine learning, you might want to make that your profession.

On Thursday, private equity firm Vista announced it has acquired a natural-language processing startup called Automated Insights and will make it a subsidiary of STATS, a sports data company that Vista also owns. It’s just the latest example of how much money there is to be made when you combine sports, data and algorithms.

The best-known story about Automated Insights is that its machine-learning algorithms are behind the Associated Press’s remarkably successful automated corporate-earnings stories, but there’s much more to the business than that. The company claims its algorithms have a place in all sorts of areas where users might want to interact with information in natural language — fitness apps, health care, business intelligence and, of course, sports.

In fact, someone from Automated Insights recently told me that fantasy sports is a potential cash cow for the company. Because its algorithms can analyze data and the outcomes of individual matchups, it can deliver everything from in-game trash-talk to post-game summaries. The better the algorithms are at mimicking natural language (i.e., not just regurgitating stats with some static nouns and verbs around them), the more engaging the user experience — and the more money the fantasy sports platform, and Automated Insights as a partner, make. Automated Insights already provides some of this experience for Yahoo Sports.
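To make the "not just regurgitating stats" point concrete, here is a minimal sketch of template-based natural-language generation from a box score. The team names, stats and phrasings are invented for illustration; real systems like Automated Insights' choose among far richer templates based on what the data's "story" is.

```python
# Toy natural-language generation: turn a fantasy matchup's box score
# into a short recap. Varying the verb with the margin of victory is a
# crude stand-in for the kind of variation that keeps output from
# reading like raw stats with static nouns and verbs around them.
def recap(home, away, home_pts, away_pts, star, star_pts):
    winner, loser = (home, away) if home_pts > away_pts else (away, home)
    margin = abs(home_pts - away_pts)
    verb = "crushed" if margin > 30 else "edged" if margin < 5 else "beat"
    return (f"{winner} {verb} {loser} {max(home_pts, away_pts)}-"
            f"{min(home_pts, away_pts)}, behind {star_pts} points from {star}.")

print(recap("Team Alpha", "Team Beta", 112, 80, "J. Smith", 41))
```

A blowout yields "crushed" while a two-point game yields "edged," which is the whole trick: the data drives the word choice.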


So it’s not surprising that STATS would acquire Automated Insights. STATS provides a lot of data products to broadcasters and to folks selling mobile and web applications, ranging from analysis to graphics to its SportVU player-tracking system. At our Structure Data conference next month in New York, STATS Executive Vice President of Pro Analytics Bill Squadron will be on stage along with ESPN’s vice president of data platforms, Krish Dasgupta, to discuss how the two companies are working together to sate an ever-growing sports-fan thirst for data. (We’ll also have experts in machine learning and deep learning from places such as Facebook, Yahoo and Spotify discussing the state of the art in building machines that understand language, images and even music.)

And Automated Insights isn’t even STATS’s first acquisition this week. On Tuesday, the company announced it had acquired The Sports Network, a sports news and data provider. In September, STATS acquired Bloomberg Sports.

More broadly, though, the intersection of sports and data is becoming a big space with the potential to be huge. Every year around this time, people in the United States start going crazy over the NCAA collegiate men’s basketball tournament (aka March Madness) and spend billions of dollars betting on it in office pools and at sports books. And every year for the past several years, we have seen more and more predictive models and other tools for helping people pick who’ll win and lose each game.


Statistician superstar Nate Silver might be best known for his ability to predict elections, but he has been plying his trade in sports, including baseball and the NCAA tournament, for years, too. It’s no wonder ESPN bought his FiveThirtyEight blog and turned it into a full-on news outlet with a heavy emphasis on sports data.

The National Football League might present the biggest opportunity to cash in on sports data. Aside from the ability to predict games and player performance (gambling on the NFL — including fantasy football — is a huge business), we now see individuals making their livings with football-analysis blogs that turn into consulting gigs. There’s a growing movement to tackle the challenge of predicting play calling by applying machine learning algorithms to in-game data.
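As a rough illustration of the play-call-prediction idea, here is a minimal frequency-based predictor: bucket each historical play by down and distance, then predict whatever call was most common in that situation. The training plays below are invented; real efforts train on full play-by-play logs with many more features.

```python
from collections import Counter, defaultdict

# Hypothetical play-by-play history: (down, yards to go, call made).
history = [
    (1, 10, "run"), (1, 10, "pass"), (1, 10, "run"),
    (3, 8, "pass"), (3, 9, "pass"), (3, 7, "pass"),
    (2, 3, "run"), (2, 2, "run"),
]

# Tally calls per (down, distance-bucket) situation.
counts = defaultdict(Counter)
for down, to_go, call in history:
    bucket = (down, "long" if to_go >= 7 else "short")
    counts[bucket][call] += 1

def predict(down, to_go):
    bucket = (down, "long" if to_go >= 7 else "short")
    dist = counts.get(bucket)
    # Fall back to "pass" for situations we have never seen.
    return dist.most_common(1)[0][0] if dist else "pass"

print(predict(3, 8))  # third and long
print(predict(2, 2))  # second and short
```

Real machine-learning approaches replace the frequency table with a trained classifier, but the shape of the problem is the same: situation in, predicted call out.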

Even media companies are getting into the act. The New York Times dedicates resources to analyzing every fourth down in every NFL game and telling the world whether the coach should have punted, kicked a field goal or gone for it. In 2013, Yahoo bought a startup called SkyPhrase (although it folded the personnel into Yahoo Labs) that developed a way to deliver statistics in response to natural language queries. The NFL was one of its first test cases.

A breakdown of what happens on fourth down.

Injuries are also a big deal, and there is no shortage of thought, or financial investment, into new ways of analyzing and measuring what’s happening with players’ bodies so teams can better diagnose and prevent injuries. Sensors and cameras located near the field or even on players’ uniforms, combined with new data analysis methods, provide a great opportunity for unlocking some real insights into player safety.

All of this probably only skims the surface of what’s being done with sports data today and what companies, teams and researchers are working on for tomorrow. So while analyzing sports data might not save the world, it might make you rich. If you’re into that sort of thing.

InboxVudu uses NLP to help you focus on the emails that matter

A text-analysis startup called Parakweet (whose initial product focused on book recommendations) has launched a new application, called InboxVudu, that’s designed to help users reduce the stress of email by showing them just the messages that need their attention. And while it turns out that no amount of curation can really help ease the email burden of a technology journalist today, the app might work very well for other folks.

InboxVudu works by analyzing the text of a message and figuring out if the sender is asking something of the recipient — “I need an answer to that big question,” or “Please RSVP by Feb. 13” or something along those lines. At the end of the day, users receive an email from InboxVudu showing them the messages that need their attention. From that email, as well as an associated web application, users can reply to emails, mark them as “resolved,” flag false positives and even mute the sender.
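The request-detection idea can be sketched with a few surface patterns. To be clear, the patterns below are my own guesses at the kinds of signals involved, not Parakweet's actual model, which is machine-learned and reportedly about 90 percent accurate.

```python
import re

# Crude stand-ins for "the sender is asking something of the recipient."
REQUEST_PATTERNS = [
    r"\bplease\b",
    r"\brsvp\b",
    r"\bcan you\b",
    r"\bi need\b",
]

def needs_attention(message):
    text = message.lower()
    # A question mark is the bluntest possible signal of a request.
    if "?" in text:
        return True
    return any(re.search(p, text) for p in REQUEST_PATTERNS)

print(needs_attention("Please RSVP by Feb. 13"))    # an explicit ask
print(needs_attention("FYI, the report shipped."))  # purely informational
```

A learned model improves on this by scoring ambiguous phrasings and by adapting as users flag false positives, which is exactly the feedback loop described below.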

In Gmail, at least, the messages also find their way to an InboxVudu-labeled folder, where users can peruse them at their leisure.

A sample screenshot of an InboxVudu “digest.”

Parakweet co-founder and CEO Ramesh Haridas explained in an interview that the app works with about 90 percent accuracy today, based on internal testing, and that he hopes it will get even smarter as the company adds more signals into its models. For starters, there’s all the interaction data that users will generate by replying to messages and flagging false positives, which Parakweet can use to train the system both individually and at an aggregate level. Haridas also suggested the application might someday prioritize emails from people to whom users respond very quickly, or consider a sender’s job title or other measures of “global importance.”

I suggested InboxVudu could be really valuable as a way of helping users understand their “email graphs,” if you will, and Haridas agreed. He said the company is considering offering users statistics about their activity and which of their email contacts are the most important. However, he made sure to add with regard to privacy, “It’s all being processed by machines, so it’s never seen by a human being.”

Those feature and algorithmic improvements are still a ways out, but I’ve been using the first iteration of InboxVudu for about a week. And although it works as advertised — I’ve noticed few if any false positives — seeing 20 nicely bundled PR pitches doesn’t make it any easier to read or reply to them all. I’d consider muting the senders, but the nature of PR is that sometimes pitches are compelling and sometimes they’re not, even from the same person.


One really nice thing about InboxVudu, though, is the “follow-up” messages it displays — those where I’ve asked something of someone else and am still awaiting a response. I’m prone to being disorganized, so any reminders of the various stories or projects I’m working on, especially ones involving other people, are helpful.

If there’s one thing I’m confident about, though, it’s that programs like InboxVudu will continue to get better as the field of artificial intelligence continues to improve. As the speakers at our upcoming Structure Data (March 18-19 in New York) and Structure Intelligence (Sept. 22-23 in San Francisco) conferences demonstrate, advances in AI are happening fast, especially in fields such as language understanding and personal assistant technologies.

If Parakweet’s Kiam Choo, who studied under deep learning guru Geoff Hinton at the University of Toronto, can figure out a way to make email substantially less burdensome even for people like me with 29,000 unread messages, more power to him.

Now IBM is teaching Watson Japanese

IBM has struck a deal with SoftBank Telecom Corporation to bring the IBM Watson artificial intelligence (or, as IBM calls it, cognitive computing) system to Japan. The deal was announced on Tuesday.

Watson has already been trained in Japanese, so now it’s a matter of getting its capabilities into production via specialized systems, apps or even robots running Watson APIs. As in the United States, early focus areas include education, banking, health care, insurance and retail.

[company]IBM[/company] has had a somewhat difficult time selling Watson, so maybe the Japanese market will help the company figure out why. It could be that the technology doesn’t work as well or as easily as advertised, or it could just be that American companies, developers and consumers aren’t ready to embrace so many natural-language-powered applications.

The deal with SoftBank isn’t the first time IBM has worked to teach a computer Japanese. The company is also part of a project with several Japanese companies and agencies, called the Todai Robot, to build a system that runs on a laptop and can pass the University of Tokyo entrance exam.

We’ll be talking a lot about artificial intelligence and machine that can learn at our Structure Data conference in March, with speakers from Facebook, Spotify, Yahoo and other companies. In September, we’re hosting Gigaom’s inaugural Structure Intelligence conference, which will be all about AI.

Scientists say tweets predict heart disease and community health

University of Pennsylvania researchers have found that the words people use on Twitter can help predict the rate of heart disease deaths in the counties where they live. Places where people tweet happier language about happier topics show lower rates of heart disease death when compared with Centers for Disease Control statistics, while places with angry language about negative topics show higher rates.

The findings of this study, which was published in the journal Psychological Science, cut across fields such as medicine, psychology, public health and possibly even civil planning. It’s yet another affirmation that Twitter, despite any inherent demographic biases, is a good source of relatively unfiltered data about people’s thoughts and feelings, well beyond the scale and depth of traditional polls or surveys. In this case, the researchers used approximately 148 million geo-tagged tweets from 2009 and 2010 from more than 1,300 counties that contain 88 percent of the U.S. population.

(How to take full advantage of this glut of data, especially for business and governments, is something we’ll cover at our Structure Data conference with Twitter’s Seth McGuire and Dataminr’s Ted Bailey.)


What’s more, at the county level, the Penn study’s findings about language sentiment turn out to be more predictive of heart disease than any other individual factor — including income, smoking and hypertension. A predictive model combining language with those other factors was the most accurate of all.
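The core statistical move here is a simple one: compute the correlation between a per-county language score and the mortality rate, then compare it against correlations for other factors. The sketch below uses fabricated numbers for five imaginary counties; the real study used roughly 148 million tweets across more than 1,300 counties against CDC mortality data.

```python
# Toy version of the study's approach: correlate a per-county "anger
# score" (share of negative-language tweets) with heart-disease deaths.
anger_score = [0.10, 0.15, 0.22, 0.30, 0.35]     # one value per county
mortality = [180.0, 195.0, 210.0, 240.0, 250.0]  # deaths per 100,000

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson(anger_score, mortality)
print(round(r, 3))  # strongly positive in this fabricated data
```

The study's stronger claim, that a model combining language with income, smoking and hypertension predicts best of all, is the multivariate version of this same comparison.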

That’s a result similar to recent research comparing Google Flu Trends with CDC data, although it’s worth noting that Flu Trends is an ongoing project that has already been collecting data for years, and that the search queries it collects are much more directly related to influenza than the Penn study’s tweets are to heart disease.

That’s likely why the Penn researchers suspect their findings will be more relevant to community-scale policies or interventions than anything at an individual level, despite previous research that shows a link between emotional well-being and heart disease in individuals. Penn professor Lyle Ungar is quoted in a press release announcing the study’s publication:

“We believe that we are picking up more long-term characteristics of communities. The language may represent the ‘drying out of the wood’ rather than the ‘spark’ that immediately leads to mortality. We can’t predict the number of heart attacks a county will have in a given timeframe, but the language may reveal places to intervene.”

The researchers’ work is part of the university’s Well-Being Project, which has also used Facebook users’ language to build personality profiles.

New from Watson: Financial advice and a hardcover cookbook

IBM has recruited a couple of new partners in its quest to mainstream its Watson cognitive computing system: financial investment specialist Vantage Software and the Institute of Culinary Education, or ICE. While the former is exactly the kind of use case one might expect from Watson, the latter seems like a pretty savvy marketing move.

What Vantage is doing with Watson, through a new software program called Coalesce, is about the same thing [company]IBM[/company] has been touting for years around the health care and legal professions. Only, replace health care and legal with financial services, and doctors and lawyers with financial advisers and investment managers. Coalesce will rely on Watson to analyze large amounts of literature and market data, which will complement experts’ own research and possibly provide them with information or trends they otherwise might have missed.

The partnership with the culinary institute, though — on a hardcover cookbook — is much more interesting. It’s actually a tangible manifestation of work that IBM and ICE have been doing together for a few years. At last year’s South By Southwest event, in fact, Gigaom’s Stacey Higginbotham ate a meal from an IBM food truck with ingredients suggested by Watson and prepared by ICE chefs.

The IBM food truck.

But even if the cookbook doesn’t sell (although I will buy one when it’s released in April and promise to review at least a few recipes), it’s a good way to try to convince the world that Watson has promise beyond just fighting cancer. IBM is banking on cognitive computing (aka artificial intelligence) to become a multi-billion-dollar business, so it’s going to need more than a handful of high-profile users. It has already started down this path with its Watson cloud ecosystem and APIs, where partners have built applications for things including retail recommendations, travel and cybersecurity.

Watson isn’t IBM’s only investment in artificial intelligence, either. Our Structure Data conference in March will feature Dharmendra Modha, the IBM researcher who led development of the company’s SyNAPSE chip that’s modeled on the brain and designed to learn like a neural network while consuming just a fraction of the power normal microchips do.

However, although we’re on the cusp of an era of smart applications and smart devices, we’re also in an era of on-demand cloud computing and a user base that cut its teeth on Google’s product design. The competition over the next few years — and there will be lots of it — won’t just be about who has the most accurate text-analysis or computer-vision models, or who executes the best publicity stunts.

All the cookbooks and research projects in the world will amount to a lot of wasted time if IBM can’t deliver with artificial intelligence products and services that people actually want to use.

AI startup Expect Labs raises $13M as voice search API takes off

There’s more to speech recognition apps than Siri, Cortana or Google voice search, and a San Francisco startup called Expect Labs aims to prove it. On Thursday, the company announced it has raised a $13 million Series A round of venture capital led by IDG Ventures and USAA, with participation from strategic investors including Samsung, Intel and Telefonica. The company has now raised $15.5 million since launching in late 2012.

Expect Labs started out by building an application called MindMeld that lets users carry on voice conversations and automatically surfaces related content from around the web as they speak. However, that was just a proving ground for what is now the company’s primary business — its MindMeld API. The company released the API in February 2014, and has since rolled out specific modules for media and ecommerce recommendations.

Here’s how the API works, as I described at its launch:

[blockquote person=”” attribution=””]The key to the MindMeld API is its ability (well, the ability of the system behind it) to account for context. The API will index and make a knowledge graph from a website, database or content collection, but then it also collects contextual clues from an application’s users about where they are, what they’re doing or what they’re typing, for example. It’s that context that lets the API decide which search results to display or content to recommend, and when.[/blockquote]
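The contextual-ranking idea in that description can be boiled down to a scoring function: rank indexed documents by direct query matches, with a softer boost for terms from the surrounding conversation. The scoring below is my own invention for illustration, not Expect Labs' actual algorithm.

```python
# A tiny "knowledge graph": three indexed documents. Ties in scoring are
# broken by insertion order, which max() gives us for free.
docs = {
    "doc2": "how to make sushi at home",
    "doc1": "best sushi restaurants in san francisco",
    "doc3": "san francisco weather this weekend",
}

def score(doc_text, query, context):
    words = set(doc_text.split())
    base = len(words & set(query.split()))           # direct query matches
    boost = 0.5 * len(words & set(context.split()))  # softer context matches
    return base + boost

def search(query, context=""):
    return max(docs, key=lambda d: score(docs[d], query, context))

# Same query, different answer once context reveals the user's situation.
print(search("sushi"))
print(search("sushi", context="we are visiting san francisco"))
```

The point of the demo is the second call: "sushi" alone is ambiguous, but contextual clues about the user's location tip the ranking toward the restaurant result.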

Tim Tuttle (left) at Structure Data 2014.

API users don’t actually have to incorporate speech recognition into their apps, and initially many didn’t, but that’s starting to change, said Expect Labs co-founder and CEO Tim Tuttle. There are about a thousand developers building on the API right now, and the vast improvements in speech recognition over the past several months alone have helped pique their interest in voice.

Around the second quarter of next year, he said, “You’re going to see some very cool, very accurate voice apps start to appear.”

He doesn’t think every application is ideal for a voice interface, but he does think it’s ideal for those situations where people need to sort through a large number of choices. “If you get voice right … it can actually be much, much faster to help users find what they need,” he explained, because it’s easier and faster to refine searches when you don’t have to think about what to type and actually type it.

A demo of MindMeld voice search, in which I learned Loren Avedon plays a kickboxer in more than one movie.

Of course, that type of experience requires more than just speech recognition; it also requires the natural language processing and indexing capabilities that are Expect Labs’ bread and butter. Tuttle cited some big breakthroughs in those areas over the past couple of years as well, and said one of his company’s big challenges is keeping up with those advances as they scale from words up to paragraphs of text. It needs to understand the state of the art, and also be able to home in on the sweet spot for voice interfaces, which probably lies somewhere between single words and full paragraphs.

“People are still trying to figure out what the logical unit of the human brain is and replicate that,” he said.

Check out Tuttle’s session at Structure Data 2014 below. Structure Data 2015 takes place March 18-19 in New York, and covers all things data, from Hadoop to quantum computing, and from BuzzFeed to crime prediction.


What we read about deep learning is just the tip of the iceberg

The artificial intelligence technique known as deep learning is white hot right now, as we have noted numerous times before. It’s powering many of the advances in computer vision, voice recognition and text analysis at companies including Google, Facebook, Microsoft and Baidu, and has been the technological foundation of many startups (some of which were acquired before even releasing a product). As far as machine learning goes, these public successes receive a lot of media attention.

But they’re only the public face of a field that appears to be growing like mad beneath the surface. So much research is happening at places that are not large web companies, and even most of the large web companies’ work goes unreported. Big breakthroughs and ImageNet records get the attention, but there’s progress being made all the time.

Just recently, for example, Google’s DeepMind team reported on initial efforts to build algorithm-creating systems that it calls “Neural Turing Machines”; Facebook showed off a “generic” 3D feature for analyzing videos; and Microsoft researchers concluded that quantum computing could prove a boon for certain types of deep learning algorithms.

We’ll talk more about some of these efforts at our Structure Data conference in March, where speakers include a senior researcher from Facebook’s AI lab, as well as prominent AI and robotics researchers from labs at Stanford and MIT.


But anyone who really needs to know what’s happening in deep learning was probably at the Neural Information Processing Systems, or NIPS, conference that happened last week in Montreal, Quebec. It’s a long-running conference that’s increasingly dominated by deep learning. Of the 411 papers accepted to this year’s conference, 46 included the word “deep” among their 100 most-used words (according to a topic model by Stanford Ph.D. student Andrej Karpathy). That’s double last year’s 23, which itself was up from 15 in 2012.
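The measurement behind those numbers is straightforward to sketch: for each paper, count word frequencies and check whether "deep" lands among the most-used words. The three toy "papers" below stand in for real NIPS abstracts; Karpathy's actual analysis ran over full paper texts.

```python
from collections import Counter

papers = {
    "paper_a": "deep learning with deep convolutional networks deep deep",
    "paper_b": "kernel methods for structured prediction problems",
    "paper_c": "training deep recurrent networks with long memory deep",
}

def mentions_deep(text, top_n=100):
    """True if 'deep' is among the text's top_n most-used words."""
    top_words = [w for w, _ in Counter(text.split()).most_common(top_n)]
    return "deep" in top_words

count = sum(mentions_deep(t) for t in papers.values())
print(count)  # papers with "deep" among their top words
```

Scaled up from three toy documents to 411 accepted papers, this is exactly the kind of tally that produced the 46-of-411 figure.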

At the separate deep learning workshop co-located with the NIPS conference, the number of poster presentations this year shot up to 47 from last year’s 28. While some of the bigger research breakthroughs presented at NIPS have already been written about (e.g., the combination of two types of neural networks to automatically produce image captions, research on which Karpathy worked), other potentially important work goes largely unnoticed by the general public.

Yoshua Bengio — a University of Montreal researcher well known in deep learning circles, who has so far resisted the glamour of corporate research labs — and his team appear very busy. Bengio is listed as a coauthor on five of this year’s NIPS papers, and another seven at the workshop, but his name doesn’t often come up in stories about Skype Translate or Facebook’s efforts to stop users from posting drunken photos.


In a recent TEDx talk, Enlitic CEO Jeremy Howard talked about advances in translation and medical imaging that have flown largely under the radar, and also showed off how software like the stuff his company is building could help doctors train computers to classify medical images in just minutes.

The point here is not just to say, “Wow! Look how much research is happening.” Nor is it to warn of an impending AI takeover of humanity. It’s just a heads-up that there’s a lot going on underneath the surface that goes largely underreported by the press, but of which certain types of people should try to keep abreast nonetheless.

Lawmakers, national security agents, ethicists and economists (Howard touches on the economy in that TEDx talk and elaborates in a recent Reddit Ask Me Anything session) need to be aware of what’s happening and what’s possible if our foundational institutions are going to be prepared for the effects of machine intelligence, however it’s defined. (In another field of AI research, Paul Allen is pumping money into projects that are trying to give computers actual knowledge.)

Results of Karpathy’s research on image captions. Beyond automating that process, imagine combing through millions of unlabeled images to learn about what’s happening in them. Source: Andrej Karpathy and Li Fei-Fei / Stanford

But CEOs, product designers and other business types also need to be aware. We’re seeing a glut of companies claiming they can analyze the heck out of images, text and databases, and others delivering capabilities such as voice interaction and voice search as a service. Even research firm IDC is predicting video, audio and image analytics “will at least triple in 2015 and emerge as the key driver for [big data and analytics] technology investment.”

Smart companies investing in these technologies will see deep learning as much more than a way to automatically tag images for search or analyze sentiment. They’ll see it as a way to learn a whole lot more about their businesses and the customers buying their products.

In deep learning, especially, we’re talking about a field where operational systems exist, techniques are being democratized rapidly and research appears to be increasing exponentially. It’s not just a computer science project anymore; two and a half years later, jokes about Google’s cat-recognizing computers already seem dated.

With $8M and star team, MetaMind does deep learning for enterprise

A Palo Alto startup called MetaMind launched on Friday promising to help enterprises use deep learning to analyze their images, text and other data. The company has raised $8 million from Khosla Ventures and Marc Benioff, and Khosla operating partner and CTO Sven Strohband is its co-founder and CEO. He’s joined by co-founder and CTO Richard Socher — a frequently published researcher — and a small team of other data scientists.

Natural language processing expert Chris Manning of Stanford and Yoshua Bengio of the University of Montreal, considered one of the handful of deep learning masters, are MetaMind’s advisers.

Rather than trying to help companies deploy and train their own deep neural networks and artificial intelligence systems, as some other startups are doing, MetaMind is providing simple interfaces for predetermined tasks. Strohband thinks a lot of users will ultimately care less about the technology underneath and more about what it can do for them.

“I think people, in the end, are trying to solve a problem,” he said.

Sven Strohband (second from left) at Structure Data 2014.

Right now, there are several tools (what the company calls “smart modules”) for computer vision — including image classification, localization and segmentation — as well as for language. The latter, where much of Socher’s research has focused, includes modules for text classification, sentiment analysis and question-answering, among other things. (MetaMind incorporates a faster, more accurate version of the etcML text-analysis service that Socher helped create while pursuing a Ph.D. at Stanford.)

During a briefing on MetaMind, Socher demonstrated a capability that merges language and vision and that’s similar, inversely, to a spate of recent work from Google, Stanford and elsewhere around automatically generating detailed captions for images. When he typed in phrases such as “birds on water” or “horse with bald man,” the application surfaced pictures fitting those descriptions and even clustered them based on how similar they are.

Testing out MetaMind’s sentiment analysis for Twitter.
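The text-to-image demo Socher showed rests on one idea: map both phrases and images into a shared vector space, then retrieve by nearest neighbor. The sketch below fakes that space with hand-built 3-d vectors and a keyword "encoder"; real models learn embeddings with hundreds of dimensions from data.

```python
import math

# Pretend image embeddings (invented for the demo).
image_vecs = {
    "birds_on_lake.jpg": [0.9, 0.1, 0.0],
    "man_riding_horse.jpg": [0.1, 0.9, 0.2],
}

def embed_text(phrase):
    # Crude keyword-to-axis mapping standing in for a learned text encoder.
    v = [0.0, 0.0, 0.0]
    if "bird" in phrase or "water" in phrase:
        v[0] += 1.0
    if "horse" in phrase or "man" in phrase:
        v[1] += 1.0
    return v

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def find_image(phrase):
    q = embed_text(phrase)
    return max(image_vecs, key=lambda k: cosine(q, image_vecs[k]))

print(find_image("birds on water"))
print(find_image("horse with bald man"))
```

Clustering similar results, as the demo also did, falls out of the same geometry: images near each other in the space get grouped together.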

Socher and Strohband claim MetaMind’s accuracy in language and vision tasks is comparable to, if not better than, previous systems that have won competitions in those fields. Where applicable, the company’s website shows these comparisons.

MetaMind is also working on modules for reasoning over databases, claiming the ability to automatically fill in missing values and predict column headings. Demo versions of several of these features are available on the company’s website, including a couple that let users import their own text or images and train their own classifiers. Socher calls this “drag-and-drop deep learning.”

The bare image-training interface.
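As a back-of-the-envelope stand-in for "drag-and-drop deep learning," here is the simplest possible few-example classifier: average each class's example feature vectors into a centroid, then label new items by the nearest centroid. MetaMind's actual modules use deep networks to produce the features; the 2-d "features" below are made up.

```python
import math

# Hypothetical features extracted from five training images per class.
train = {
    "chocolate_chip": [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]],
    "oatmeal_raisin": [[0.2, 0.8], [0.1, 0.9], [0.15, 0.85]],
}

# One centroid per class: the per-dimension mean of its examples.
centroids = {
    label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
    for label, vecs in train.items()
}

def classify(features):
    # Nearest-centroid rule: pick the class whose centroid is closest.
    return min(centroids, key=lambda c: math.dist(features, centroids[c]))

print(classify([0.7, 0.3]))  # lands nearer the chocolate-chip centroid
```

With good features, even this crude rule works from a handful of examples, which is why a few cookie photos per class can be enough to train a usable classifier.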

On the surface, the MetaMind service seems similar to those of a couple other deep-learning-based startups, including computer-vision specialist Clarifai but especially AlchemyAPI, which is rapidly expanding its collection of services. If there’s a big difference on the product side right now, it’s that AlchemyAPI has been around for years and has a fairly standard API-based cloud service, and a business model that seems to work for it.

After being trained on five pictures of chocolate chip cookies and five pictures of oatmeal raisin cookies, I tested it on this one.

MetaMind is only four months old, but Strohband said the company plans to keep expanding its capabilities and become a general-purpose artificial intelligence platform. It intends to make money by licensing its modules to enterprise users along with commercial support. However, it does offer some free tools and an API in order to get the technology in front of a lot of users to gin up excitement and learn from what they’re doing.

“Making these tools so easy to use will open up a lot of interesting use cases,” Socher said.

Asked about the prospect of acquiring skilled researchers and engineers in a field where hiring is notoriously difficult — and in a geography, Palo Alto, where companies like [company]Google[/company] and [company]Facebook[/company] are stockpiling AI experts — Socher suggested it’s not quite as hard as it might seem. Companies like MetaMind just need to look a little outside the box.

“If [someone is] incredibly good at applied math programming … I can teach that person a lot about deep learning in a very short amount of time,” he said.

He thinks another important element, if MetaMind is to be successful, will be for him to continue doing his own research so the company can develop its own techniques and remain on the cutting edge. That’s increasingly difficult in the world of deep learning and neural network research, where large companies are spending hundreds of millions of dollars, universities are doubling down and new papers are published seemingly daily.

“If you rest a little on your laurels here,” Strohband said, “this field moves so fast [you’ll get left behind].”

Allen Foundation gives millions to teach machines common sense

The Paul G. Allen Foundation announced on Wednesday that it has awarded $5.7 million in grants to five projects that aim to teach machines to understand what they see and read. That can be anything from a photograph to a chart, a diagram to an entire textbook.

IBM Watson invests in personal health company

IBM’s Watson group has invested an undisclosed amount of money in Pathway Genomics to help the company deliver an app that gives personalized advice to users based on their genetic information, as well as data collected by wearable devices, medical records and other sources. Like all Watson-powered apps, the new app, called Panorama, will take queries in natural language and deliver results based on analysis of sources including medical literature and clinical trials. So far, IBM has invested in a handful of companies building apps on Watson, including in the retail and health care spaces.