Scientists say tweets predict heart disease and community health

University of Pennsylvania researchers have found that the words people use on Twitter can help predict the rate of heart disease deaths in the counties where they live. Places where people tweet happier language about happier topics show lower rates of heart disease death when compared with Centers for Disease Control statistics, while places with angry language about negative topics show higher rates.

The findings of this study, which was published in the journal Psychological Science, cut across fields such as medicine, psychology, public health and possibly even civil planning. It’s yet another affirmation that Twitter, despite any inherent demographic biases, is a good source of relatively unfiltered data about people’s thoughts and feelings, well beyond the scale and depth of traditional polls or surveys. In this case, the researchers used approximately 148 million geo-tagged tweets from 2009 and 2010 from more than 1,300 counties that contain 88 percent of the U.S. population.

(How to take full advantage of this glut of data, especially for business and governments, is something we’ll cover at our Structure Data conference with Twitter’s Seth McGuire and Dataminr’s Ted Bailey.)


What’s more, at the county level, the Penn study’s findings about language sentiment turn out to be more predictive of heart disease than any other individual factor — including income, smoking and hypertension. A predictive model combining language with those other factors was the most accurate of all.
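The combined model is, in spirit, a regression over county-level features. Here's a minimal sketch of that idea with synthetic data and hypothetical feature names — this is an illustration of combining a language-sentiment feature with demographic predictors, not the study's actual model:

```python
import numpy as np

# Hypothetical county-level features: mean tweet sentiment, median income,
# smoking rate. All data here is synthetic, for illustration only.
rng = np.random.default_rng(0)
n = 200
sentiment = rng.normal(0, 1, n)
income = rng.normal(0, 1, n)
smoking = rng.normal(0, 1, n)

# Simulated mortality rate: negatively related to sentiment, plus noise.
mortality = -0.6 * sentiment + 0.3 * smoking - 0.2 * income + rng.normal(0, 0.5, n)

# Fit a linear model combining language with the other factors.
X = np.column_stack([np.ones(n), sentiment, income, smoking])
coefs, *_ = np.linalg.lstsq(X, mortality, rcond=None)
print(coefs)  # intercept and per-feature weights
```

In this toy setup, the fitted weight on the sentiment feature comes out negative, mirroring the study's finding that happier language tracks lower mortality.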

That’s a result similar to recent research comparing Google Flu Trends with CDC data, although it’s worth noting that Flu Trends is an ongoing project that has been collecting data for years, and that the search queries it collects are much more directly related to influenza than the Penn study’s tweets are to heart disease.

That’s likely why the Penn researchers suspect their findings will be more relevant to community-scale policies or interventions than anything at an individual level, despite previous research that shows a link between emotional well-being and heart disease in individuals. Penn professor Lyle Ungar is quoted in a press release announcing the study’s publication:

“We believe that we are picking up more long-term characteristics of communities. The language may represent the ‘drying out of the wood’ rather than the ‘spark’ that immediately leads to mortality. We can’t predict the number of heart attacks a county will have in a given timeframe, but the language may reveal places to intervene.”

The researchers’ work is part of the university’s Well-Being Project, which has also used Facebook users’ language to build personality profiles.


New from Watson: Financial advice and a hardcover cookbook

IBM has recruited a couple of new partners in its quest to mainstream its Watson cognitive computing system: financial investment specialist Vantage Software and the Institute of Culinary Education, or ICE. While the former is exactly the kind of use case one might expect from Watson, the latter seems like a pretty savvy marketing move.

What Vantage is doing with Watson, through a new software program called Coalesce, is about the same thing [company]IBM[/company] has been touting for years around the health care and legal professions. Only, replace health care and legal with financial services, and doctors and lawyers with financial advisers and investment managers. Coalesce will rely on Watson to analyze large amounts of literature and market data, which will complement experts’ own research and possibly provide them with information or trends they otherwise might have missed.

The partnership with the culinary institute, though — on a hardcover cookbook — is much more interesting. It’s actually a tangible manifestation of work that IBM and ICE have been doing together for a few years. At last year’s South By Southwest event, in fact, Gigaom’s Stacey Higginbotham ate a meal from an IBM food truck with ingredients suggested by Watson and prepared by ICE chefs.


The IBM food truck.

But even if the cookbook doesn’t sell (although I will buy one when it’s released in April and promise to review at least a few recipes), it’s a good way to try to convince the world that Watson has promise beyond just fighting cancer. IBM is banking on cognitive computing (aka artificial intelligence) to become a multi-billion-dollar business, so it’s going to need more than a handful of high-profile users. It has already started down this path with its Watson cloud ecosystem and APIs, where partners have built applications for things including retail recommendations, travel and cybersecurity.

Watson isn’t IBM’s only investment in artificial intelligence, either. Our Structure Data conference in March will feature Dharmendra Modha, the IBM researcher who led development of the company’s SyNAPSE chip that’s modeled on the brain and designed to learn like a neural network while consuming just a fraction of the power normal microchips do.

However, although we’re on the cusp of an era of smart applications and smart devices, we’re also in an era of on-demand cloud computing and a user base that cut its teeth on Google’s product design. The competition over the next few years — and there will be lots of it — won’t just be about who has the most-accurate text analysis or computer vision models, or who executes the best publicity stunts.

All the cookbooks and research projects in the world will amount to a lot of wasted time if IBM can’t deliver with artificial intelligence products and services that people actually want to use.

Facebook acquires speech-recognition IoT startup Wit.AI

Facebook has acquired Wit.AI, a San Francisco-based startup building a speech-recognition platform for the internet of things. The company launched early in 2014 and raised a $3 million seed round in October from a group of investors including Andreessen Horowitz, Ignition Partners and NEA.

Wit.AI has about 6,000 developers on its platform, which allows users to program speech-recognition controls into their devices and deliver the capabilities via API. When I spoke with co-founder Alex Lebrun in May, he explained that his ultimate goal is to power artificially intelligent personalities like those in the movie Her, but the company’s present focus is on helping power devices that can respond to simple voice commands. At the time, he said, Wit.AI was working with SmartThings on its line of connected devices, and was in talks with Nest before the Google acquisition.

Here’s how Wit.AI characterized its decision to join Facebook in a blog post announcing the deal:

[blockquote person=”” attribution=””]Facebook has the resources and talent to help us take the next step. Facebook’s mission is to connect everyone and build amazing experiences for the over 1.3 billion people on the platform – technology that understands natural language is a big part of that, and we think we can help.

The platform will remain open and become entirely free for everyone.[/blockquote]

For Facebook, acquiring Wit.AI gives it another opportunity to expand its platform into the world of connected devices and even smart homes without relying on speech-recognition technology developed by often-competitive companies. Much like Amazon has its Echo device, and Google has both the Android ecosystem and the Nest division, Facebook, too, likely wants a way to let users reach it when neither a keyboard nor a conventional computing device is at hand.

And, as a reader pointed out to me on Twitter, that probably happens a lot more in the developing world where Facebook expects to grow a lot in the years to come.

AI startup Expect Labs raises $13M as voice search API takes off

There’s more to speech recognition apps than Siri, Cortana or Google voice search, and a San Francisco startup called Expect Labs aims to prove it. On Thursday, the company announced it has raised a $13 million Series A round of venture capital led by IDG Ventures and USAA, with participation from strategic investors including Samsung, Intel and Telefonica. The company has now raised $15.5 million since launching in late 2012.

Expect Labs started out by building an application called MindMeld that lets users carry on voice conversations and automatically surfaces related content from around the web as they speak. However, that was just a proving ground for what is now the company’s primary business — its MindMeld API. The company released the API in February 2014, and has since rolled out specific modules for media and ecommerce recommendations.

Here’s how the API works, as I described at its launch:

[blockquote person=”” attribution=””]The key to the MindMeld API is its ability (well, the ability of the system behind it) to account for context. The API will index and make a knowledge graph from a website, database or content collection, but then it also collects contextual clues from an application’s users about where they are, what they’re doing or what they’re typing, for example. It’s that context that lets the API decide which search results to display or content to recommend, and when.[/blockquote]
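The ranking idea described above can be illustrated with a toy sketch — documents matching a query get re-ranked by overlap with contextual clues gathered from the user's activity. This is purely illustrative and does not reflect the MindMeld API's actual interface or internals:

```python
# Toy context-aware ranking: score each document by query-term matches,
# plus a discounted score for matches against contextual clues.
def contextual_rank(docs, query_terms, context_terms, context_weight=0.5):
    def score(doc):
        words = set(doc.lower().split())
        q = len(words & set(query_terms))
        c = len(words & set(context_terms))
        return q + context_weight * c
    return sorted(docs, key=score, reverse=True)

docs = [
    "jazz clubs open late tonight",
    "history of jazz recordings",
    "late night diners downtown",
]
# Same query either way, but the user's context ("tonight", "downtown")
# boosts the locally relevant result to the top.
ranked = contextual_rank(docs, ["jazz"], ["tonight", "downtown"])
print(ranked[0])  # "jazz clubs open late tonight"
```

The design point is that context acts as a tie-breaker and booster on top of ordinary relevance, which is what lets the same index serve different results to users in different situations.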

Tim Tuttle (left) at Structure Data 2014.

API users don’t actually have to incorporate speech recognition into their apps, and initially many didn’t, but that’s starting to change, said Expect Labs co-founder and CEO Tim Tuttle. There are about a thousand developers building on the API right now, and the vast improvements in speech recognition over the past several months alone have helped pique their interest in voice.

Around the second quarter of next year, he said, “You’re going to see some very cool, very accurate voice apps start to appear.”

He doesn’t think every application is ideal for a voice interface, but he does think it’s ideal for those situations where people need to sort through a large number of choices. “If you get voice right … it can actually be much, much faster to help users find what they need,” he explained, because it’s easier and faster to refine searches when you don’t have to think about what to type and actually type it.

A demo of MindMeld voice search, in which I learned Loren Avedon plays a kickboxer in more than one movie.

Of course, that type of experience requires more than just speech recognition; it also requires the natural language processing and indexing capabilities that are Expect Labs’ bread and butter. Tuttle cited some big breakthroughs in those areas over the past couple of years as well, and said one of his company’s big challenges is keeping up with those advances as they scale from single words up to paragraphs of text. It needs to understand the state of the art, and also be able to home in on the sweet spot for voice interfaces, which probably lies somewhere between the two.

“People are still trying to figure out what the logical unit of the human brain is and replicate that,” he said.

Check out Tuttle’s session at Structure Data 2014 below. Structure Data 2015 takes place March 18-19 in New York, and covers all things data, from Hadoop to quantum computing, and from BuzzFeed to crime prediction.


IBM’s Watson is now studying PTSD in veterans

The Department of Veterans Affairs is working with IBM to analyze hundreds of thousands of VA hospital medical records using the Watson cognitive computing system. Improving the diagnosis and treatment of post-traumatic stress disorder is among the areas on which the partnership will focus. More broadly, though, the VA system might be the ideal place to explore what Watson and other artificial intelligence technologies can do. They have the potential to improve patient care without blowing up tight budgets, by improving speed and efficiency rather than increasing staff count.

What we read about deep learning is just the tip of the iceberg

The artificial intelligence technique known as deep learning is white hot right now, as we have noted numerous times before. It’s powering many of the advances in computer vision, voice recognition and text analysis at companies including Google, Facebook, Microsoft and Baidu, and has been the technological foundation of many startups (some of which were acquired before even releasing a product). As far as machine learning goes, these public successes receive a lot of media attention.

But they’re only the public face of a field that appears to be growing like mad beneath the surface. So much research is happening at places that are not large web companies, and even most of the large web companies’ work goes unreported. Big breakthroughs and ImageNet records get the attention, but there’s progress being made all the time.

Just recently, for example, Google’s DeepMind team reported on initial efforts to build algorithm-creating systems that it calls “Neural Turing Machines”; Facebook showed off a “generic” 3D feature for analyzing videos; and Microsoft researchers concluded that quantum computing could prove a boon for certain types of deep learning algorithms.

We’ll talk more about some of these efforts at our Structure Data conference in March, where speakers include a senior researcher from Facebook’s AI lab, as well as prominent AI and robotics researchers from labs at Stanford and MIT.


But anyone who really needs to know what’s happening in deep learning was probably at the Neural Information Processing Systems, or NIPS, conference held last week in Montreal, Quebec. It’s a long-running conference that’s increasingly dominated by deep learning. Of the 411 papers accepted to this year’s conference, 46 included the word “deep” among their 100 most-used words (according to a topic model by Stanford Ph.D. student Andrej Karpathy). That doubles last year’s 23, which itself was roughly 53 percent more than the 15 in 2012.
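The tally above can be approximated with a simple per-paper word count. Here's a minimal sketch with toy text standing in for real paper abstracts (Karpathy's actual analysis used a topic model, so this is a cruder approximation):

```python
import re
from collections import Counter

def deep_in_top_words(text, top_n=100):
    """Return True if 'deep' ranks among the text's top_n most frequent words."""
    words = re.findall(r"[a-z]+", text.lower())
    top = [w for w, _ in Counter(words).most_common(top_n)]
    return "deep" in top

# Toy stand-ins for two papers' text.
paper_a = "deep networks learn representations " * 5 + "we train deep models on data"
paper_b = "shallow methods only, with kernel machines"

print(deep_in_top_words(paper_a), deep_in_top_words(paper_b))  # True False
```

Run over a corpus of accepted papers, summing the `True` results reproduces the kind of count cited above.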

At the separate deep learning workshop co-located with the NIPS conference, the number of poster presentations this year shot up to 47 from last year’s 28. While some of the bigger research breakthroughs presented at NIPS have already been written about (e.g., the combination of two types of neural networks to automatically produce image captions, research on which Karpathy worked), other potentially important work goes largely unnoticed by the general public.

Yoshua Bengio — a University of Montreal researcher well known in deep learning circles, who has so far resisted the glamour of corporate research labs — and his team appear very busy. Bengio is listed as a coauthor on five of this year’s NIPS papers, and another seven at the workshop, but his name doesn’t often come up in stories about Skype Translate or Facebook trying to keep users from posting drunken photos.


In a recent TEDx talk, Enlitic CEO Jeremy Howard talked about advances in translation and medical imaging that have flown largely under the radar, and also showed off how software like the stuff his company is building could help doctors train computers to classify medical images in just minutes.

The point here is not just to say, “Wow! Look how much research is happening.” Nor is it to warn of an impending AI takeover of humanity. It’s just a heads-up that there’s a lot going on underneath the surface that goes largely underreported by the press, but of which certain types of people should try to keep abreast nonetheless.

Lawmakers, national security agents, ethicists and economists (Howard touches on the economy in that TEDx talk and elaborates in a recent Reddit Ask Me Anything session) need to be aware of what’s happening and what’s possible if our foundational institutions are going to be prepared for the effects of machine intelligence, however it’s defined. (In another field of AI research, Paul Allen is pumping money into projects that are trying to give computers actual knowledge.)

Some example results of Stanford's system. Source: Andrej Karpathy and Li Fei-Fei / Stanford

Results of Karpathy’s research on image captions. Beyond automating that process, imagine combing through millions of unlabeled images to learn about what’s happening in them.

But CEOs, product designers and other business types also need to be aware. We’re seeing a glut of companies claiming they can analyze the heck out of images, text and databases, and others delivering capabilities such as voice interaction and voice search as a service. Even research firm IDC is predicting video, audio and image analytics “will at least triple in 2015 and emerge as the key driver for [big data and analytics] technology investment.”

Smart companies investing in these technologies will see deep learning as much more than a way to automatically tag images for search or analyze sentiment. They’ll see it as a way to learn a whole lot more about their businesses and the customers buying their products.

In deep learning, especially, we’re talking about a field where operational systems exist, techniques are being democratized rapidly and research appears to be increasing exponentially. It’s not just a computer science project anymore; two and a half years later, jokes about Google’s cat-recognizing computers already seem dated.

With $8M and star team, MetaMind does deep learning for enterprise

A Palo Alto startup called MetaMind launched on Friday promising to help enterprises use deep learning to analyze their images, text and other data. The company has raised $8 million from Khosla Ventures and Marc Benioff, and Khosla operating partner and CTO Sven Strohband is its co-founder and CEO. He’s joined by co-founder and CTO Richard Socher — a frequently published researcher — and a small team of other data scientists.

Natural language processing expert Chris Manning of Stanford and Yoshua Bengio of the University of Montreal, considered one of the handful of deep learning masters, are MetaMind’s advisers.

Rather than trying to help companies deploy and train their own deep neural networks and artificial intelligence systems, as some other startups are doing, MetaMind is providing simple interfaces for predetermined tasks. Strohband thinks a lot of users will ultimately care less about the technology underneath and more about what it can do for them.

“I think people, in the end, are trying to solve a problem,” he said.

Sven Strohband (second from left) at Structure Data 2014.

Sven Strohband (second from left) at Structure Data 2014.

Right now, there are several tools (what the company calls “smart modules”) for computer vision — including image classification, localization and segmentation — as well as for language. The latter, where much of Socher’s research has focused, includes modules for text classification, sentiment analysis and question-answering, among other things. (MetaMind incorporates a faster, more accurate version of the etcML text-analysis service that Socher helped create while pursuing a Ph.D. at Stanford.)

During a briefing on MetaMind, Socher demonstrated a capability that merges language and vision and that’s similar, inversely, to a spate of recent work from Google, Stanford and elsewhere around automatically generating detailed captions for images. When he typed in phrases such as “birds on water” or “horse with bald man,” the application surfaced pictures fitting those descriptions and even clustered them based on how similar they are.

Testing out MetaMind's sentiment analysis for Twitter.

Socher and Strohband claim MetaMind’s accuracy in language and vision tasks is comparable to, if not better than, previous systems that have won competitions in those fields. Where applicable, the company’s website shows these comparisons.

MetaMind is also working on modules for reasoning over databases, claiming the ability to automatically fill in missing values and predict column headings. Demo versions of several of these features are available on the company’s website, including a couple that let users import their own text or images and train their own classifiers. Socher calls this “drag-and-drop deep learning.”
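The "train your own classifier" flow can be pictured with a toy version: a handful of labeled examples in, a working classifier out. This sketch uses a simple nearest-centroid bag-of-words model — nothing like MetaMind's actual deep learning internals, just an illustration of the user-facing workflow:

```python
from collections import Counter

def train(examples):
    """Build a normalized word-frequency centroid per label from a few examples."""
    centroids = {}
    for label, texts in examples.items():
        counts = Counter()
        for t in texts:
            counts.update(t.lower().split())
        total = sum(counts.values())
        centroids[label] = {w: c / total for w, c in counts.items()}
    return centroids

def predict(centroids, text):
    """Pick the label whose centroid best overlaps the input's words."""
    words = text.lower().split()
    def similarity(centroid):
        return sum(centroid.get(w, 0.0) for w in words)
    return max(centroids, key=lambda lbl: similarity(centroids[lbl]))

model = train({
    "positive": ["great product love it", "really great service"],
    "negative": ["terrible waste of money", "really terrible support"],
})
print(predict(model, "love this great support"))  # "positive"
```

The appeal of the drag-and-drop framing is exactly this shape: the user supplies labeled examples and gets predictions back, with the model choice hidden behind the interface.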

The bare image-training interface.

On the surface, the MetaMind service seems similar to those of a couple of other deep-learning-based startups, including computer-vision specialist Clarifai but especially AlchemyAPI, which is rapidly expanding its collection of services. If there’s a big difference on the product side right now, it’s that AlchemyAPI has been around for years and has a fairly standard API-based cloud service, and a business model that seems to work for it.

After being trained on five pictures of chocolate chip cookies and five pictures of oatmeal raisin cookies, I tested it on this one.

MetaMind is only four months old, but Strohband said the company plans to keep expanding its capabilities and become a general-purpose artificial intelligence platform. It intends to make money by licensing its modules to enterprise users along with commercial support. However, it does offer some free tools and an API in order to get the technology in front of a lot of users to gin up excitement and learn from what they’re doing.

“Making these tools so easy to use will open up a lot of interesting use cases,” Socher said.

Asked about the prospect of acquiring skilled researchers and engineers in a field where hiring is notoriously difficult — and in a geography, Palo Alto, where companies like [company]Google[/company] and [company]Facebook[/company] are stockpiling AI experts — Socher suggested it’s not quite as hard as it might seem. Companies like MetaMind just need to look a little outside the box.

“If [someone is] incredibly good at applied math programming … I can teach that person a lot about deep learning in a very short amount of time,” he said.

He thinks another important element, if MetaMind is to be successful, will be for him to continue doing his own research so the company can develop its own techniques and remain on the cutting edge. That’s increasingly difficult in the world of deep learning and neural network research, where large companies are spending hundreds of millions of dollars, universities are doubling down and new papers are published seemingly daily.

“If you rest a little on your laurels here,” Strohband said, “this field moves so fast [you’ll get left behind].”

Why Nuance is pushing for a new type of Turing test

Nuance Communications is sponsoring a contest called the Winograd Schema Challenge that aims to replace the usual conversation-based attempts to pass the Turing test. According to its backers, the Winograd challenge places more emphasis on intelligence and less on trickery.
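To make the idea concrete, here is the canonical schema from Levesque, Davis and Morgenstern's formulation, represented as a small data structure. Flipping a single word flips the pronoun's referent, so word co-occurrence statistics alone can't resolve it — the test targets commonsense reasoning rather than conversational misdirection:

```python
# The canonical Winograd schema: one word change flips what "it" refers to.
schema = {
    "sentence": "The trophy doesn't fit in the suitcase because it is too {}.",
    "candidates": ["trophy", "suitcase"],
    "answers": {"big": "trophy", "small": "suitcase"},
}

for word, referent in schema["answers"].items():
    print(schema["sentence"].format(word), "->", referent)
```

A system taking the challenge would be given each filled-in sentence and the two candidate referents, and scored on picking the right one.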