Google, Stanford say big data is key to deep learning for drug discovery

A team of researchers from Stanford University and Google has released a paper highlighting a deep learning approach they say shows promise in the field of drug discovery. What they found, essentially, is that more data covering more biological processes seems like a good recipe for uncovering new drugs.

Importantly, the paper doesn’t claim a major breakthrough that will revolutionize the pharmaceutical industry today. It simply shows that analyzing a whole lot of data across a whole lot of different target processes — in this case, 37.8 million data points across 259 tasks — works measurably better for discovering possible drugs than analyzing smaller datasets and/or building models that each target a single task. (Read the Google blog post for a higher-level, but still very condensed, explanation.)
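
To make that idea concrete, here is a minimal sketch of what such a multitask architecture could look like: shared hidden layers trained on data from every assay, plus a separate output head per task. The framework (PyTorch), fingerprint length and layer sizes here are illustrative assumptions, not the paper’s actual configuration.

```python
# A minimal multitask network: a shared trunk learns features from molecular
# fingerprints, and each assay (task) gets its own output head. All sizes are
# illustrative placeholders, not the paper's hyperparameters.
import torch
import torch.nn as nn

N_FEATURES = 1024  # e.g., length of a binary molecular fingerprint
N_TASKS = 259      # one active/inactive prediction per assay

class MultitaskNet(nn.Module):
    def __init__(self, n_features=N_FEATURES, n_tasks=N_TASKS, hidden=512):
        super().__init__()
        # Shared layers: data from every task updates these weights.
        self.trunk = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One small head per task scores activity for that assay.
        self.heads = nn.ModuleList(nn.Linear(hidden, 1) for _ in range(n_tasks))

    def forward(self, x):
        shared = self.trunk(x)
        # Stack per-task logits into a (batch, n_tasks) tensor.
        return torch.cat([head(shared) for head in self.heads], dim=1)

model = MultitaskNet()
batch = torch.rand(8, N_FEATURES)         # 8 molecules
scores = torch.sigmoid(model(batch))      # (8, 259) predicted activity scores
print(scores.shape)
```

The intuition behind this design is that whatever the shared layers learn from one assay’s data is available to every other assay, which is why piling on more tasks and more data points tends to lift accuracy across the board.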

But drug discovery is a process that can take years and cost drug companies billions of dollars, costs that ultimately make their way into the prices of prescription drugs, so any small improvement helps.

This graph shows a measure of prediction accuracy (ROC AUC is the area under the receiver operating characteristic curve) for virtual screening on a fixed set of 10 biological processes as more datasets are added.
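
For readers unfamiliar with the metric, ROC AUC can be read as the probability that the model ranks a randomly chosen active compound above a randomly chosen inactive one: 1.0 is perfect ranking, 0.5 is coin-flipping. A minimal example with made-up labels and scores:

```python
# ROC AUC measures how well a model ranks actives above inactives:
# 1.0 means every active compound outscores every inactive one, 0.5 is chance.
# The labels and scores below are invented for illustration.
from sklearn.metrics import roc_auc_score

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                   # 1 = active, 0 = inactive
y_score = [0.9, 0.2, 0.7, 0.6, 0.65, 0.1, 0.8, 0.3]  # model's predicted scores

print(roc_auc_score(y_true, y_score))  # 0.9375: one inactive outranks one active
```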

Here’s how the researchers explain the reality, and the promise, of their work in the paper:

The efficacy of multitask learning is directly related to the availability of relevant data. Hence, obtaining greater amounts of data is of critical importance for improving the state of the art. Major pharmaceutical companies possess vast private stores of experimental measurements; our work provides a strong argument that increased data sharing could result in benefits for all.

More data will maximize the benefits achievable using current architectures, but in order for algorithmic progress to occur, it must be possible to judge the performance of proposed models against previous work. It is disappointing to note that all published applications of deep learning to virtual screening (that we are aware of) use distinct datasets that are not directly comparable. It remains to future research to establish standard datasets and performance metrics for this field.

. . .

Although deep learning offers interesting possibilities for virtual screening, the full drug discovery process remains immensely complicated. Can deep learning—coupled with large amounts of experimental data—trigger a revolution in this field? Considering the transformational effect that these methods have had on other fields, we are optimistic about the future.

If they’re right, we might look back on this research as part of a handful of efforts that helped spur an artificial intelligence revolution in health care. Aside from other research in the field, there are multiple startups, including Butterfly Network and Enlitic (which will be presenting at our Structure Data conference later this month in New York), trying to improve doctors’ ability to diagnose diseases using deep learning. Related efforts include the work IBM is doing with its Watson technology to analyze everything from cancer to PTSD, as well as projects from startups like Ayasdi and Lumiata.

There’s no reason researchers have to stop here, either. Deep learning has proven remarkably good at machine perception tasks such as computer vision and speech recognition, but the approach can, in principle, excel at more general problems involving pattern recognition and feature selection. Given the right datasets, we could soon see deep learning networks identifying environmental factors and other root causes of disease, helping public health officials address certain issues so doctors don’t have to.

IBM’s Watson is now studying PTSD in veterans

The Department of Veterans Affairs is working with IBM to analyze hundreds of thousands of VA hospital medical records using the Watson cognitive computing system. Improving the diagnosis and treatment of post-traumatic stress disorder is among the areas on which the partnership will focus. More broadly, though, the VA system might be the ideal place to explore what Watson and other artificial intelligence technologies can do. They have the potential to improve patient care without blowing up tight budgets, by improving speed and efficiency rather than increasing staff count.

Deep learning might help you get an ultrasound at Walgreens

A new startup called Butterfly Network, from genomic-technology pioneer Jonathan Rothberg, hopes to improve the world of medical imaging using advanced chip technologies, tablet devices and deep learning. Rothberg explains how and why deep learning is key to the company’s plans.

Box acquires medical-imaging startup MedXT

Cloud storage and collaboration provider Box has acquired MedXT, a startup that has built technology for storing and sharing medical images in the cloud. The acquisition is an early step toward bolstering the Box for Healthcare initiative the company recently launched. Box founder and CEO Aaron Levie announced the acquisition in a blog post, explaining that “the expertise of the MedXT team members and their medical image viewing technology will be incredibly important in our effort to deliver HIPAA-compliant sharing and collaboration for all critical content types.”

How machine learning is saving lives while saving hospitals money

Researchers at the University of Washington, Tacoma, have built a machine learning system capable of predicting readmission risks for congestive heart failure patients. It has shown good results in a pilot deployment, and now the team hopes to commercialize the technology.
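
As a purely hypothetical illustration of the general approach (not the Tacoma team’s actual system), readmission risk can be framed as a binary classification problem over patient features:

```python
# Hypothetical sketch: readmission risk as binary classification. The features,
# labels, and model here are invented for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Made-up patient features: [age, prior admissions, ejection fraction (%)]
X = rng.normal(loc=[70, 2, 40], scale=[10, 1.5, 8], size=(200, 3))
# Made-up outcome: 1 = readmitted within 30 days
y = (X[:, 1] + rng.normal(scale=0.5, size=200) > 2.5).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
new_patient = [[82, 4, 30]]  # hypothetical high-risk patient
print(model.predict_proba(new_patient)[0, 1])  # estimated readmission probability
```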

Syapse raises $10M for its medical knowledge graph tech

Syapse, a startup trying to build something akin to Google’s Knowledge Graph for medical data, has raised a $10 million series B round of venture capital from Safeguard Scientifics and existing investor Social+Capital Partnership. We covered Syapse when it launched in January 2013, promising to help doctors make sense of the myriad data sources and data points associated with medical tests, from how a sample was extracted to the method used for analyzing it.

BaseHealth launches a wellness platform that melds genomics and devices

A startup called BaseHealth launched on Tuesday with a mission to deliver personalized wellness plans while keeping doctors very much in the picture. The company’s platform combines genetic data, lifestyle data and medical records to determine patients’ risks and how they can mitigate them.