Google, Stanford say big data is key to deep learning for drug discovery

A team of researchers from Stanford University and Google has released a paper highlighting a deep learning approach they say shows promise in the field of drug discovery. What they found, essentially, is that more data covering more biological processes seems like a good recipe for uncovering new drugs.

Importantly, the paper doesn’t claim a major breakthrough that will revolutionize the pharmaceutical industry today. It simply shows that analyzing a whole lot of data across a whole lot of different target processes — in this case, 37.8 million data points across 259 tasks — works measurably better for discovering possible drugs than analyzing smaller datasets or building models that each target a single task. (Read the Google blog post for a higher-level, but still very condensed, explanation.)
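To make the “multitask” idea concrete: the architecture described is, at a high level, a feed-forward network whose hidden layers are shared across every assay, with a separate output head per task, so what the model learns from one screening dataset can inform predictions on the others. The sketch below is not the authors’ code; it is a minimal illustration of that structure in PyTorch, with invented layer sizes and synthetic data standing in for real molecular fingerprints.

```python
# Minimal multitask classifier sketch (not the paper's code): a shared trunk
# feeds one binary "active/inactive" head per assay, so every task benefits
# from what the shared layers learn. Layer sizes and data are made up.
import torch
import torch.nn as nn

N_FEATURES = 1024   # e.g. a binary molecular fingerprint
N_TASKS = 259       # one output head per assay, mirroring the paper's task count

class MultitaskNet(nn.Module):
    def __init__(self, n_features, n_tasks, hidden=512):
        super().__init__()
        self.trunk = nn.Sequential(          # layers shared by every task
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(          # one small head per task
            [nn.Linear(hidden, 1) for _ in range(n_tasks)]
        )

    def forward(self, x):
        shared = self.trunk(x)
        # Stack per-task logits into a (batch, n_tasks) tensor
        return torch.cat([head(shared) for head in self.heads], dim=1)

# Synthetic batch: 32 random fingerprints with random active/inactive labels.
x = torch.rand(32, N_FEATURES)
y = torch.randint(0, 2, (32, N_TASKS)).float()

model = MultitaskNet(N_FEATURES, N_TASKS)
loss = nn.BCEWithLogitsLoss()(model(x), y)   # one joint loss over all tasks
loss.backward()
print(float(loss))
```

In a real screening dataset each compound is labeled for only a handful of the 259 assays, so the loss would be masked to the tasks actually measured; the sketch skips that detail.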

But in a drug discovery process that can take years and cost drug companies billions of dollars, costs that ultimately make their way into the prices of prescription drugs, any small improvement helps.

This graph shows a measure of prediction accuracy (ROC AUC is the area under the receiver operating characteristic curve) for virtual screening on a fixed set of 10 biological processes as more datasets are added.
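For readers unfamiliar with the metric: ROC AUC can be read as the probability that the model ranks a randomly chosen active compound above a randomly chosen inactive one, so 0.5 is chance and 1.0 is perfect ranking. A toy illustration with invented numbers (not the paper’s data) using scikit-learn:

```python
# Toy illustration of the ROC AUC metric shown in the figure (made-up scores).
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # 1 = active compound
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]   # model's predicted scores
print(roc_auc_score(y_true, y_score))                # ~0.94: most actives ranked above inactives
```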

Here’s how the researchers explain the reality, and the promise, of their work in the paper:

The efficacy of multitask learning is directly related to the availability of relevant data. Hence, obtaining greater amounts of data is of critical importance for improving the state of the art. Major pharmaceutical companies possess vast private stores of experimental measurements; our work provides a strong argument that increased data sharing could result in benefits for all.

More data will maximize the benefits achievable using current architectures, but in order for algorithmic progress to occur, it must be possible to judge the performance of proposed models against previous work. It is disappointing to note that all published applications of deep learning to virtual screening (that we are aware of) use distinct datasets that are not directly comparable. It remains to future research to establish standard datasets and performance metrics for this field.

. . .

Although deep learning offers interesting possibilities for virtual screening, the full drug discovery process remains immensely complicated. Can deep learning—coupled with large amounts of experimental data—trigger a revolution in this field? Considering the transformational effect that these methods have had on other fields, we are optimistic about the future.

If they’re right, we might look back on this research as part of a handful of efforts that helped spur an artificial intelligence revolution in the health care space. Aside from other research in the field, there are multiple startups, including Butterfly Network and Enlitic (which will be presenting at our Structure Data conference later this month in New York), trying to improve doctors’ ability to diagnose diseases using deep learning. Related efforts include the work IBM is doing with its Watson technology to analyze everything from cancer to PTSD, as well as projects from startups like Ayasdi and Lumiata.

There’s no reason that researchers have to stop here, either. Deep learning has proven remarkably good at tackling machine perception tasks such as computer vision and speech recognition, but the approach can technically excel at more general problems involving pattern recognition and feature selection. Given the right datasets, we could soon see deep learning networks identifying environmental factors and other root causes of disease that would help public health officials address certain issues so doctors don’t have to.

IBM wants to protect our food by sequencing its supply chains

IBM is teaming up with food conglomerate Mars to study, and hopefully protect consumers from, foodborne illnesses by sequencing the genes of the tiny organisms that populate our food chains. The effort, which consists of just the two companies right now but is expected to grow, is aptly called the Consortium for Sequencing the Food Supply Chain.

The companies will study the metagenomics — essentially, the collective genetic makeup of the microbial communities — of safe factories, farms, grocery stores and other areas in order to determine what’s normal, explained Jeff Wesler, vice president and lab director of IBM’s Almaden research center in San Jose, California. Eventually, the goal is to understand enough about normal, safe conditions that companies will be able to detect deviations early enough to prevent them from spurring an outbreak of salmonella, E. coli, or other dangerous bacteria or chemicals.

For example, he explained, although many places along the food supply chain know to test for salmonella, they might not know to test for, or have any reason to expect, contamination by other substances. Those could be anything from other, foreign bacteria to chemicals such as the melamine found in Chinese milk and infant formula in 2008. Understanding the metagenomics of the product being sold and the factories producing it means companies and regulatory agencies might be able to spot a problem in the microbial ecosystem and then get to work determining what’s causing it.
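IBM and Mars haven’t published the consortium’s methods, but the approach Wesler describes (establish what a normal microbial community looks like, then flag samples that drift from it) is simple to sketch. Everything below, from the taxa to the abundances to the threshold, is invented for illustration; real metagenomic pipelines work from sequenced reads rather than hand-entered numbers.

```python
# Hypothetical sketch of the "learn the baseline, flag deviations" idea.
# Rows of `baseline` are relative taxa abundances from known-safe samples.
import numpy as np

TAXA = ["Lactobacillus", "Pseudomonas", "Salmonella", "E. coli", "Bacillus"]

baseline = np.array([
    [0.45, 0.30, 0.00, 0.05, 0.20],
    [0.50, 0.25, 0.00, 0.05, 0.20],
    [0.40, 0.35, 0.00, 0.10, 0.15],
])
mean, std = baseline.mean(axis=0), baseline.std(axis=0) + 1e-6

def flag_deviations(sample, threshold=3.0):
    """Return taxa whose abundance is more than `threshold` std devs from the safe baseline."""
    z = np.abs(sample - mean) / std
    return [taxon for taxon, score in zip(TAXA, z) if score > threshold]

# A new factory sample with an unexpected spike in Salmonella.
new_sample = np.array([0.35, 0.25, 0.15, 0.05, 0.20])
print(flag_deviations(new_sample))   # ['Salmonella']
```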

According to the Centers for Disease Control, foodborne illnesses sicken one in six Americans each year and kill about 3,000 people in the United States. A 2012 study published in the Journal of Food Protection estimates the annual economic impact of foodborne illness at nearly $78 billion.


Right now, Wesler predicts it will be about three to five years before the results of this research might be deployed commercially. The companies will spend the first couple of years getting to know the baseline microbiomes of various facilities and products, understanding what they’re composed of and how they react to changes in their environments or to other substances. Initial research will focus on Mars facilities, which span a range of products including candy, pet food, packaged food and coffee.

Hopefully, they’ll be able to build up a database of connections between bacteria, chemicals, heavy metals and other substances, and their reactions in the presence of each other. After that, Wesler thinks they’ll be able to begin working on a set of tests that makes sense for particular industries and that can be implemented in a reasonably easy manner.

Wesler calls this the “quintessential big data problem” because it involves analyzing so much data and is only really possible to solve now because of advances in the required technology. In this case, that’s not just cheap data storage and new data-analysis tools, but also better genetic-sequencing technologies. “One of the reasons we even think this is feasible is because of the rise of next-generation sequencing,” he said.

A major uptick in the capabilities of any piece of the chain could speed up the research, he said, but the current state of the art should be capable of delivering within the predicted time frame.

Yale, Organovo team up to 3D print organs for transplants

Every day, an average of 18 people die waiting for an organ donation. The growing transplant wait list was 121,272 people long in 2013.

A potential solution is 3D printed organs. They’re not a reality quite yet, but laboratories are already busy 3D printing living tissue. Industry leader Organovo and Yale’s schools of medicine and engineering announced a partnership this week that will focus specifically on research into printing transplantable tissue, and we may see some working applications very soon.

A benefit of 3D printing organs is that they can be made from a patient’s own cells, which reduces the chance that the body will reject the organ. Organovo’s 3D printers are actually pretty similar to the inkjet printer you might find on a desktop; instead of ink, they are loaded with living cells, which are then printed layer by layer.

In the short term, Organovo and Yale might develop organs that assist a failing organ instead of replacing it altogether. Patients would then have a greater chance of surviving until a donated organ becomes available to them.

The two might also develop transplantable blood vessels, lung tissue and bone. Organovo has already printed experimental versions of all three.

Organovo began selling 3D printed tissue commercially just last month. Its first product is small pieces of liver tissue made for drug toxicity testing.

Omicia raises $6.8M for service that lets doctors analyze genomes for free

An Oakland-based startup called Omicia has raised a $6.8 million series A round of venture capital, led by Artis Ventures, for a cloud service that lets doctors analyze whole human genomes in order to identify the presence of diseases. The basic service is free, while more-advanced analyses and capabilities cost $99, and the whole process takes less than three hours for a whole genome (or less than one hour for an exome). Advances in sequencing, algorithms, data storage and cloud computing have been rapidly driving down the cost of genomic analysis over the past few years, leading to an uptick of startup activity in the space.

Two new platforms for genomic analysis have raised $1.5M apiece

Two new platforms for storing and analyzing genomic data have raised venture capital recently, with Curoverse announcing $1.5 million in seed funding in mid-December and Tute Genomics announcing $1.5 million in seed funding on Dec. 31. Curoverse is a specialized private-cloud system, while Tute Genomics is a pure cloud service. Both are riding the waves of cheaper gene-sequencing costs, data storage and computing power, betting that they will result in a deluge of demand for genomic analysis over the next few years. They’re not alone: We’ve covered numerous startups trying to do the same thing, including DNAnexus, Bina Technologies, Spiral Genetics and Appistry.

Biotech startup Syapse wants to be Salesforce.com for our genomes

A startup called Syapse is trying to bring the world of “omics” — the study of all our genomes, biomes, proteomes and other “omes” — under control with a new data management platform based on some of the general techniques that also power Facebook’s Graph Search.