New to deep learning? Here are 4 easy lessons from Google

Google employs some of the world’s smartest researchers in deep learning and artificial intelligence, so it’s not a bad idea to listen to what they have to say about the space. One of those researchers, senior research scientist Greg Corrado, spoke at RE:WORK’s Deep Learning Summit on Thursday in San Francisco and gave some advice on when, why and how to use deep learning.

His talk was pragmatic and potentially very useful for folks who have heard about deep learning and how great it is — well, at computer vision, language understanding and speech recognition, at least — and are now wondering whether they should try using it for something. The TL;DR version is “maybe,” but here’s a little more nuanced advice from Corrado’s talk.

(And, of course, if you want to learn even more about deep learning, you can attend Gigaom’s Structure Data conference in March and our inaugural Structure Intelligence conference in September. You can also watch the presentations from our Future of AI meetup, which was held in late 2014.)

1. It’s not always necessary, even if it would work

Probably the most-useful piece of advice Corrado gave is that deep learning isn’t necessarily the best approach to solving a problem, even if it would offer the best results. Presently, it’s computationally expensive (in all meanings of the word), it often requires a lot of data (more on that later) and probably requires some in-house expertise if you’re building systems yourself.

So while deep learning might ultimately work well on pattern-recognition tasks on structured data — fraud detection, stock-market prediction or analyzing sales pipelines, for example — Corrado said it’s easier to justify in the areas where it’s already widely used. “In machine perception, deep learning is so much better than the second-best approach that it’s hard to argue with,” he explained, while the gap between deep learning and other options is not so great in other applications.

That being said, I found myself in multiple conversations at the event centered around the opportunity to soup up existing enterprise software markets with deep learning and met a few startups trying to do it. In an on-stage interview I did with Baidu’s Andrew Ng (who worked alongside Corrado on the Google Brain project) earlier in the day, he noted how deep learning is currently powering some ad serving at Baidu and suggested that data center operations (something Google is actually exploring) might be a good fit.

Greg Corrado

2. You don’t have to be Google to do it

Even when companies do decide to take on deep learning work, they don’t need to aim for systems as big as those at Google or Facebook or Baidu, Corrado said. “The answer is definitely not,” he reiterated. “. . . You only need an engine big enough for the rocket fuel available.”

The rocket analogy is a reference to something Ng said in our interview, explaining the tight relationship between systems design and data volume in deep learning environments. Corrado explained that Google needs a huge system because it’s working with huge volumes of data and needs to be able to move quickly as its research evolves. But if you know what you want to do or don’t have major time constraints, he said, smaller systems could work just fine.

For getting started, he added later, a desktop computer could actually work provided it has a sufficiently capable GPU.

3. But you probably need a lot of data

However, Corrado cautioned, it’s no joke that training deep learning models really does take a lot of data, ideally as much as you can get your hands on. If he’s advising executives on when they should consider deep learning, it pretty much comes down to (a) whether they’re trying to solve a machine perception problem and/or (b) whether they have “a mountain of data.”

If they don’t have a mountain of data, he might suggest they get one. At least 100 trainable observations per feature you want to train is a good start, he said, adding that it’s possible to waste months of effort optimizing a model on a problem that would have been solved a lot quicker had you just spent some time gathering training data early on.
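
Corrado’s rule of thumb lends itself to a quick back-of-the-envelope check. A minimal sketch (the feature and example counts below are invented for illustration):

```python
def enough_data(num_features, num_labeled_examples, obs_per_feature=100):
    """Corrado's rule of thumb: roughly 100 trainable observations
    per feature before deep learning is worth attempting."""
    needed = num_features * obs_per_feature
    return num_labeled_examples >= needed, needed

# A hypothetical model with 500 input features would want ~50,000
# labeled examples; 20,000 falls short of the threshold.
ok, needed = enough_data(num_features=500, num_labeled_examples=20000)
# ok is False, needed is 50000
```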

Corrado said he views his job not as building intelligent computers (artificial intelligence) or building computers that can learn (machine learning), but as building computers that can learn to be intelligent. And, he said, “You have to have a lot of data in order for that to work.”

Source: Google

Training a system that can do this takes a lot of data.

4. It’s not really based on the brain

Corrado received his Ph.D. in neuroscience and worked on IBM’s SyNAPSE neurosynaptic chip before coming to Google, and says he feels confident in saying that deep learning is only loosely based on how the brain works. And that’s based on what little we know about the brain to begin with.

Earlier in the day, Ng said about the same thing. To drive the point home, he noted that while many researchers believe we learn in an unsupervised manner, most production deep learning models today are still trained in a supervised manner. That is, they analyze lots of labeled images, speech samples or whatever in order to learn what each one is.

And comparisons to the brain, while easier than nuanced explanations, tend to lead to overinflated connotations about what deep learning is or might be capable of. “This analogy,” Corrado said, “is now officially overhyped.”

Update: This post was updated on Feb. 2 to correct a statement about Corrado’s tenure at Google. He was with the company before Andrew Ng and the Google Brain project, and was not recruited by Ng to work on it, as originally reported.

Facebook open sources tools for bigger, faster deep learning models

Facebook on Friday open sourced a handful of software libraries that it claims will help users build bigger, faster deep learning models than existing tools allow.

The libraries, which [company]Facebook[/company] is calling modules, are alternatives for the default ones in a popular machine learning development environment called Torch, and are optimized to run on [company]Nvidia[/company] graphics processing units. Among the modules are those designed to rapidly speed up training for large computer vision systems (nearly 24 times, in some cases), to train systems on potentially millions of different classes (e.g., predicting whether a word will appear across a large number of documents, or whether a picture was taken in any city anywhere), and an optimized method for building language models and word embeddings (e.g., knowing how different words are related to each other).

“[T]here is no way you can use anything existing” to achieve some of these results, said Soumith Chintala, an engineer with Facebook Artificial Intelligence Research.

That team was formed in December 2013 when Facebook hired prominent New York University researcher Yann LeCun to run it. Rob Fergus, one of LeCun’s NYU colleagues who also joined Facebook at the same time, will be speaking on March 19 at our Structure Data conference in New York.

A heatmap showing performance of Facebook’s modules relative to standard ones on datasets of various sizes. The darker the green, the faster Facebook was.

Despite the sometimes significant improvements in speed and scale, however, the new Facebook modules probably are “not going to be super impactful in terms of today’s use cases,” Chintala said. While they might produce noticeable improvements within most companies’ or research teams’ deep learning environments, he explained, they’ll really make a difference (and justify making the switch) when more folks are working on stuff at a scale like Facebook is now — “using models that people [previously] thought were not possible.”

Perhaps the bigger and more important picture now, then, is that Friday’s open source releases represent the start of a broader Facebook effort to open up its deep learning research the way it has opened up its work on webscale software and data centers. “We are actually going to start building things in the open,” Chintala said, releasing a steady stream of code instead of just the occasional big breakthrough.

Facebook is also working fairly closely with Nvidia to rework some of its deep learning programming libraries to work at web scale, he added. Although it’s working at a scale beyond many mainstream deep learning efforts and its researchers change directions faster than would be feasible for a commercial vendor, Facebook’s advances could find their way into future releases of Nvidia’s libraries.

Given the excitement around deep learning right now — for everything from photo albums to self-driving cars — it’s a big deal that more and better open source code is becoming available. Facebook joins projects such as Torch (which it uses), Caffe and the Deeplearning4j framework being pushed by startup Skymind. Google has also been active in releasing certain tooling and datasets ideal for training models.

It was open source software that helped make general big data platforms, using software such as Hadoop and Kafka, a reality outside of cutting-edge web companies. Open source might help the same thing happen with deep learning, too — scaling it beyond the advances of leading labs at Facebook, Google, Baidu and Microsoft.

Skydio raises $3M to build software for safer, smarter drones

Skydio came out of stealth today with plans to build a smarter navigation system for drones, plus a $3 million seed round led by Andreessen Horowitz and Accel Partners.

Unlike most navigation systems, which require a drone to have a GPS signal and a dedicated human pilot, Skydio relies on computer vision to help drones see the world. A video on the Skydio website depicts drones flying around trees and through a parking lot, plus autonomously following people and being maneuvered by waving a mobile phone.

Skydio CEO Adam Bry wrote in a blog post:

A drone that’s aware of its surroundings is far easier to control, safer to operate, and more capable. Almost all the information a drone needs to be good at its job can be found in onboard video data; the challenge is extracting that information and making it useful for the task at hand. That challenge, and the incredible capabilities that are unlocked, are our focus.

For us this project is about harnessing the beauty and power of flight to make it “universally accessible and useful.”

Andreessen Horowitz general partner Chris Dixon wrote on his blog that Skydio is also poised to simplify drone programming to the point that it only takes a simple command.

“Smart drone operators will simply give high-level instructions like ‘map these fields’ or ‘film me while I’m skiing’ and the drone will carry out the mission,” Dixon wrote. “Safety and privacy regulations will be baked into the operating system and will always be the top priority.”

The Skydio team has roots at MIT, where two of its three co-founders worked on drone vision systems. They later went on to found the Project Wing delivery drone program at Google.

Andreessen Horowitz previously invested in another drone intelligence company: Airware. But Dixon doesn’t see the two companies as competitors.

“You can think of Airware as the operating system and Skydio as the most important app on top of the operating system,” Dixon wrote.

Baidu built a supercomputer for deep learning

Chinese search engine company Baidu says it has built the world’s most-accurate computer vision system, dubbed Deep Image, which runs on a supercomputer optimized for deep learning algorithms. Baidu claims a 5.98 percent error rate on the ImageNet object classification benchmark; a team from Google won the 2014 ImageNet competition with a 6.66 percent error rate.

In experiments, humans achieved an estimated error rate of 5.1 percent on the ImageNet dataset.

The star of Deep Image is almost certainly the supercomputer, called Minwa, which Baidu built to house the system. Deep learning researchers have long (well, for the past few years) used GPUs in order to handle the computational intensity of training their models. In fact, the Deep Image research paper cites a study showing that 12 GPUs in a 3-machine cluster can rival the performance of the 1,000-node CPU cluster behind the famous Google Brain project, on which Baidu Chief Scientist Andrew Ng worked.


But no one has yet built a system like this dedicated to the task of computer vision using deep learning. Here’s how paper author Ren Wu, a distinguished scientist at the Baidu Institute of Deep Learning, describes its specifications:

[blockquote person=”” attribution=””]It is comprised of 36 server nodes, each with 2 six-core Intel Xeon E5-2620 processors. Each server contains 4 Nvidia Tesla K40m GPUs and one FDR InfiniBand (56Gb/s) which is a high-performance low-latency interconnection and supports RDMA. The peak single precision floating point performance of each GPU is 4.29TFlops and each GPU has 12GB of memory.

… In total, Minwa has 6.9TB host memory, 1.7TB device memory, and about 0.6 [petaflops] theoretical single precision peak performance.[/blockquote]
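
Those per-node specs make the quoted totals easy to verify: 36 nodes with four K40m GPUs apiece is 144 GPUs, and multiplying out yields the roughly 0.6 petaflops and 1.7TB of device memory the paper cites.

```python
# Sanity-checking Minwa's quoted peak from the per-node specs above.
nodes = 36
gpus_per_node = 4
tflops_per_gpu = 4.29      # single-precision peak of one Tesla K40m
gpu_mem_gb = 12

total_gpus = nodes * gpus_per_node                   # 144 GPUs
peak_pflops = total_gpus * tflops_per_gpu / 1000     # ~0.62 petaflops
device_mem_tb = total_gpus * gpu_mem_gb / 1024       # ~1.7 TB
```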

Sheer performance aside, Baidu built Minwa to help overcome problems associated with the types of algorithms on which Deep Image was trained. “Given the properties of stochastic gradient descent algorithms, it is desired to have very high bandwidth and ultra low latency interconnects to minimize the communication costs, which is needed for the distributed version of the algorithm,” the authors wrote.

A sample of the effects Baidu used to augment images.

Having such a powerful system also allowed the researchers to work with different, and arguably better, training data than most other deep learning projects. Rather than using the 256 x 256-pixel images commonly used, Baidu used higher-resolution images (512 x 512 pixels) and augmented them with various effects such as color-casting, vignetting and lens distortion. The goal was to let the system take in more features of smaller objects and to learn what objects look like without being thrown off by editing choices, lighting situations or other extraneous factors.
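
The named effects map onto fairly simple pixel operations. As a rough illustration (not Baidu’s actual pipeline), color casting and vignetting can be sketched with NumPy alone:

```python
import numpy as np

def color_cast(img, cast=(1.1, 1.0, 0.9)):
    """Shift the color balance by scaling each RGB channel."""
    return np.clip(img * np.array(cast), 0, 255).astype(np.uint8)

def vignette(img, strength=0.5):
    """Darken pixels toward the edges, a cheap lens-vignette effect."""
    h, w = img.shape[:2]
    y, x = np.ogrid[:h, :w]
    cy, cx = h / 2, w / 2
    r = np.sqrt((y - cy) ** 2 + (x - cx) ** 2)   # distance from center
    mask = 1 - strength * (r / r.max())          # 1 at center, 0.5 at corners
    return np.clip(img * mask[..., None], 0, 255).astype(np.uint8)

# Augment a dummy 512x512 RGB image, the resolution Deep Image used.
img = np.full((512, 512, 3), 128, dtype=np.uint8)
augmented = vignette(color_cast(img))
```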

Baidu is investing heavily in deep learning, and Deep Image follows up a speech-recognition system called Deep Speech that the company made public in December. As executives there have noted before, including Ng at our recent Future of AI event in September, the company already sees a relatively high percentage of voice and image searches and expects that number to increase. The better its products can perform with real-world data (research datasets tend to be fairly optimal), the better the user experience will be.


However, Baidu is far from the only company — especially on the web — investing significant resources into deep learning and getting impressive results. Google, which still holds the ImageNet record in the actual competition, is probably the company most associated with deep learning and this week unveiled new Google Translate features that likely utilize the technology. Microsoft and Facebook also have very well-respected deep learning researchers and continue to do cutting-edge research in the space while releasing products that use that research.

Yahoo, Twitter, Dropbox and other companies also have deep learning and computer vision teams in place.

Our Structure Data conference, which takes place in March, will include deep learning and machine learning experts from many organizations, including Facebook, Yahoo, NASA, IBM, Enlitic and Spotify.


DARPA-funded research IDs sex traffickers with machine learning

Carnegie Mellon University is touting a new $3.6 million research grant from the Defense Advanced Research Projects Agency, or DARPA, to build machine learning algorithms that can index online sex ads in order to identify sex traffickers. The research is part of a larger DARPA program called Memex that aims to index seedy portions of the public web and deep web in order to identify any type of human trafficking on a larger scale.

One of the driving forces behind this type of effort is the simple fact that computers can analyze ads soliciting sex at a much greater scale than human investigators can. However, the press release announcing the DARPA grant noted, “In addition to analyzing obvious clues, CMU experts in computer vision, language technologies and machine learning will develop new tools for such tasks as analyzing the authors of ads or extracting subtle information from images.”

Even prior to this project, Carnegie Mellon said researchers at the university were working on the issue of sex trafficking and developed programs that law-enforcement agencies have already used to make arrests. That’s a reassuring piece of information considering that much university research, even the stuff involving serious issues, has a hard time making its way into the hands of law enforcement or others who can act on it.

Human trafficking, for sex or otherwise, does seem to be an issue that’s bringing together all sorts of organizations with unique abilities to combat it. Aside from the work at Carnegie Mellon, Google is doing a lot of work to identify victims and their traffickers, via targeted search results as well as partnerships with the Polaris Project and Palantir. There’s also Thorn, a non-profit started by Ashton Kutcher and Demi Moore that uses various technologies to identify cases of child exploitation online.

Machine learning will eventually solve your JPEG problem

I take a lot of photos on my smartphone. So many, in fact, that my wife calls me Cellphone Ansel Adams. I can’t imagine how many more digital photos we’d have cluttering up our hard drives and cloud drives if I ever learned how to really use the DSLR.

So I get excited when I read and write about all the advances in computer vision, whether they’re the result of deep learning or some other technique, and all the photo-related acquisitions in that space (Google, Yahoo, Pinterest, Dropbox and Twitter have all bought computer vision startups). I’m well aware there are much wider-ranging and important implications, from better image-search online to disease detection — and we’ll discuss them all at our Structure Data conference in March — but I personally love being able to search through my photos by keyword even though I haven’t tagged them (we’ll probably discuss that at Structure Data, too).

A sample of the results when I search my Google+ photos for “lake.”

I love that Google+ can detect a good photo, or series of photos, and then spice it up with some Auto-Awesome.

Depending on the service you use to manage photos, there has never been a better time to take too many of them.

If there’s one area that has lagged, though, it’s the creation of curated photo albums. Sometimes Google makes them for me and, although I like the idea in theory (especially for sharing an experience in a neatly packaged way), the results are usually not that good. It will be an album titled “Trip to New York and Jersey City,” for example, and will indeed include a handful of photos I took in New York, just usually not the ones I would have selected.

Although I’m not about to go through my thousands of photos (or even dozens of photos the day after a trip) and create albums, I’ll gladly let a service do it for me. But only if the albums are good will I do anything beyond glance at them. Usually, I love getting the alert that an album is ready, then get over the excitement really quickly.

So I was interested to read a new study by Disney Research discussing how its researchers have developed an algorithm that creates photo albums based on more factors than just time and geography, or even whether photos are “good.” The full paper goes into a lot more detail about how they trained the system (sorry, no deep learning), but this description from a press release about it sums up the results nicely:

[blockquote person=”” attribution=””]To create a computerized system capable of creating a compelling visual story, the researchers built a model that could create albums based on variety of photo features, including the presence or absence of faces and their spatial layout; overall scene textures and colors; and the esthetic quality of each image.

Their model also incorporated learned rules for how albums are assembled, such as preferences for certain types of photos to be placed at the beginning, in the middle and at the end of albums. An album about a Disney World visit, for instance, might begin with a family photo in front of Cinderella’s castle or with Mickey Mouse. Photos in the middle might pair a wide shot with a close-up, or vice versa. Exclusionary rules, such as avoiding the use of the same type of photo more than once, were also learned and incorporated.[/blockquote]
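
As a toy illustration of that kind of model (not Disney’s actual system; the photo types, positional weights and quality scores below are all invented), positional preferences plus an exclusion rule might look like:

```python
# Hypothetical per-type preference weights for begin / middle / end slots.
POSITION_PREFS = {
    "group_shot": {"begin": 1.0, "middle": 0.3, "end": 0.6},
    "wide_shot":  {"begin": 0.4, "middle": 0.9, "end": 0.5},
    "close_up":   {"begin": 0.2, "middle": 0.8, "end": 0.4},
    "sunset":     {"begin": 0.1, "middle": 0.3, "end": 1.0},
}

def build_album(photos, slots=("begin", "middle", "end")):
    """Greedy assembly: for each slot, pick the best-scoring photo
    whose type hasn't been used yet (the exclusion rule)."""
    album, used_types = [], set()
    for slot in slots:
        candidates = [p for p in photos if p["type"] not in used_types]
        best = max(candidates,
                   key=lambda p: p["quality"] * POSITION_PREFS[p["type"]][slot])
        album.append(best)
        used_types.add(best["type"])
    return album

photos = [
    {"type": "group_shot", "quality": 0.9},
    {"type": "wide_shot",  "quality": 0.7},
    {"type": "close_up",   "quality": 0.8},
    {"type": "sunset",     "quality": 0.6},
]
album = build_album(photos)
# begins with the group shot, ends with the sunset
```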


It’s just research and surely isn’t perfect, but it feels like a step in the right direction. It could make sharing photos so much easier and more enjoyable for everyone involved. There’s no doubt the folks at Google, Yahoo and elsewhere are already working on similar things so they can roll them out across services such as Flickr and Google+.

Remember physical slide shows with projectors? The same rules still apply: Your aunt and your friends don’t want to skip through five pictures of your finger over the lens, marvel at the beauty of the same rock formation shot from 23 slightly different angles, or laugh at that sign that’s only funny if you were there. They want a handful of pictures of you looking nice in front of famous landmarks or pretty sunsets. Probably on their phone while waiting in line at the checkout.

I don’t always have the self-control or editorial sense to deliver that experience. I’ll be happy if an algorithm can do it for me.

AI is coming to IoT, and not all the brains will be in the cloud

Smart devices, appliances and the internet of things are dominating International CES this week, but we’re probably just getting a small taste of what’s to come — not only in quantity, but also in capabilities. As consumers get used to buying so-called smart devices, they’re eventually going to expect them to actually be smart. They might even expect them to be smart all the time.

So far, this type of expectation has been kind of problematic for devices and apps trying to perform feats of artificial intelligence such as computer vision. The current status quo of offloading processing to the cloud, which is the preferred method of web companies like Google, Microsoft and Baidu (for Android speech recognition, for example), works well enough computationally but can lag in terms of latency and reliability.

Running those types of algorithms locally hasn’t been too feasible historically because they’re often computationally intensive (especially in the training process), and low-power smartphone and embedded processors haven’t been up to the task. But times, they are a-changing.


Take, for example, the new mobile GPU, called the Tegra X1, that Nvidia announced over the weekend. Its teraflop of computing performance is impressive, but less so than what the company hopes it will be used for. It’s the foundation of the company’s new DRIVE PX automotive computer (pictured above), which Nvidia claims will allow cars to spot available parking spaces, park themselves and pick up drivers like a valet, and be able to distinguish between the various types of vehicles a car might encounter while on the road.

These capabilities “draw heavily on recent developments in computer vision and deep learning,” according to an Nvidia press release.

Indeed, Nvidia spotted a potential goldmine in the machine learning space a while ago as research teams began setting record after record in computer-vision competitions by training deep learning networks on GPU-powered systems. It has been releasing development kits and software libraries ever since to make it as easy as possible to embed its GPUs and program deep learning systems that can run on them.

There’s a decent-enough business selling GPUs to the webscale companies driving deep learning research (Baidu actually claims to have the biggest and best GPU-powered deep learning infrastructure around), but that’s nothing compared with the potential of being able to put a GPU in every car, smartphone and robot manufactured over the next decade.

Nvidia is not the only company hoping to capitalize on the smart-device gold rush. IBM has built a low-power neurosynaptic (i.e., modeled after the brain) chip called SyNAPSE that’s designed specifically for machine learning tasks such as object recognition and that consumes less than a tenth of a watt of power. Qualcomm has built a similar learning chip called Zeroth that it hopes to embed within the next generation of devices. (The folks responsible for building both will be speaking at our Structure Data conference this March in New York.)

A startup called TeraDeep says it’s working on deep learning algorithms that can run on traditional ARM and other mobile processor platforms. I’ve seen other demos of deep learning algorithms running on smartphones; one was created by Jetpac co-founder Pete Warden (whose company was acquired by Google in August) and the other was an early version of technology from a stealth-mode startup called Perceptio (the founders of which Re/code profiled in this piece). TeraDeep, however, hopes to take things a step further by releasing a line of deep learning modules that can be embedded directly into other connected devices, as well.

Among the benefits that companies such as Google hope to derive from quantum computing is the ability to develop quantum machine learning algorithms that can run on mobile phones — and presumably other connected devices — while consuming very little power.

Don’t get me wrong, though: cloud computing will still play a big role for consumers as AI makes its way further into the internet of things. The cloud will still process data for applications that analyze aggregate user data, and it will still provide the computing brains for stuff that’s too small and cheap to justify any meaningful type of chip. But soon, it seems, we at least won’t have to do without all the smarts of our devices just because we’re without an internet connection.

A startup wants to quantify video content using computer vision

Computer vision has seen some major advances over the past couple of years, and a New York-based startup called Dextro wants to take the field to a new level by making it easier to quantify what the computers are seeing. Founded in 2012 by a pair of Ivy League graduates, the company is building an object-recognition platform that it says excels on busy images and lets users query their videos through an API, much as they would other unstructured datasets.

The idea behind Dextro, according to co-founder David Luan, is to evolve computer vision services beyond tagging and into something more useful. He characterizes the difference between Dextro and most other computer vision startups (MetaMind, AlchemyAPI and Clarifai, for example) in terms of categorization versus statistics. Tagging photos automatically is great for image search and bringing order to stockpiles of unlabeled pictures, “but we found that most of the value and most of the interest … is when people know what they’re trying to get out of it,” he said.

Dextro has created an API that lets users query their images and, now, videos for specific categories of objects and receive results as JSON records. This way, they can analyze, visualize or otherwise use that data just like they might do with records containing usage metrics for websites or mobile apps. People might want to ask, for example, how many of their images contain certain objects, at what time within a video certain objects tend to appear, or what themes are the most present among their content libraries.

“You have a question about your data,” he said, “let’s help you answer it.”
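
The article doesn’t document Dextro’s actual API, so the response schema and field names below are invented, but the query-the-results workflow Luan describes might look something like this:

```python
import json

# Hypothetical response shape: the real Dextro schema isn't published
# in the article, so these fields are made up for illustration.
response = json.loads("""
{
  "video": "toilet-install.mp4",
  "detections": [
    {"label": "toilet", "time_sec": 12.5, "confidence": 0.97},
    {"label": "toilet", "time_sec": 48.0, "confidence": 0.91},
    {"label": "bed",    "time_sec": 73.2, "confidence": 0.88}
  ]
}
""")

def times_for(label, detections, min_confidence=0.5):
    """At what times within the video does a given object appear?"""
    return [d["time_sec"] for d in detections
            if d["label"] == label and d["confidence"] >= min_confidence]

toilet_times = times_for("toilet", response["detections"])
# → [12.5, 48.0]; querying for "pistol" would return an empty list
```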

I used Dextro’s video demo to search a YouTube video (about installing a toilet) for toilets, beds and pistols.

Aside from the ability to query image and video data, Dextro is trying to differentiate itself by training its vision models to detect objects and themes within chaotic scenes (rather than the nicely focused, single-subject shots Luan calls “iconic”) and by analyzing videos as they are. “There’s so much information about your video that you lose by chopping it up into frames,” Luan said.

Turns out there really is a bed in it, too.

He’s quick to note that although Dextro uses deep learning as part of its secret sauce, it’s not a deep learning company.

In fact, focusing on a narrow set of technologies or use cases is just the opposite of what he and co-founder Sanchit Arora hope the company will become. Luan already tried that in 2011 when he left Yale, accepted a Thiel Fellowship (he completed his bachelor’s degree at Yale in 2013), and took a first stab at the company as a computer vision and manipulation platform for robots. The name Dextro is a play on “dextrous manipulation.”

Although he and Arora both have lots of experience in robotics, Luan said the present incarnation of Dextro (which has raised $1.56 million in seed funding from a group of investors that includes Yale, Two Sigma Ventures and KBS+ Ventures) aims to be a general-purpose platform. Robots could eventually be a great form factor for the type of platform the company is building, but that market isn’t big enough just yet and there’s so much video being generated elsewhere.

David Luan (second from left) speaking at a Yale event.

And like most machine learning systems, the more that Dextro’s system sees, the smarter it gets. Luan thinks computer vision platforms will ultimately be a winner-take-all space, with the company analyzing the most and best content having the most-accurate models. “We want to power all the cameras and visual datasets out there,” he said.

That’s a lofty, and perhaps unrealistic, goal, but it’s indicative of the excitement surrounding the fields that companies like Dextro are playing in. One of the themes of our upcoming Structure Data conference is the convergence of artificial intelligence, robotics, analytics, and business that’s happening right now and changing how people think about their data. As computers get better at reading and analyzing data such as pictures, video and text, the onus falls on innovative users to figure out how to take advantage of it.

Deep learning now tackling autism and matching monkeys’ vision

Two studies published this week provide even more evidence that deep learning models are very good at computer vision and might be able to tackle some difficult problems.

The study on computer vision, out of MIT and published in PLOS Computational Biology, shows that deep learning models can be as good as certain primates when it comes to recognizing images during a brief glance. The researchers even suggest that deep learning could help scientists better understand how primate vision systems work.

Charts showing the relative performance of primates and deep learning models.

The genetic study, performed by a team of researchers from the Canadian Institute for Advanced Research and published in Science (available for a fee, but the University of Toronto has a relatively detailed article about the research), used deep learning to analyze the “code” involved in gene splicing. Focusing on mutated gene sequences in subjects with autism, the team was able to identify 39 additional genes that might be tied to autism spectrum disorder.

By now, the capabilities of deep learning in object recognition have been well established, and there is plenty of excitement among entrepreneurs and scientists about how it could apply in medicine. But these findings suggest that excitement has substance and the techniques can make meaningful impacts in areas that have little or nothing to do with the web, from which many recent advances have emerged.

IBM bringing its skin-cancer computer vision system to hospitals

IBM says it has developed a machine learning system that identified images of skin cancer with better than 95 percent accuracy in experiments, and it’s now teaming up with doctors to see how it can help them do the same. On Wednesday, the company announced a partnership with Memorial Sloan Kettering — one of IBM’s early partners on its Watson system — to research how the computer vision technology might be applied in medical settings.

According to one study, cited in the IBM infographic below, diagnostic accuracy for skin cancer today is estimated at between 75 percent and 84 percent even with computer assistance. If IBM’s research results hold up in the real world, they would constitute a significant improvement.

As noted above, the skin cancer research is not IBM’s first foray into applying machine learning and artificial intelligence techniques — which it prefers to call cognitive computing — in the health care setting. In fact, the company announced earlier this week a partnership with the Department of Veterans Affairs to investigate the utility of the IBM Watson system for analyzing medical records.

And [company]IBM[/company] is certainly not the first institution to think about how advances in computer vision could be used to diagnose disease. Two startups — Enlitic and Butterfly Network — recently launched with the goal of improving diagnostics using deep learning algorithms, and the application of machine learning to medical imagery has been, and continues to be, the subject of numerous academic studies.

We will be discussing the state of the art in machine learning, and computer vision specifically, at our Structure Data conference in March with speakers from IBM, Facebook, Yahoo, Stanford and Qualcomm, among others.