Microsoft is building fast, low-power neural networks with FPGAs

Microsoft on Monday released a white paper explaining a current effort to run convolutional neural networks — the deep learning technique responsible for record-setting computer vision algorithms — on FPGAs rather than GPUs.

Microsoft claims that new FPGA designs provide greatly improved processing speed over earlier versions while consuming a fraction of the power of GPUs. This type of work could represent a big shift in deep learning if it catches on, because for the past few years the field has been largely centered around GPUs as the computing architecture of choice.

If there’s a major caveat to Microsoft’s efforts, it might have to do with performance. While Microsoft’s research shows FPGAs consuming about one-tenth the power of high-end GPUs (25W compared with 235W), GPUs still process images at a much higher rate. Nvidia’s Tesla K40 GPU can do between 500 and 824 images per second on one popular benchmark dataset, the white paper claims, while Microsoft predicts its preferred FPGA chip — the Altera Arria 10 — will be able to process about 233 images per second on the same dataset.

However, the paper’s authors note that performance per processor is relative because a multi-FPGA cluster could match a single GPU while still consuming much less power: “In the future, we anticipate further significant gains when mapping our design to newer FPGAs . . . and when combining a large number of FPGAs together to parallelize both evaluation and training.”
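To make the power-versus-throughput trade-off concrete, here is a rough back-of-the-envelope sketch using only the figures quoted above. It rests on two assumptions that the article does not confirm: that the Arria 10 draws roughly the same 25W as the FPGA Microsoft measured, and that throughput scales linearly when FPGAs are combined, with no communication overhead. The function names are made up for illustration.

```python
import math

# Figures quoted in the article, attributed to Microsoft's white paper.
GPU_WATTS = 235
FPGA_WATTS = 25              # assumption: the Arria 10 stays near 25 W
GPU_IMAGES_PER_SEC = (500, 824)
FPGA_IMAGES_PER_SEC = 233    # Microsoft's projection for the Arria 10


def images_per_joule(images_per_sec, watts):
    """Throughput per watt, which is the same as images per joule."""
    return images_per_sec / watts


def fpgas_to_match(gpu_images_per_sec):
    """FPGAs needed to reach a GPU's throughput, assuming linear scaling."""
    return math.ceil(gpu_images_per_sec / FPGA_IMAGES_PER_SEC)


print(f"FPGA: {images_per_joule(FPGA_IMAGES_PER_SEC, FPGA_WATTS):.2f} images per joule")
for rate in GPU_IMAGES_PER_SEC:
    count = fpgas_to_match(rate)
    print(
        f"GPU at {rate} images/s: {images_per_joule(rate, GPU_WATTS):.2f} images per joule; "
        f"{count} FPGAs ({count * FPGA_WATTS} W) would match it"
    )
```

Under those assumptions, even the cluster needed to match the GPU's best quoted rate would draw well under half the GPU's 235W.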

In a Microsoft Research blog post, processor architect Doug Burger wrote, “We expect great performance and efficiency gains from scaling our [convolutional neural network] engine to Arria 10, conservatively estimated at a throughput increase of 70% with comparable energy used.”


This is not Microsoft’s first rodeo when it comes to deploying FPGAs within its data centers; in fact, the work is a corollary of an earlier project. Last summer, the company detailed a research project called Catapult in which it was able to improve the speed and performance of Bing’s search-ranking algorithms by adding FPGA co-processors to each server in a rack. The company intends to port production Bing workloads onto the Catapult architecture later this year.

There have also been other attempts to port deep learning algorithms onto FPGAs, including one by State University of New York at Stony Brook professors and another by Chinese search giant Baidu. Ironically, Baidu Chief Scientist and deep learning expert Andrew Ng is a big proponent of GPUs, and the company claims a massive GPU-based deep learning system as well as a GPU-based supercomputer designed for computer vision. But this needn’t be an either/or situation: companies could still use GPUs to maximize performance while training their models, and then port them to FPGAs for production workloads.

Expect to hear more about the future of deep learning architectures and applications at Gigaom’s Structure Data conference March 18 and 19 in New York, which features experts from Facebook, Microsoft and elsewhere. Our Structure Intelligence conference, September 22-23 in San Francisco, will dive even deeper into deep learning, as well as the broader field of artificial intelligence algorithms and applications.

PhotoTime is a deep learning application for the rest of us

A Sunnyvale, California, startup called Orbeus has developed what could be the best application yet for letting everyday consumers benefit from advances in deep learning. It’s called PhotoTime and, yes, it’s yet another photo-tagging app. But it looks really promising and, more importantly, it isn’t focused on business uses like so many other recent deep-learning-based services, nor has it been acquired and dissolved into Dropbox or Twitter or Pinterest or Yahoo.

Deep learning, for anyone unfamiliar with the term, is essentially a class of artificial intelligence algorithms that excel at learning the latent features of the data they analyze. The more data that deep learning systems have to train on, the better they perform. The field has made big strides in recent years, largely with regard to machine-perception workloads such as computer vision, speech recognition and language understanding.

(If you want to get a crash course in what deep learning is and why web companies are investing billions of dollars into it, come to Structure Data in March and watch my interview with Rob Fergus of Facebook Artificial Intelligence Research, as well as several other sessions.)

The Orbeus team. L to R: Yuxin Wu, Yi Li, Wei Xia and Meng Wang.

I am admittedly late to the game in writing about PhotoTime (it was released in November) because, well, I don’t often write about mobile apps. The people who follow this space for a living, though, also seemed impressed with it when they reviewed it back then. Orbeus, the company behind PhotoTime, launched in 2012 and its first product is a computer vision API called ReKognition. According to CEO Yi Li, it has already raised nearly $5 million in venture capital.

But I ran into the Orbeus team at a recent deep learning conference and was impressed with what they were demonstrating. As an app for tagging and searching photos, it appears very rich. It tags smartphone photos using dozens of different categories, including place, date, object and scene. It also recognizes faces — either by connecting to your social networks and matching contacts with people in the photos, or by building collections of photos including the same face and letting users label them manually.

You might search your smartphone, for example, for pictures of flowers you snapped in San Diego, or for pictures of John Smith at a wedding in Las Vegas in October 2013. I can’t vouch for its accuracy personally because the PhotoTime app for Android isn’t yet available, but I’ll give it the benefit of the doubt.


More impressive than the tagging features, though (and the thing that could really set it apart from other deep-learning-powered photo-tagging applications, including well-heeled ones such as Google+, Facebook and Flickr), is that PhotoTime actually indexes the album locally on users’ phones. Images are sent to the cloud and run through Orbeus’s deep learning models, and then the metadata is sent back to your phone so you can search existing photos even without a network connection.
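As a rough illustration of why a local index allows offline search, here is a minimal sketch of an on-device tag index. Orbeus has not described its metadata format or index structure in anything cited here, so the class name, method names, file names and tags below are all hypothetical; the only idea taken from the description above is that tags computed in the cloud are stored on the phone and queried there.

```python
from collections import defaultdict


class LocalPhotoIndex:
    """Hypothetical on-device inverted index: tag -> set of photo IDs."""

    def __init__(self):
        self._photos_by_tag = defaultdict(set)

    def add(self, photo_id, tags):
        """Store the tags sent back from the cloud for one photo."""
        for tag in tags:
            self._photos_by_tag[tag.lower()].add(photo_id)

    def search(self, *tags):
        """Return the photos carrying every requested tag; no network needed."""
        if not tags:
            return set()
        matches = [self._photos_by_tag.get(tag.lower(), set()) for tag in tags]
        return set.intersection(*matches)


# Made-up metadata standing in for what the cloud models might return.
index = LocalPhotoIndex()
index.add("IMG_001.jpg", ["flower", "San Diego"])
index.add("IMG_002.jpg", ["wedding", "Las Vegas", "John Smith"])
index.add("IMG_003.jpg", ["flower", "Las Vegas"])

flowers_in_san_diego = index.search("flower", "San Diego")
print(flowers_in_san_diego)
```

Because a lookup touches only the in-memory index, a search like the San Diego flowers example works the same with or without a connection. Only tagging new photos would require the round trip to the cloud.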

The company does have a fair amount of experience in the deep learning field, with several members, including research scientist Wei Xia, winning a couple of categories at last year’s ImageNet object-recognition competition as part of a team from the National University of Singapore. Xia told me that while PhotoTime’s application servers run largely on Amazon Web Services, the company’s deep learning system resides on a homemade, liquid-cooled GPU cluster in the company’s headquarters.

Here’s what that looks like.

The Orbeus GPU cluster.

As I’ve written before, though, tagging photos is only part of the ideal photo-app experience, and there’s still work to do there no matter how nicely the product functions. I’m still waiting for some photo application to perfect the curated photo album, something Disney Research is working on using another machine learning approach.

And while accuracy continues to improve for recognizing objects and faces, researchers are already hard at work applying deep learning to everything from recognizing the positions of our bodies to the sentiment implied by our photos.

Facebook open sources tools for bigger, faster deep learning models

Facebook on Friday open sourced a handful of software libraries that it claims will help users build bigger, faster deep learning models than existing tools allow.

The libraries, which Facebook is calling modules, are alternatives to the default ones in a popular machine learning development environment called Torch, and are optimized to run on Nvidia graphics processing units. Among them are modules designed to speed up training for large computer vision systems (by nearly 24 times, in some cases), modules for training systems on potentially millions of different classes (e.g., predicting whether a word will appear across a large number of documents, or whether a picture was taken in any city anywhere), and an optimized method for building language models and word embeddings (e.g., knowing how different words are related to each other).

“[T]here is no way you can use anything existing” to achieve some of these results, said Soumith Chintala, an engineer with Facebook Artificial Intelligence Research.

That team was formed in December 2013 when Facebook hired prominent New York University researcher Yann LeCun to run it. Rob Fergus, one of LeCun’s NYU colleagues who joined Facebook at the same time, will be speaking on March 19 at our Structure Data conference in New York.

A heatmap comparing the performance of Facebook’s modules with standard ones on datasets of various sizes. The darker the green, the faster Facebook was.

Despite the sometimes significant improvements in speed and scale, however, the new Facebook modules probably are “not going to be super impactful in terms of today’s use cases,” Chintala said. While they might produce noticeable improvements within most companies’ or research teams’ deep learning environments, he explained, they’ll really make a difference (and justify making the switch) when more folks are working on stuff at a scale like Facebook is now — “using models that people [previously] thought were not possible.”

Perhaps the bigger and more important picture now, then, is that Friday’s open source releases represent the start of a broader Facebook effort to open up its deep learning research the way it has opened up its work on webscale software and data centers. “We are actually going to start building things in the open,” Chintala said, releasing a steady stream of code instead of just the occasional big breakthrough.

Facebook is also working fairly closely with Nvidia to rework some of its deep learning programming libraries to work at web scale, he added. Although it’s working at a scale beyond many mainstream deep learning efforts and its researchers change directions faster than would be feasible for a commercial vendor, Facebook’s advances could find their way into future releases of Nvidia’s libraries.

Given the excitement around deep learning right now — for everything from photo albums to self-driving cars — it’s a big deal that more and better open source code is becoming available. Facebook joins projects such as Torch (which it uses), Caffe and the Deeplearning4j framework being pushed by startup Skymind. Google has also been active in releasing certain tooling and datasets ideal for training models.

It was open source software that helped make general big data platforms, using software such as Hadoop and Kafka, a reality outside of cutting-edge web companies. Open source might help the same thing happen with deep learning, too — scaling it beyond the advances of leading labs at Facebook, Google, Baidu and Microsoft.

Baidu is trying to speed up image search using FPGAs

Chinese search engine Baidu is trying to speed up its deep learning models for image search using field programmable gate arrays, or FPGAs, made by Altera. Baidu has been experimenting with FPGAs for a while (including with Altera rival Xilinx’s gear) as a means of boosting performance on its convolutional neural networks without having to go whole hog down the GPU route. FPGAs are likely most applicable in production data centers where they can be paired with existing CPUs to serve queries, while GPUs can still power much of the behind-the-scenes training of deep learning models.

Baidu says its massive deep-learning system is nearly complete

Baidu says its 100-billion-neuron deep learning system will be complete within six months, powering a fast transition away from text as the dominant search input. Thanks to smartphones and its new Baidu Eye technology, the company expects voice and image search to dominate within five years.

More deep learning for the masses, courtesy of Ersatz Labs

A startup called Ersatz Labs is promising deep learning, delivered via appliance or the cloud, usable by pretty much anybody already familiar with machine learning. It’s the latest attempt to take the data-processing approach out of the lab and into the mainstream.