Microsoft is building fast, low-power neural networks with FPGAs

Microsoft on Monday released a white paper explaining a current effort to run convolutional neural networks — the deep learning technique responsible for record-setting computer vision algorithms — on FPGAs rather than GPUs.

Microsoft claims that new FPGA designs provide greatly improved processing speed over earlier versions while consuming a fraction of the power of GPUs. This type of work could represent a big shift in deep learning if it catches on, because for the past few years the field has been largely centered around GPUs as the computing architecture of choice.

If there’s a major caveat to Microsoft’s efforts, it might have to do with performance. While Microsoft’s research shows FPGAs consuming about one-tenth the power of high-end GPUs (25W compared with 235W), GPUs still process images at a much higher rate. Nvidia’s Tesla K40 GPU can do between 500 and 824 images per second on one popular benchmark dataset, the white paper claims, while Microsoft predicts its preferred FPGA chip — the Altera Arria 10 — will be able to process about 233 images per second on the same dataset.

However, the paper’s authors note that performance per processor is relative because a multi-FPGA cluster could match a single GPU while still consuming much less power: “In the future, we anticipate further significant gains when mapping our design to newer FPGAs . . . and when combining a large number of FPGAs together to parallelize both evaluation and training.”
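For a rough sense of the tradeoff, the figures cited above can be worked through directly. This is a back-of-the-envelope sketch, not from the paper: the variable names are ours, and it assumes throughput scales linearly across FPGAs, which the authors only anticipate.

```python
import math

# Figures cited above: Tesla K40 at 235 W doing 500-824 images/sec,
# and Microsoft's Arria 10 projection of 233 images/sec at 25 W.
K40_WATTS, K40_IMGS_PER_SEC = 235, 824        # high end of the cited range
FPGA_WATTS, FPGA_IMGS_PER_SEC = 25, 233       # Arria 10 projection

gpu_eff = K40_IMGS_PER_SEC / K40_WATTS        # ~3.5 images per joule
fpga_eff = FPGA_IMGS_PER_SEC / FPGA_WATTS     # ~9.3 images per joule

# FPGAs needed to match the GPU's peak throughput, assuming linear scaling,
# and the power such a cluster would draw.
n_fpgas = math.ceil(K40_IMGS_PER_SEC / FPGA_IMGS_PER_SEC)   # 4
cluster_watts = n_fpgas * FPGA_WATTS                        # 100 W vs. 235 W
print(n_fpgas, cluster_watts, round(fpga_eff / gpu_eff, 1))
```

On these numbers, four FPGAs would match the K40's peak throughput at well under half its power draw, which is the cluster argument the authors are making.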

In a Microsoft Research blog post, processor architect Doug Burger wrote, “We expect great performance and efficiency gains from scaling our [convolutional neural network] engine to Arria 10, conservatively estimated at a throughput increase of 70% with comparable energy used.”


This is not Microsoft’s first rodeo when it comes to deploying FPGAs within its data centers; in fact, it’s an outgrowth of an earlier project. Last summer, the company detailed a research project called Catapult in which it was able to improve the speed and performance of Bing’s search-ranking algorithms by adding FPGA co-processors to each server in a rack. The company intends to port production Bing workloads onto the Catapult architecture later this year.

There have also been other attempts to port deep learning algorithms onto FPGAs, including one by State University of New York at Stony Brook professors and another by Chinese search giant Baidu. Ironically, Baidu chief scientist and deep learning expert Andrew Ng is a big proponent of GPUs, and the company claims a massive GPU-based deep learning system as well as a GPU-based supercomputer designed for computer vision. But this needn’t be an either/or situation: companies could still use GPUs to maximize performance while training their models, and then port them to FPGAs for production workloads.

Expect to hear more about the future of deep learning architectures and applications at Gigaom’s Structure Data conference March 18 and 19 in New York, which features experts from Facebook, Microsoft and elsewhere. Our Structure Intelligence conference, September 22-23 in San Francisco, will dive even deeper into deep learning, as well as the broader field of artificial intelligence algorithms and applications.

Microsoft says its new computer vision system can outperform humans

Microsoft researchers claim in a recently published paper that they have developed the first computer system capable of outperforming humans on a popular benchmark. While it’s estimated that humans can classify images in the ImageNet dataset with an error rate of 5.1 percent, Microsoft’s team said its deep-learning-based system achieved an error rate of only 4.94 percent.

Their paper was published less than a month after Baidu published a paper touting its record-setting system, which it claimed achieved an error rate of 5.98 percent using a homemade supercomputing architecture. The best performance in the actual ImageNet competition so far belongs to a team of Google researchers, who in 2014 built a deep learning system with a 6.66 percent error rate.
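These figures are top-5 error rates: a prediction counts as correct if the ground-truth label appears anywhere among the system’s five highest-confidence guesses. A minimal sketch of how that metric is computed (the function and variable names here are illustrative, not from either paper):

```python
def top5_error(predictions, ground_truth):
    """Fraction of examples whose true label is missing from the top-5 guesses.

    predictions: one list of labels per example, ordered most to least confident.
    ground_truth: the true label for each example.
    """
    misses = sum(gt not in preds[:5]
                 for preds, gt in zip(predictions, ground_truth))
    return misses / len(ground_truth)

# Toy example: the true label is in the top five for 2 of 3 images.
preds = [["cat", "dog", "fox", "cow", "sheep"],
         ["car", "bus", "van", "truck", "bike"],
         ["owl", "hawk", "crow", "gull", "swan"]]
truth = ["dog", "train", "hawk"]
print(top5_error(preds, truth))  # 1 miss out of 3 -> ~0.333
```

Against this metric, beating the estimated 5.1 percent human figure means the system misses the true label in its top five guesses on fewer than roughly one image in twenty.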

A set of images that the Microsoft system classified correctly. “GT” means ground truth; below are the top five predictions of the deep learning system.

“To our knowledge, our result is the first published instance of surpassing humans on this visual recognition challenge,” the paper states. “On the negative side, our algorithm still makes mistakes in cases that are not difficult for humans, especially for those requiring context understanding or high-level knowledge…

“While our algorithm produces a superior result on this particular dataset, this does not indicate that machine vision outperforms human vision on object recognition in general . . . Nevertheless, we believe our results show the tremendous potential of machine algorithms to match human-level performance for many visual recognition tasks.”

A set of images where the deep learning system didn’t match the given label, although it did correctly classify objects in the scene.

One of the Microsoft researchers, Jian Sun, explains the difference in plainer English in a Microsoft blog post: “Humans have no trouble distinguishing between a sheep and a cow. But computers are not perfect with these simple tasks. However, when it comes to distinguishing between different breeds of sheep, this is where computers outperform humans. The computer can be trained to look at the detail, texture, shape and context of the image and see distinctions that can’t be observed by humans.”

If you’re interested in learning how deep learning works, why it’s such a hot area right now and how it’s being applied commercially, think about attending our Structure Data conference, which takes place March 18 and 19 in New York. Speakers include deep learning and machine learning experts from Facebook, Yahoo, Microsoft, Spotify, Hampton Creek, Stanford and NASA, as well as startups Blue River Technology, Enlitic, MetaMind and TeraDeep.

We’ll dive even deeper into artificial intelligence at our Structure Intelligence conference (Sept. 22 and 23 in San Francisco), where early confirmed speakers come from Baidu, Microsoft, Numenta and NASA.

Google algorithm guy, Microsoft DSP expert get NAE nod

Two industry luminaries — Google Fellow Amit Singhal and Microsoft Distinguished Engineer Henrique “Rico” Malvar — were elected to the National Academy of Engineering this week. Singhal oversees Google’s search algorithms and Malvar specializes in signal processing and compression.