Piccolo Project Tries to Speed Past Hadoop

Few would argue that Hadoop doesn’t have a bright future as a foundational element of big data stacks, but Piccolo, a new project out of New York University, is moving data storage into machines’ memory in an attempt to improve parallel-processing performance beyond what Hadoop and/or MapReduce can do. Todd Hoff at High Scalability profiles the project, and I’d suggest going there for the details. At a high level, the difference between Hadoop and Piccolo might be explained as the difference between digging for one vegetable at a time and spreading the cleaning and peeling duties to a team of workers, versus having those workers each grab and process their own vegetables simultaneously, from a prearranged pile above the ground.

A more technical explanation is that Piccolo uses an in-memory, key-value store and a “global table interface” — as opposed to Hadoop, which utilizes a distributed file system contained within the disk drives of the machines in the cluster — that lets the CPUs access all the data simultaneously, and at high speeds only possible by pulling data straight from RAM. In this fairly long, but genuinely interesting presentation at the OSDI 10 conference, lead developer Russell Power explains how Piccolo works, how it differs from Hadoop and how it has tested far faster than Hadoop on certain workloads. Power compares Piccolo to the Chevy El Camino (s gm), which was both efficient and easy to use while also delivering high performance. Here’s a screenshot of Power illustrating Piccolo’s scalability on an Amazon EC2 (s amzn) cluster:

I’m not suggesting Piccolo is going to replace Hadoop, or MapReduce, generally, anytime soon or ever — Hadoop vendor Cloudera today received a strategic investment from U.S. intelligence sector consultant In-Q-Tel, which should hammer home the fact that Hadoop is for real — but Piccolo is worth watching. It certainly wouldn’t be the first academic project in recent memory to make it big; the Marten-Mickos-led cloud-software provider Eucalyptus Systems was a research project from the University of California Santa Barbara when it caught on, and then struck it big with VCs and early adopters.

To learn even more about the future of big data processing and analysis, make sure to attend out Structure Big Data conference March 23 in New York City. You won’t likely hear about the seedling Piccolo project, but you’ll hear plenty about cutting-edge use cases and tactics for the current generation of big data tools.

Related Content From GigaOM Pro (subscription required)