Google announced on Wednesday that the company is open sourcing a MapReduce framework that will let users run native C and C++ code in their Hadoop environments. Depending on how much traction MapReduce for C (MR4C) gets, and from whom, it could turn out to be a pretty big deal.
Hadoop is famously, or infamously, written in Java and as such can suffer from performance issues compared with native C++ code. That’s why Google’s original MapReduce system was written in C++, as is the Quantcast File System, that company’s homegrown alternative to the Hadoop Distributed File System. And, as the blog post announcing MR4C notes, “many software companies that deal with large datasets have built proprietary systems to execute native code in MapReduce frameworks.”
MR4C was developed by satellite imagery company Skybox Imaging, which Google acquired last June, and was optimized for geospatial data and computer vision code libraries. Of course, open sourcing MR4C presents the opportunity to open up this capability to a broader range of users, whether they work in fields dominated by C libraries or just don’t like, or aren’t comfortable, writing programs in Java. There is precedent for fast uptake: when Google announced its open-source Kubernetes container-management system last year, it was quickly ported from Google Compute Engine to run in several other environments.
It will be interesting to see how much traction MR4C gets at this point, especially given the surge in interest around Apache Spark. Spark is a faster data-processing framework than MapReduce and natively supports Scala, Python and Java, although it does not support C/C++.
The future of Hadoop and big data processing will certainly be a big topic of conversation at our Structure Data conference next month in New York, which features Google VP of infrastructure Eric Brewer, Spark co-creator (and Databricks CEO) Ion Stoica and the CEOs of all three major Hadoop vendors.