Databricks, a new startup dedicated to commercializing the Apache Spark data-processing framework, has launched a “Certified on Spark” program for software vendors that want to tout their ability to run on the increasingly popular technology. Spark was created as a processing framework for Hadoop that’s both faster and easier to use than the traditional MapReduce framework, and it’s catching on fast among folks writing big data applications.
Spark’s popularity is based on a few factors, including that it supports numerous programming languages (all of which are easier to write in than MapReduce) and delivers faster data analysis both in memory and on disk. It also allows for iterative queries on existing datasets, which, along with its speed, makes it better suited to machine learning workloads. There are a number of workload-specific implementations on top of Spark, too, including Shark for interactive SQL queries, SparkR for statistical analysis and GraphX for graph processing.
As a result of all those strengths, the Spark community has grown fast, with a user base that includes Yahoo, Alibaba, Airbnb and ClearStory Data. The Spark project recently reached top-level status within the Apache Software Foundation. It also has a big corporate backer in Cloudera, which is shipping Spark in its forthcoming Hadoop distribution and is providing commercial support for Spark via a partnership with Databricks. Cloudera published a blog post on Monday, co-written by Databricks Co-founder and CTO (and Spark creator) Matei Zaharia, that demonstrates the differences between writing certain jobs in MapReduce versus Spark.
Databricks Co-founder and CEO Ion Stoica will be on stage at our Structure Data conference this week in New York, where we’ll honor the company as part of our Structure Data Awards program for startups. We’ll also have CEOs Tom Reilly of Cloudera and Rob Bearden of Hortonworks, who’ll discuss the future of Hadoop — a forecast in which frameworks such as Spark will no doubt play a key role.