Cloudera and Hortonworks have been playing a game of one-upmanship over the past few weeks in an attempt to prove whose contributions to the Apache Hadoop project matter most. Reputation matters to both companies, but maybe not as much as fending off encroachments on their turf.
Databases aren’t sexy. Except, possibly, for a brief moment in 2010 and a bit of 2011, when every reader of Hacker News was sharing his or her experience and every coder on GitHub wanted to know more. The NoSQL Tapes captures that moment.
Attention, webscale aficionados: Twitter plans to open source its Hadoop-like real-time data processing tool, known as Storm. The social service nabbed the code through its acquisition of BackType last month, and says it’s a better tool for processing streams of data.
Big data — as in managing and analyzing large volumes of information — has come a long way in the past couple of years. Among the greatest innovations might be the advent of real-time analytics, which allow information to be processed as it arrives, enabling near-instantaneous decision-making.
The fight for Hadoop dominance is officially on. While Hortonworks is busy answering questions about its product strategy, Cloudera and MapR will demonstrate new versions of their distributions overflowing with bells and whistles. And there are several other competitive products lurking in the background.
Hadoop is a very valuable tool, but it’s far from perfect. While Apache, Cloudera, EMC, MapR and Yahoo focus on core architectural issues, a group of vendors is trying to make Hadoop a more fulfilling experience by focusing on business-level concerns such as applications and utilization.
At Google’s I/O event last month, the company announced new features and a new pricing model for its App Engine PaaS offering, and now the web giant thinks it’s prepared to compete with companies like Red Hat and Salesforce.com in bringing enterprise users to its platform.
Is Hadoop our only hope for solving big data challenges? From scalability to fault tolerance, Hadoop does myriad things very well. Yet Hadoop is not the solution to all big data problems and use cases. Several key issues remain, including the up-front investment it requires, its operational complexity and its batch-only processing model.
Initially developed inside Yahoo! as a MapReduce-inspired tool for churning through Big Data, Hadoop was open sourced and continues to thrive within the Apache community as a key weapon in the data scientist’s toolkit. Well-funded startup Cloudera took the open source code (and key project contributors) and is building a business around helping enterprises deploy and benefit from Big Data analysis. Today, Cloudera announced General Availability for CDH3, a fully open source distribution including the Hadoop Distributed File System (HDFS), Hadoop MapReduce, and a collection of tightly coupled companion tools designed to ensure that, “right out of the box, you can get useful work done on Hadoop.” Like other Big Data tools, Hadoop has tended to be rather rough around the edges: brilliant at churning through data in a particular way, but less polished when it came to interfacing with other systems or extending to cover a wider set of enterprise data analysis tasks. The work that Cloudera continues to do in packaging Hadoop’s power in a more accessible form has been important in making Big Data accessible to a broader audience. CDH3 takes this easy integration to a new level, whilst also raising the bar for Cloudera; the code is all open source and available to its competitors, forcing the company to continually differentiate itself on the service it offers rather than the code it controls.
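For readers new to the MapReduce model that HDFS and Hadoop MapReduce implement, the canonical word-count job can be sketched in a few lines of Python. This is a minimal, hypothetical illustration of the map/shuffle/reduce pattern — not Cloudera’s or Hadoop’s actual code — with the framework’s sort-and-shuffle step simulated in-process:

```python
from itertools import groupby
from operator import itemgetter

def map_words(lines):
    """Map phase: emit a (word, 1) pair for every word in the input lines,
    much as a Hadoop Streaming mapper would write them to stdout."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_counts(pairs):
    """Reduce phase: Hadoop's shuffle delivers pairs grouped by key, which we
    simulate here with sorted() + groupby; sum the counts for each word."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    lines = ["big data big tools", "data pipelines"]
    print(dict(reduce_counts(map_words(lines))))
```

In a real cluster the mapper and reducer run as separate distributed tasks over HDFS blocks; the point here is only the shape of the computation that the distribution packages up.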
High-performance computing leader Platform Computing hopes to capitalize on the big data movement by spreading its wings beyond its flagship business of managing clusters and grids and into managing MapReduce environments, too. Platform has a solid foundation among leading businesses, especially in the financial services industry.