Survey: Hadoop clusters not that big, not changing the world (yet)

While Hadoop use is picking up among mainstream (read “non-web”) companies, it’s still far from the all-powerful and ubiquitous insight engine its supporters believe it will become, according to a recently released survey from Hadoop-focused startup Karmasphere. Karmasphere interviewed 376 “data professionals” to get its results.
Here are some statistics from the survey that illustrate the current state of affairs:

  • Only 52 percent of respondents “either have Hadoop in production or have a Hadoop cluster running.”
  • Of those, 55 percent are running clusters between 1 and 10 terabytes in size. Not that size is everything, but 32 percent are running clusters of less than 2 TB, the size of the Time Machine hard drive sitting on my desk. By comparison, Facebook’s Hadoop cluster is more than 100 petabytes (uncompressed), which is 50,000 times larger than even a 2 TB cluster.
  • Sixty percent of respondents agree (28 percent of whom “strongly agree”) that their data analysts lack the technical skills to analyze data on Hadoop. Despite this lack of skills, though, business users, arguably among the least technically savvy members, dominate respondents’ “big data teams.” The average composition is 60 business users, 21 systems administrators, 17 data analysts and 8 BI specialists.
  • Marketing departments still benefit most from Hadoop, according to 22 percent of respondents. This isn’t surprising at all given the whole market of big data products targeting marketers. Thankfully, for those who want to see Hadoop spread its wings into broader usage, 19 percent chose engineering and 14 percent (apiece) chose product management and operations as the departments making the most use of Hadoop.
  • Web logs (51 percent) and click streams (35 percent) are the types of data cited most often among the top data sources respondents analyze with Hadoop. Only 18 percent cited scientific data, and 14 percent cited sensor data.

Marketing is a great proving ground, but I would argue the real value of Hadoop is in finding new ways to solve problems of real consequence. These applications are popping up already, but surveys like this (and maybe it’s just the respondent pool) suggest they’re still a long way from being commonplace. However, as data science skills continue to spread (including the ability to work with larger datasets) and as Hadoop becomes both easier to use and more of a means for building higher-level applications than an end in itself, perhaps Hadoop-led revolutions in everything from science to engineering to global health will follow.
Feature image courtesy of Shutterstock user Four Oaks.