A green Hadoop could manage solar-powered data centers

The worlds of big data geeks and clean energy nerds have collided. Researchers at Rutgers University and the Polytechnic University of Catalonia have proposed building “GreenHadoop,” a version of Hadoop, the open-source implementation of the MapReduce programming framework, that could manage a data center’s computing workload to make the most of clean energy from a solar array, with the grid as the backup power source (hat tip Green Broadband).

Such a system would predict when green energy will be available and schedule Hadoop batch processing jobs that aren’t time sensitive for that time frame. If non-renewable energy has to be used, because the job is urgent or because the green energy is insufficient, GreenHadoop selects the time when non-renewable energy would be cheapest within the workload’s time constraints. Thus, the big data-crunching jobs could run at full tilt during peak solar times and drop to lower levels of computing when the data center has to fall back on less-clean grid power.
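To make the idea concrete, here is a minimal sketch of that kind of scheduling decision. It is not the researchers' algorithm, just a simple greedy approximation: every name in it (`Slot`, `schedule`, `max_kwh_per_slot`) is hypothetical, and it assumes the solar forecast and grid prices are already known for each hourly slot.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    hour: int          # index of the time slot (assumed to be one hour)
    green_kwh: float   # forecast solar energy available in this slot
    grid_price: float  # assumed grid price per kWh in this slot


def schedule(job_kwh, deadline, slots, max_kwh_per_slot):
    """Greedy sketch: spend forecast green energy first, then buy the
    cheapest grid energy, using only slots that start before the deadline.

    Returns (plan, grid_cost), where plan maps hour -> (green_kwh, grid_kwh).
    """
    usable = [s for s in slots if s.hour < deadline]
    green_used = {s.hour: 0.0 for s in usable}
    grid_used = {s.hour: 0.0 for s in usable}
    remaining = job_kwh

    # Pass 1: take as much forecast green energy as the job can use.
    for s in usable:
        take = min(s.green_kwh, max_kwh_per_slot, remaining)
        green_used[s.hour] = take
        remaining -= take

    # Pass 2: cover any shortfall with grid energy, cheapest slots first.
    grid_cost = 0.0
    for s in sorted(usable, key=lambda s: s.grid_price):
        if remaining <= 0:
            break
        headroom = max_kwh_per_slot - green_used[s.hour]
        take = min(headroom, remaining)
        grid_used[s.hour] = take
        grid_cost += take * s.grid_price
        remaining -= take

    if remaining > 1e-9:
        raise ValueError("job cannot finish before its deadline")

    plan = {h: (green_used[h], grid_used[h]) for h in green_used}
    return plan, grid_cost
```

A job with a loose deadline ends up mostly in the sunny slots and tops up from the cheapest grid hour, while a job with a deadline before the sun comes up runs entirely on grid power. A real scheduler would also have to cope with forecast errors and many competing jobs, which this sketch ignores.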

Such an experimental system, using solar panels to power the data center, would only delay non-priority computing and wouldn’t, say, lead to Facebook going down when a cloud goes in front of the sun. The system is similar to the idea of shifting workloads among a set of data centers throughout the world to find cheaper and cleaner energy.

The researchers say that a GreenHadoop-managed data center could “significantly increase green energy consumption and decrease electricity cost, compared to Hadoop.” The group published the paper at the EuroSys 2012 conference.

The team at Rutgers is developing GreenHadoop to manage Parasol, a small experimental solar-powered data center it built. Parasol includes a container, a solar photovoltaic system and a battery system to store energy. The data center has three switches that change how it is powered, from running completely off-grid to running on various mixes of clean energy and grid energy.

Solar power is an emerging — and controversial — way to power data centers. Apple is building a 20 MW solar panel farm at its data center in Maiden, North Carolina, and Facebook and eBay have their own smaller solar systems, too. But because the sun is variable — it only shines during the day and when there are no clouds — it can’t provide 100 percent reliable data center power.