Systems to handle big data might be this generation’s moon landing

Plans to build a radio telescope that can see back 13 billion years to the creation of the universe are prompting a five-year €32 million ($42.7 million) effort to create a low-power supercomputer and the networks to handle the data the new telescope will generate. The DOME project, named for a mountain in Switzerland and the covering of a telescope, is a joint effort between IBM and ASTRON, the Netherlands Institute for Radio Astronomy, to build such a network and computer.
There are three problems with building a telescope capable of reading radio waves from that far out in deep space (actually there’s a real estate problem too, because the array will require millions of antennas spread over an area the width of the continental U.S., but we’ll stick to computing problems). The first problem is the data that this Square Kilometre Array (SKA) will generate. IBM estimates it will produce:

… a few Exabytes of data per day for a single beam per one square kilometer. After processing this data the expectation is that per year between 300 and 1,500 Petabytes of data need to be stored. In comparison, the approximately 15 Petabytes produced by the Large Hadron Collider at CERN per year of operation is approximately 10 to 100 times less than the envisioned capacity of SKA.
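
To put those figures in perspective, here is a rough back-of-the-envelope sketch of the arithmetic. The storage numbers come straight from the IBM quote; treating "a few Exabytes" as a single exabyte per day, and using decimal units (1 EB = 10^18 bytes), are my own illustrative assumptions rather than anything IBM specified.

```python
# Back-of-the-envelope arithmetic on the figures quoted by IBM.
# Assumptions (not from IBM): decimal units, and "a few Exabytes per day"
# taken as 1 EB/day, which is a conservative lower bound for illustration.

TB = 10**12  # bytes in a terabyte (decimal)
EB = 10**18  # bytes in an exabyte (decimal)
SECONDS_PER_DAY = 24 * 60 * 60

# Quoted yearly storage figures, in petabytes.
SKA_STORED_PB_LOW = 300
SKA_STORED_PB_HIGH = 1500
LHC_STORED_PB = 15


def storage_ratio(ska_pb, lhc_pb):
    """How many times more data SKA would store per year than the LHC."""
    return ska_pb / lhc_pb


def sustained_rate_tb_per_s(exabytes_per_day):
    """Average data rate in TB/s implied by a daily volume in exabytes."""
    return exabytes_per_day * EB / SECONDS_PER_DAY / TB


ratio_low = storage_ratio(SKA_STORED_PB_LOW, LHC_STORED_PB)
ratio_high = storage_ratio(SKA_STORED_PB_HIGH, LHC_STORED_PB)
raw_rate = sustained_rate_tb_per_s(1)  # assumed 1 EB/day

print(f"SKA stores {ratio_low:.0f}x to {ratio_high:.0f}x the LHC's yearly output")
print(f"1 EB/day is roughly {raw_rate:.1f} TB every second, around the clock")
```

On the quoted storage numbers alone, the gap works out to 20 to 100 times the LHC's yearly output, and even the assumed one exabyte per day means moving more than 11 terabytes every second before any processing happens.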

And guys, the LHC is in the midst of getting its own cloud computing infrastructure in order to handle its data. So this IBM/ASTRON project may be just the beginning for SKA. As I say in the headline, in many ways, projects like the LHC and the SKA are ambitious investigations into the origins and composition of the universe. Our investigations into dark matter will require a compute effort that could rival the engineering effort that it took to get men on the moon. Which makes big data our Sputnik and our Apollo 11.
Now, back to the problems associated with the telescope. It will generate data like a corpse breeds maggots, so the second problem is building a computer big enough to process it all without requiring a power plant or two. That data also has to travel from the antenna arrays to the computer, which means the third problem is the network. I’ve covered the need for compute and networks to handle our scientific data before in a story on Johns Hopkins’ new 100 gigabit on-campus network, but the scale of the DOME project dwarfs anything Johns Hopkins is currently working on. From that story:

[Dr. Alex Szalay of Johns Hopkins] ascribes this massive amount of data to the emergence of cheap compute, better imaging and more information, and calls it a new way of doing science. “In every area of science we are generating a petabyte of data, and unless we have the equivalent of the 21st-century microscope, with faster networks and the corresponding computing, we are stuck,” Szalay said.
In his mind, the new way of using massive processing power to filter through petabytes of data is an entirely new type of computing which will lead to new advances in astronomy and physics, much like the microscope’s creation in the 17th century led to advances in biology and chemistry.

So we need the computing and networking equivalent of a microscope to enable us to deal with a telescope planned for 2024, and the time to start building it is now. That gives us a lot longer than the time frame we had to land on the moon. IBM views the problem as one worthy of the following infographic:

As the infographic shows, we’re going to need massively multicore, low-power computers, better interconnection using photonics and new ways of building our networks. Hopefully, the search for dark matter is worth it.
SKA image courtesy of the Square Kilometre Array.