For science, big data is the microscope of the 21st century

Johns Hopkins is using a $1.2 million grant from the National Science Foundation to build a 100 gigabit per second network to shuttle data from the campus to other large computing centers at national labs and even Google. The network will be capable of transferring an amount of data equivalent to 80 million file cabinets filled with text each day.
The head of the project, Dr. Alex Szalay, detailed the plans, which include networking gear from Cisco, Arista and Solarflare; Nvidia GPUs; and 66,000 x86 cores. That’s on top of the actual fiber that will connect a new, 1-megawatt data center inside the physics building to the regional Mid-Atlantic Crossroads research and engineering network at the University of Maryland.

The new data center at Johns Hopkins, awaiting its 100 Gbps backbone.

That connection will be the 100 Gbps element funded by the NSF, and the Mid-Atlantic Crossroads network connects out to Pittsburgh and then on to Chicago via other 100 Gbps networks that are growing in number across the country. Inside the campus, Szalay, who is the alumni centennial chair in physics and astronomy at Johns Hopkins, is setting up a 40 Gbps network between buildings that deal with lots of data, such as the medical and computer science hubs. “To keep looking at big data sets we have to move the big data to a location where we can analyze it, and the stumbling block is [data sets of more than] 100 terabytes because of the speed of the network,” Szalay said.
He ascribes this massive amount of data to the emergence of cheap compute, better imaging and more information, and calls it a new way of doing science. “In every area of science we are generating a petabyte of data, and unless we have the equivalent of the 21st-century microscope, with faster networks and the corresponding computing, we are stuck,” Szalay said.
In his mind, using massive processing power to filter through petabytes of data is an entirely new type of computing that will lead to new advances in astronomy and physics, much like the microscope’s creation in the 17th century led to advances in biology and chemistry. Seen in that light, the 100 gigabit per second research network at Johns Hopkins becomes not just a fast network, but an essential tool for research and discovery, a core component of the 21st-century microscope.
For example, he described trying to send a 150-terabyte chunk of astronomy data to Oak Ridge National Lab in Tennessee for analysis as “painful” because of the limits of the 10-gigabit connection between the university and the national lab. Looking ahead 10 years, he believes a colleague’s next-generation astronomy project, currently underway and supported by Google with 14 million compute hours, could generate 100 petabytes of data.
If that kind of data avalanche is a mere decade away, it appears our faster networks can’t come soon enough. It’s a good thing Johns Hopkins expects both the outbound 100-gigabit network and the 40-gigabit intra-campus network to be functioning in April.
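To get a rough sense of why a 150-terabyte transfer is painful at 10 gigabits per second, here is a back-of-the-envelope sketch in Python. It is not from Szalay's team. It assumes decimal units (1 terabyte = 10^12 bytes), a link running at its full rated speed the whole time, and no protocol overhead, so real transfers would take longer. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope transfer times. Assumptions (not from the article):
# decimal units, a fully utilized link, and zero protocol overhead.

TB = 10**12    # bytes in a terabyte (decimal)
PB = 10**15    # bytes in a petabyte (decimal)
GBPS = 10**9   # bits per second in one gigabit per second


def transfer_time_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time to move size_bytes over a link of the given speed."""
    return size_bytes * 8 / link_bits_per_second


def describe(label: str, size_bytes: float, link_bits_per_second: float) -> None:
    """Print the ideal transfer time in hours and days."""
    seconds = transfer_time_seconds(size_bytes, link_bits_per_second)
    print(f"{label}: {seconds / 3600:.1f} hours ({seconds / 86400:.1f} days)")


describe("150 TB at 10 Gbps", 150 * TB, 10 * GBPS)
describe("150 TB at 100 Gbps", 150 * TB, 100 * GBPS)
describe("100 PB at 100 Gbps", 100 * PB, 100 * GBPS)
```

Under those idealized assumptions, the 150-terabyte data set needs about 33 hours at 10 Gbps and a little over 3 hours at 100 Gbps. A 100-petabyte data set would still take roughly 93 days even at 100 Gbps.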
Image courtesy of Flickr user RinzeWind.