The Defense Department’s data strategy: Huge, massive and distributed

Ely Kahn has spent more than a decade working in the national security world, including stints at the Transportation Security Administration, Department of Homeland Security and the Executive Office of the President, where he was director of cybersecurity. In 2012, he joined a team of former National Security Agency engineers to create Sqrrl, a database company based on the open-source Accumulo technology they had created within the spy agency. Kahn came on the Structure Show podcast this week to talk about what the technology is really capable of, who’s using it and what’s in store when it comes to national security and technology.

Here are some highlights of an insightful interview that covers everything from the importance of baking security into Hadoop-based technologies (like Accumulo) to impending attacks on critical infrastructure. However, anyone interested in the whole story of how Accumulo works and how advanced analytic techniques can improve cybersecurity will want to hear the whole thing. They might also want to attend our Structure Data conference on March 19 and 20, where Booz Allen Hamilton’s Peter Guerra will be discussing the state of the art in using big data to combat cyber threats.

PRISM? Yeah, it’s that database

“Accumulo is at the centerpiece of NSA’s enterprise architecture. Most of NSA’s major analytical applications run on Accumulo,” Kahn said. “I won’t go in and state specifically each one, because I think that gets me into a slippery slope, but most of the ones that people have been reading about, those have a pretty good shot of having an Accumulo backend.”

Not only is it the centerpiece, but Accumulo might be just as capable as NSA critics assume it is. While it’s easy enough technologically to identify questionable behavior and target that, or to examine the networks of known suspects, the NSA has bigger ideas around what Kahn calls “patterns of life analysis”:

“This really boils down to anomaly detection, which is a big focus for us. How do you establish a pattern of what’s normal and then detect outliers from that baseline of normalcy? That can cross truly a huge set of use cases….

A lot of what we’re doing is around graph analytics now, and building huge, massive distributed graphs of data sets — building out what a normal graph of data will look like around a particular use case, and then looking for deviations from that normal pattern of behavior over time.”
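As a rough illustration of the baseline-and-deviation idea Kahn describes, here is a minimal Python sketch. It is not Sqrrl's or the NSA's actual method and does not use the Accumulo API; the function names, the choice of node degree as the measured feature and the z-score threshold are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean, pstdev


def degree_counts(edges):
    """Count how many edges touch each node in one snapshot of a graph."""
    counts = Counter()
    for src, dst in edges:
        counts[src] += 1
        counts[dst] += 1
    return counts


def find_outliers(history, current, threshold=3.0):
    """Flag nodes whose current degree deviates from their historical baseline.

    history:   non-empty list of edge lists, one per past time window
               (the assumed "normal" period)
    current:   edge list for the time window being checked
    threshold: hypothetical cutoff, in standard deviations from the baseline
    """
    if not history:
        raise ValueError("history must contain at least one snapshot")

    past = [degree_counts(edges) for edges in history]
    now = degree_counts(current)
    nodes = set(now).union(*past)

    outliers = {}
    for node in nodes:
        # A Counter returns 0 for nodes that were absent from a snapshot.
        series = [snapshot[node] for snapshot in past]
        baseline = mean(series)
        # Fall back to 1.0 when the baseline never varied, to avoid dividing by zero.
        spread = pstdev(series) or 1.0
        score = (now[node] - baseline) / spread
        if abs(score) > threshold:
            outliers[node] = score
    return outliers
```

Calling `find_outliers` with a list of past edge lists and the latest edge list returns a dictionary that maps each flagged node to its deviation score. A production system would track far richer features than degree and would run over a distributed store rather than in-memory lists.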

For more on Accumulo and the NSA’s graph-analysis capabilities, check out our coverage from June, when the Edward Snowden story was first developing.

The whole Defense Department is getting in on the act

The Department of Defense hopes that what works for the NSA will work across the entire department. Accumulo was part of an NSA mission to build a utility cloud computing and data infrastructure that could aggregate its resources agency-wide, and the Defense Department now wants to bring all of its data, from drone footage down to medical data, into a single analyzable system.

“There’s a major effort underway called the Joint Information Environment to really develop utility cloud and data cloud architectures across the entire Department of Defense for a truly massive set of use cases, ranging from cybersecurity to battlefield intelligence to even medical use cases,” Kahn explained.

Ely Kahn, co-founder and vice president of business development for Sqrrl.


Companies might not like the NSA, but they respect its tech

“Regardless of what people feel from a political perspective about NSA, I think people recognize that NSA is a leader in these big data technologies and a leader in security. And so in that sense, I’d say it’s a mark of approval having NSA legacy,” Kahn said in response to a question about whether Sqrrl’s NSA roots have been a blessing or a curse. “Of course, I also go to conferences and I have conversations with folks from Pandora or Facebook or consumer web app-type things, and folks at the ground level may have some questions about our history, but I think the decision makers see it as a good thing.”

How much do decision makers like the security aspects Sqrrl is pushing? “We’re installed in three of the Fortune 20 companies, five of the Fortune 50, and then dozens of others,” Kahn said. He later added, in reference to increased Accumulo support by Hadoop vendors Cloudera and Hortonworks, “I think what some of the big Hadoop vendors are seeing is that if they want to play in government, they need to support Accumulo.”

The state of cybersecurity: Scary as hell, but getting better

First, the good news, which has just come to fruition over the past few weeks:

“Via executive order…there has been a major effort by both the Department of Homeland Security and the National Institute for Standards and Technology to create a cybersecurity framework that can be utilized to raise that bar, at least initially, on a voluntary basis. So really for the first time now, there is a document that people can go to that says ‘here are the minimum standards that everyone should be utilizing in these critical infrastructure sectors.’ It may sound simple, but for an area as complex as cybersecurity, this is a major step forward.”

However, Kahn added, a “major step forward” is far from perfection: “[M]inimum adherence to a baseline is not sufficient.”

And, he noted, “There have been some pretty scary reports about foreign nations probing our electric grid that have been reported in the New York Times but, yes, nothing disastrous has happened yet. Personally, I think that’s probably a matter of time, but fingers crossed.”