Report: No real substitute for NSA’s bulk data collection

A new report from the National Research Council concludes that untargeted, or bulk, data collection remains probably the best method for fulfilling the National Security Agency’s mission. However, the report’s authors note, there are measures that can be taken to make that practice more transparent and less susceptible to agency abuse.

The 80-plus-page report, which was written in response to a presidential directive to investigate alternatives to bulk data collection by the intelligence community, lays out various possible methods of targeting data collection to specific individuals, groups or behavior patterns, ranging from machine learning to real-time analysis, but seems to come down on the side of the status quo. The only way to investigate newly identified suspects or new information, the report explains, is to have a large database that might already include relevant data.

The report does offer suggestions for improving public trust in the process, primarily focused on improving the sanctity of the bulk datastore and how that data is accessed. It suggests a combination of automated and manual procedures for regulating who can access what data, what types of queries can be run in what situations, and auditing all database-query activity.
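To make that combination of controls concrete, here is a minimal Python sketch of a query gate that checks a role against a policy table, requires a written justification, and logs every attempt whether or not it is allowed. Everything in it (the `ALLOWED_QUERIES` table, the role names, the `AuditedStore` class) is hypothetical and chosen for illustration; the report does not describe the NSA's actual implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical policy table: which roles may run which types of queries.
ALLOWED_QUERIES = {
    "analyst": {"lookup_by_identifier"},
    "auditor": {"read_audit_log"},
}


@dataclass
class AuditedStore:
    """Toy datastore front end that gates queries and audits every attempt."""

    audit_log: list = field(default_factory=list)

    def run_query(self, user, role, query_type, justification=""):
        # Automated check: the role must be permitted to run this query type,
        # and the request must carry a non-empty justification.
        allowed = (
            query_type in ALLOWED_QUERIES.get(role, set())
            and bool(justification.strip())
        )
        # Record the attempt before acting on it, so denials are audited too.
        self.audit_log.append(
            {
                "time": datetime.now(timezone.utc).isoformat(),
                "user": user,
                "role": role,
                "query_type": query_type,
                "justification": justification,
                "allowed": allowed,
            }
        )
        if not allowed:
            raise PermissionError(f"{user} ({role}) may not run {query_type}")
        # Placeholder result; a real system would query the datastore here.
        return f"results for {query_type}"
```

In this sketch the manual half of the process would be a human reviewer reading `audit_log`; the automated half is the policy lookup that runs before any data is touched.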


The NSA already has numerous controls in place to address these concerns but, the report notes, it can be difficult to convince skeptics of their effectiveness because of the secrecy of the agency’s operations. Indeed, I’ve reported on some of those controls before, including limitations on what types of queries and collection are allowed, and the cell-level security of the NSA’s home-grown Accumulo database system.
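For readers unfamiliar with the idea, cell-level security means each individual value in a table carries its own visibility label, and a reader sees only the cells whose labels their authorizations satisfy. The Python sketch below illustrates the concept only. It is not Accumulo's API and does not parse Accumulo's visibility expressions; the `Cell` type, the label encoding (a list of alternative token sets) and the rule that unlabeled cells are visible to everyone are all assumptions made for this example.

```python
from typing import FrozenSet, List, NamedTuple


class Cell(NamedTuple):
    row: str
    column: str
    value: str
    # Assumed label encoding: a list of alternatives; each alternative is a set
    # of tokens that must ALL be held. Satisfying ANY alternative grants access.
    visibility: List[FrozenSet[str]]


def can_read(cell: Cell, authorizations: FrozenSet[str]) -> bool:
    """Return True if the reader's authorizations satisfy the cell's label."""
    if not cell.visibility:
        # Assumption for this sketch: unlabeled cells are visible to everyone.
        return True
    return any(alternative <= authorizations for alternative in cell.visibility)


def scan(cells: List[Cell], authorizations: FrozenSet[str]) -> List[Cell]:
    """Return only the cells this reader is allowed to see."""
    return [cell for cell in cells if can_read(cell, authorizations)]


# Hypothetical table with three cells under different labels.
TABLE = [
    Cell("record1", "summary", "public note", []),
    Cell("record1", "source", "restricted detail", [frozenset({"legal", "audit"})]),
    Cell("record1", "contact", "sensitive detail",
         [frozenset({"admin"}), frozenset({"legal", "audit"})]),
]
```

With this table, a reader holding only `admin` would get the `summary` and `contact` cells from `scan(TABLE, frozenset({"admin"}))`, while the `source` cell stays hidden because it requires both `legal` and `audit`.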

In addition to improving on existing protocols, the report suggests that new avenues of research for automating the data privacy of U.S. citizens might include advanced encryption techniques, and how to enable lawyers or other non-technical personnel to program policies that govern data usage. In May, we covered some Microsoft research into the latter possibility.

One alternative to bulk collection that the report suggests is to rely on businesses to store customer data and supply it as needed. This way the NSA isn’t technically collecting or storing data, which could mitigate citizen fears over mass government surveillance. Of course, the report notes, those companies might have strong incentives not to comply with the government, something we’ve already seen from certain companies following accusations that they were in cahoots with the NSA.


However, the report does acknowledge some fundamental flaws with any attempts to improve NSA protocol. A particularly troubling one is the varying terminology that different analysts, agencies and the FISC use to describe the same things, a situation that led to NSA analysts accessing domestic telephone metadata “for several years in some instances.” Another tricky consideration is how to regulate data usage once it has been disseminated to other agencies, ported onto commercial operating systems and data-analysis software, and mixed with other data sources.

There’s also the fact that the report’s authors were only presented with three unclassified use cases to analyze for possible alternatives to bulk data collection. “[I]t was told that this is not a complete set, so its search for collection alternatives was limited,” the report states.

The report expressly avoids taking any position on whether the NSA’s data practices are sound public policy, and it avoids discussion of how some data is collected in the first place. Especially in the case of web data, where documents released by Edward Snowden show the agency essentially hacking into corporate systems and networks, it’s the methods of collection that really have some people upset.

The Washington Post's slide purporting to show NSA hacking.
