Large web sites like Facebook are constantly under attack from hackers and groups trying to spread malware, which means sites like Facebook gather a lot of data about what attacks look like and where they’re coming from. In order to help standardize its methods for collecting and analyzing all this data, Facebook built a new framework called ThreatData, which it detailed in a blog post on Tuesday afternoon.
Essentially, ThreatData is composed of systems for ingesting and transforming data feeds from many different sources, storing and analyzing that data for historical and real-time trends (using the Hadoop-based Hive for the former and Scuba for the latter), and then reacting to threats in real time. Blog post author Mark Hammell, a threat researcher at Facebook, explained how ThreatData has been used for everything from detecting a campaign to spread smartphone malware via spam messages to creating a “super anti-virus” program that’s much more thorough than any commercial software.
The image below shows a graph Facebook developed using ThreatData to map malicious and victimized IP addresses, with the pie chart breaking that data down by ISP in the United States.
Frameworks like ThreatData might be common among Facebook’s peers at the high end of the web, but companies in other fields might want to take note and try to build this type of framework themselves or find software vendors that can help put one in place. The key, at least according to Hammell, is building something that expects data sources and formats to change, and that stays flexible in both analyzing and acting upon that data.
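For readers who want to picture what that flexibility could look like in practice, here is a minimal Python sketch of the ingest, store, and react loop described above. It is not Facebook's code: the `ThreatRecord` schema, the two feed formats, the parser names, and the toy blocklist reaction are all assumptions made up for illustration, and a plain list stands in for a real store such as Hive or Scuba.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set


@dataclass(frozen=True)
class ThreatRecord:
    """Common shape every feed is normalized into (hypothetical schema)."""
    indicator: str  # e.g. a URL, IP address, or file hash
    kind: str       # e.g. "ip", "url", "hash"
    source: str     # name of the feed that reported it


def parse_json_feed(raw: str, source: str) -> List[ThreatRecord]:
    """Parse a feed delivered as a JSON array of {"value", "type"} objects."""
    return [ThreatRecord(item["value"], item["type"], source)
            for item in json.loads(raw)]


def parse_csv_feed(raw: str, source: str) -> List[ThreatRecord]:
    """Parse a feed delivered as headerless CSV rows of indicator,kind."""
    return [ThreatRecord(row[0], row[1], source)
            for row in csv.reader(io.StringIO(raw)) if row]


# Supporting a new or changed feed format means registering one more parser;
# storage and reaction code only ever sees ThreatRecord objects.
PARSERS: Dict[str, Callable[[str, str], List[ThreatRecord]]] = {
    "json": parse_json_feed,
    "csv": parse_csv_feed,
}


def ingest(raw: str, fmt: str, source: str) -> List[ThreatRecord]:
    """Normalize one raw feed payload using the parser registered for fmt."""
    return PARSERS[fmt](raw, source)


def react(records: Iterable[ThreatRecord], blocklist: Set[str]) -> None:
    """Toy reaction step: add every reported IP indicator to a blocklist."""
    for record in records:
        if record.kind == "ip":
            blocklist.add(record.indicator)


# Example run with two made-up vendor feeds in different formats.
store: List[ThreatRecord] = []  # stand-in for a real data store
blocklist: Set[str] = set()

batch = ingest('[{"value": "203.0.113.7", "type": "ip"}]', "json", "vendor_a")
batch += ingest("http://bad.example/x,url\n198.51.100.9,ip\n", "csv", "vendor_b")
store.extend(batch)
react(batch, blocklist)
```

The design choice worth noticing is the parser registry: when a vendor changes its format, only one small function changes, while everything downstream keeps working against the same normalized record.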
A boatload of companies are presently trying to apply big data and machine learning techniques to security. Maybe, hopefully, those approaches combined with a framework like ThreatData can actually help companies get a handle on the persistent cyber threats they’re facing.
Watching large enterprises like Target and Microsoft get pwned at massive scale, one loses confidence very quickly.