Facebook Data Deleted After Lawsuit Threat

Updated: A researcher who collected data from more than 210 million public Facebook profiles and used it to create a rich picture of connections among users of the social network has deleted the entire database after the company threatened him with a lawsuit. Pete Warden, who says more than 50 scientists had expressed interest in using the information in their research, writes in a blog post that Facebook asked him to destroy the data because he had not asked the site’s permission to harvest it, and that he complied because he does not have the funds to contest a lawsuit. He writes:

As you can imagine I’m not very happy about this, especially since nobody ever alleged that my data gathering was outside the rules the web has operated by since crawlers existed. I followed their robots.txt directions, and was even helped by microformatting in the public profile pages. Literally hundreds of commercial search engines have followed the same path and have the same data. You can even pull identical information from Google’s cache if you don’t want to hit Facebook’s servers. So why am I destroying the data? This area has never been litigated and I don’t have enough money to be a test case.

Warden used the data in a variety of ways, including creating visualizations of the different connections among users of the social network both in the U.S. and in countries around the world. We highlighted some of that research in this post, which showed how Warden’s analysis had come up with seven distinct segments of the United States when it came to being connected with others through Facebook, including areas he described with colorful names such as “Stayathomia.” He also put together a site that allowed users to sort the data by different cities and countries and see the connections among them (the site still appears to be functioning).

Warden says he complied with the directives in the robots.txt file, which Facebook (like most other major sites) uses to tell crawlers and bots which parts of the site they may not harvest. Facebook, however, told New Scientist that the researcher breached the site’s terms of use. The threat may not stop the kind of research that Warden was engaged in for long. In his blog post, he tells “the researchers that I’ve disappointed” that there are a number of ways to harvest similar data from other sources, including collecting a large dataset from public information on Google Profiles; he describes these in a separate blog post.
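
For readers unfamiliar with how a crawler honors robots.txt, here is a minimal sketch in Python using the standard library's urllib.robotparser. The robots.txt rules, the "example-bot" user agent and the example.com URLs are all hypothetical; they are not Facebook's actual file or Warden's crawler. Passing this check says nothing about a site's terms of use, which is the distinction at the heart of this dispute.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only
# (an assumption, not Facebook's real file).
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "example-bot" and the example.com URLs are placeholders.
    for url in (
        "https://example.com/people/some-profile",
        "https://example.com/private/data",
    ):
        verdict = "allowed" if is_allowed(EXAMPLE_ROBOTS_TXT, "example-bot", url) else "disallowed"
        print(url, "->", verdict)
```

A real crawler would typically load the live file with set_url() and read() rather than parsing an inline string, and would repeat the check before every request.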

Update: Andrew Noyes, manager of public policy communications at Facebook, said in an email that Warden “aggregated a large amount of data from over 200 million users without our permission, in violation of our terms. He also publicly stated he intended to make that raw data freely available to others.” Noyes also noted that under Facebook’s Statement of Rights and Responsibilities, users agree not to collect users’ content or information “using automated means (such as harvesting bots, robots, spiders, or scrapers) without our permission.”
