Twitter is a great service, but it’s not exactly easy for users without programming skills to access their account data, much less do anything with it. Until now.
Services already exist that will let you download reports about when you tweet and which of your tweets were the most popular; some, like SimplyMeasured and FollowerWonk, will even summarize data about your followers. If you’re willing to wait hours to days (Twitter’s API rate limits are just that: limiting) and play around with open source software, NodeXL will help you build your own social graph. (I started and gave up after realizing how long it would take with more than a handful of followers.) But you never really see the raw data, so you have to trust the services and hope they present the information you want to see.
Then, last week, someone from ScraperWiki tweeted at me, noting that the service can now gather raw data about users’ accounts. (I’ve used the service before to measure tweet activity.) I was intrigued. But I didn’t want to just see the data in a table; I wanted to do something more with it. Here’s what I did.
Step 1: ScraperWiki
This literally could not be easier. Get a ScraperWiki account, choose “Create a new dataset” and select “Get Twitter followers.” At that point, enter your Twitter handle (or the handle of whatever Twitter user you choose) and hit “enter.” Depending on how many followers a person has, it could take anywhere from minutes to many, many hours because of rate limits. I had just over 7,000 followers at the time and was done in minutes.
Once the process is complete, you can actually search and sort a lot within ScraperWiki itself. In the table view, you can search by name, number of followers, location or even user ID (Om has by far the most followers among my followers, and appears to be Twitter user No. 989).
But unless you can write code or SQL queries, there’s not a whole lot more you can do, especially with Twitter data. So you’ll want to download the data as a spreadsheet.
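For readers who do want to try a little code on the downloaded file, here is a minimal Python sketch that ranks followers by their own follower counts, roughly what the table view's sort does. The column names `screen_name` and `followers_count` and the file name `followers.csv` are assumptions for illustration; check the header row of your own export before relying on them.

```python
import csv


def load_rows(lines):
    """Parse CSV text (an open file or any iterable of lines) into dicts."""
    return list(csv.DictReader(lines))


def top_by_followers(rows, n=5, column="followers_count"):
    """Return the n rows with the highest follower counts.

    The default column name is an assumption, not a documented export field.
    """
    def count(row):
        value = (row.get(column) or "").strip()
        # Blank or non-numeric cells count as zero so sorting never fails.
        return int(value) if value.isdigit() else 0

    return sorted(rows, key=count, reverse=True)[:n]


# Example use ("followers.csv" is a placeholder file name):
#     with open("followers.csv", newline="", encoding="utf-8") as f:
#         for row in top_by_followers(load_rows(f)):
#             print(row.get("screen_name"), row.get("followers_count"))
```

The functions work on plain dictionaries, so the same rows can be filtered or regrouped in other ways without rereading the file.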
Step 2: Clean the data
This one is kind of a pain, although I’m not sure it’s always necessary. I wanted to visualize my followers by where they live, so I thought it was a good idea to standardize on a common value for the “location” column in the spreadsheet. Otherwise, you’d end up with, for example, a bunch of followers in “San Francisco,” some in “San Francisco, California,” others in “San Francisco, CA,” and a surprising number in “SFO.” You get the point.
I opted for the postal service version for U.S. cities (e.g., San Francisco, CA) and City, Province, Country for Canadian cities, and City, Country for other international cities. Then there are the various ways that people enter their location by general geographic area. I standardized on San Francisco Bay Area, Bay Area and Silicon Valley, for example, when followers listed some variation of those as their locations, but I’ve since realized I should have combined them into one mega-value (probably either San Francisco Bay Area or Silicon Valley). NorCal, SoCal and any variations thereof became Northern California and Southern California.
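The standardization described above was done by hand in a spreadsheet, but the same idea can be sketched in Python as a lookup table of known variants. This is only an illustration: the `ALIASES` table below is a tiny hypothetical sample built from the examples in this post, and a real follower list would need many more entries.

```python
import re

# Hypothetical sample of variants mapped to one canonical value each.
ALIASES = {
    "san francisco": "San Francisco, CA",
    "san francisco, california": "San Francisco, CA",
    "san francisco, ca": "San Francisco, CA",
    "sfo": "San Francisco, CA",
    "norcal": "Northern California",
    "socal": "Southern California",
}


def normalize_location(raw):
    """Map a free-text location to its canonical form when one is known."""
    # Collapse runs of whitespace and trim the ends before looking it up.
    cleaned = re.sub(r"\s+", " ", raw or "").strip()
    # Unknown locations pass through unchanged (apart from the whitespace fix).
    return ALIASES.get(cleaned.lower(), cleaned)
```

A lookup table keeps every decision visible and easy to revise, which helps if you later decide, say, to fold several Bay Area labels into one value.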
Depending on what you want to do and what tools you’re using to do visualizations, you might need to go a step further or not go through this step at all. Tableau Public, for example, seemed to want separate columns for city and state when I tried to map my followers’ locations (latitude and longitude might have worked, as well). I didn’t know any slick Excel tricks for doing this in a hurry, so I just decided to use Google Fusion Tables, which automatically geocodes location data (after you manually label the column as containing location data).
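If a tool does want separate city and state columns, the split can be scripted once U.S. locations follow the “City, ST” pattern. The sketch below is one possible approach, not something I used for this post; the function name is made up for illustration, and anything that does not end in a recognized two-letter U.S. postal abbreviation is left whole with an empty state.

```python
import re

# Two-letter postal abbreviations for the 50 U.S. states plus DC.
US_STATES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN "
    "MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA "
    "WA WV WI WY DC".split()
)

CITY_STATE = re.compile(r"^(.+?),\s*([A-Za-z]{2})$")


def split_city_state(location):
    """Return (city, state) for 'City, ST' values, else (location, '')."""
    text = (location or "").strip()
    match = CITY_STATE.match(text)
    if match and match.group(2).upper() in US_STATES:
        return match.group(1).strip(), match.group(2).upper()
    # International or free-form locations are returned untouched.
    return text, ""
```

Checking against the list of state abbreviations keeps values such as “London, UK” from being mistaken for a U.S. city and state.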
Step 3: Pick a tool and visualize away
Once you have the data cleaned sufficiently (if at all), it’s time to visualize. The options are fairly limited if you don’t want to write code, but they’re still pretty powerful (and able to handle visualizations spanning thousands of rows, which was key). I went with the tried-and-true Tableau Public, IBM Many Eyes and Google Fusion Tables (although Datahero and Infogram could work, as well, with fewer followers). They all work a little differently, but it’s nothing that most people couldn’t figure out within a few minutes of experimentation.
Here’s what I came up with.
From Google Fusion Tables
Zoomed on North America:
From IBM ManyEyes
With ManyEyes, I used the text from my followers’ descriptions rather than the location field. Here’s an unfiltered word cloud.
This time by two-word combination:
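Both ManyEyes views, single words and two-word combinations, come down to counting how often terms appear in the description text. For the curious, here is a rough Python sketch of that counting step. It is an assumption about the general technique, not a description of how ManyEyes works internally, and the tokenizing rule (lowercase letters, digits and apostrophes) is a simplification.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_and_bigram_counts(descriptions):
    """Count single words and adjacent two-word combinations."""
    words, bigrams = Counter(), Counter()
    for description in descriptions:
        tokens = tokenize(description or "")
        words.update(tokens)
        # Pair each token with its successor; pairs never span two descriptions.
        bigrams.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return words, bigrams
```

Calling `most_common(20)` on either counter gives the terms that would dominate a word cloud, and filtering out common words like “the” and “and” first would give a filtered version.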
This is all far from big data or data science or any other tech-industry buzzwords, but it is good, clean fun with data. And, of course, because you have the raw data, there’s really no limit on how you can chart it or what metrics you can analyze against each other.