Fun with data: Graphing my music and getting textual with Edward Snowden

I am fascinated by data: by how much every one of us produces and how relatively easy it is to capture and analyze it. Even if the world never agrees on what a data scientist is, I think it can agree that I am not one. Yet here I am, able to play with data using nothing but a web browser, a spreadsheet and a text editor. What data piqued my interest lately? Music and Edward Snowden.

If you want to check out a graph of my music library and see how Snowden’s choice of words compares with that of Gen. Keith Alexander, keep reading. And if you’re into something a little more substantive, like how companies are capturing everything from text messages to sensor measurements from racecars and running it through artificial intelligence algorithms, come to Structure Data next week in New York.

Graphing my music

In a behavioral quirk that betrays my age, I still buy a lot of CDs. And when I buy them, I import them into iTunes so I never actually have to see the CDs again. Every so often I go to Zia Records, which as far as I can tell is the only remaining record store in Las Vegas, and buy multiple CDs at a time.

So I thought I should visualize roughly when I went over the past few years. To do so, I copied my music library (which includes my wife’s and my daughter’s music) and pasted it into an Excel sheet. Then I uploaded it to DataHero (whose co-founder and CEO Chris Neumann will be part of our Data Lab sessions next week) to identify any spikes in when songs were added to iTunes.

DataHero chart: when songs were added to iTunes.
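The spike-hunting above boils down to counting songs per day. Here is a minimal Python sketch of that calculation. It assumes the library was exported as tab-separated text with a “Date Added” column in a month/day/year, 12-hour-clock format. The column layout, the timestamp format and the sample rows are all assumptions, so check them against your own export.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Assumed timestamp format for the "Date Added" column; adjust to match your export.
DATE_FORMAT = "%m/%d/%Y %I:%M %p"

# A few made-up rows standing in for an exported library (tab-separated).
SAMPLE = (
    "Name\tArtist\tAlbum\tDate Added\n"
    "Song A\tNeil Young\tAlbum X\t02/16/2014 07:42 PM\n"
    "Song B\tJim Croce\tAlbum Y\t02/16/2014 08:05 PM\n"
    "Song C\tSlayer\tAlbum Z\t03/01/2014 10:00 AM\n"
)


def songs_added_per_day(tsv_text):
    """Count how many songs were added to the library on each calendar day."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        added = datetime.strptime(row["Date Added"], DATE_FORMAT)
        counts[added.date()] += 1  # keep the day, drop the time of day
    return counts


if __name__ == "__main__":
    counts = songs_added_per_day(SAMPLE)
    # The busiest days are the "spikes" worth a closer look.
    for day, n in counts.most_common(5):
        print(day.isoformat(), n)
```

A charting tool does the same grouping behind a bar chart. Sorting by count just surfaces the shopping-trip days directly.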

Then I thought: why not figure out which albums I added when? I used the network graph feature in Google Fusion Tables to do this. While the result is a lot to take in, it does let me identify the larger nodes representing dates and see everything I added on each of them, presumably purchased shortly beforehand. (Hint: If you do this yourself, you probably want to format the “Date Added” cells to ignore the time of day.)

Filtering down to dates since the new year, I see Feb. 16 was a big day. I added albums by Jim Croce, Neil Young, Neil Young & Crazy Horse, Life of Agony, Living Colour and the Sound City soundtrack, plus Chris Cornell’s track from the Great Expectations soundtrack.
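The time-of-day hint matters because a network graph treats every distinct value as its own node. Keep the timestamps and each song gets its own date node instead of sharing one per day. Below is a minimal Python sketch of building the date-to-album edge list. It assumes rows with “Album” and “Date Added” keys and a month/day/year, 12-hour-clock timestamp format, both of which are assumptions about the export. A date node’s degree (how many albums hang off it) is what makes it look “larger.”

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format for the "Date Added" column; adjust to match your export.
DATE_FORMAT = "%m/%d/%Y %I:%M %p"


def date_album_edges(rows):
    """Return sorted, de-duplicated (date, album) edges for a network graph."""
    edges = set()
    for row in rows:
        # Truncate to the calendar day so songs added minutes apart share one node.
        day = datetime.strptime(row["Date Added"], DATE_FORMAT).date().isoformat()
        edges.add((day, row["Album"]))
    return sorted(edges)


def albums_per_date(edges):
    """Degree of each date node: how many distinct albums were added that day."""
    return Counter(day for day, _ in edges)


# Made-up rows for illustration: two songs from the same album, same day.
rows = [
    {"Album": "Album X", "Date Added": "02/16/2014 07:42 PM"},
    {"Album": "Album X", "Date Added": "02/16/2014 07:46 PM"},
    {"Album": "Album Y", "Date Added": "02/16/2014 08:05 PM"},
    {"Album": "Album Z", "Date Added": "03/01/2014 10:00 AM"},
]
edges = date_album_edges(rows)
print(edges)
print(albums_per_date(edges).most_common(1))
```

The two Album X songs collapse into a single edge because both timestamps truncate to the same day. With the raw timestamps they would have produced two separate date nodes.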

I also thought it might be interesting to see which artists we have the most albums by and which artists have appeared together on soundtracks or compilations, so I made another graph including artists and albums. (It’s actually a more classic and useful network graph in terms of visualizing relationships among things, but it renders kind of funky.) There’s lots of Aerosmith, lots of Led Zeppelin and lots of Slayer, and if you filter down to the Less than Zero soundtrack you’ll see several of these artists appearing in the same place.


I also found it interesting to see the various compilation albums that include lots of artists, such as “Top 200 Classics - The Very Best of Classical Music,” and how many artists share the generic “Greatest Hits” naming convention.

Albums titled “Greatest Hits.”
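The artist-and-album graph is a bipartite network: artists on one side, albums on the other, with an edge wherever an artist appears on an album. Here is a minimal Python sketch that builds both directions of that graph from song rows. It assumes each row has “Artist” and “Album” keys, and the rows themselves are made up. From there it finds who has the most albums and which albums link two or more artists.

```python
from collections import defaultdict


def artist_album_graph(rows):
    """Build both directions of the bipartite artist-album graph from song rows."""
    albums_by_artist = defaultdict(set)
    artists_by_album = defaultdict(set)
    for row in rows:
        albums_by_artist[row["Artist"]].add(row["Album"])
        artists_by_album[row["Album"]].add(row["Artist"])
    return albums_by_artist, artists_by_album


def shared_albums(artists_by_album):
    """Albums (soundtracks, compilations) that link two or more artists."""
    return {
        album: sorted(artists)
        for album, artists in artists_by_album.items()
        if len(artists) > 1
    }


# Made-up rows for illustration.
rows = [
    {"Artist": "Aerosmith", "Album": "Album 1"},
    {"Artist": "Aerosmith", "Album": "Album 2"},
    {"Artist": "Aerosmith", "Album": "Soundtrack S"},
    {"Artist": "Slayer", "Album": "Soundtrack S"},
    {"Artist": "Slayer", "Album": "Album 3"},
]
by_artist, by_album = artist_album_graph(rows)
top_artist = max(by_artist, key=lambda artist: len(by_artist[artist]))
print(top_artist, len(by_artist[top_artist]))
print(shared_albums(by_album))
```

One design consequence is worth knowing. Because albums are keyed by title alone, unrelated albums that share a generic title such as “Greatest Hits” collapse into a single node that appears to connect many artists. Keying on an (artist, album) pair instead would keep them apart, if that isn’t the effect you want.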

Are Snowden and Gen. Alexander speaking the same language?

Finally, I took advantage of a transcript of Edward Snowden’s SXSW interview, posted on Monday, to do a little light text analysis. Rather than just look at Snowden, though, I thought I’d compare his use of language with that of his now-nemesis, NSA Director Gen. Keith Alexander. So I grabbed a transcript of Gen. Alexander’s talk at Black Hat in July.

Using IBM Many Eyes, I was able to compare the two in one chart (it’s hard to embed, and the interactive online version requires Java) and see which words each used most frequently and how often they used the same words. Hovering over a word in the interactive version brings up some context around each use. Not surprisingly, Gen. Alexander prefers to speak in terms of national security and the rule of law, while Snowden prefers to talk about the effects of government surveillance and how to stop it.



Single-word usage. Snowden is in blue.


Two-word phrases. Snowden is in blue.
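Many Eyes handled this comparison interactively. For anyone who wants to try the idea elsewhere, here is a rough Python sketch that counts single words and two-word phrases in two texts and lines the counts up side by side. The regex tokenizer and the tiny stopword list are simplifying assumptions, not what Many Eyes actually does. The two sample strings are made up rather than taken from either transcript.

```python
import re
from collections import Counter

# A deliberately tiny stopword list (an assumption); real tools use much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "we", "i", "you"}


def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Single-word frequencies, ignoring stopwords."""
    return Counter(w for w in tokenize(text) if w not in STOPWORDS)


def bigram_counts(text):
    """Two-word phrase frequencies over consecutive words (crosses sentence breaks)."""
    words = tokenize(text)
    return Counter(zip(words, words[1:]))


def compare(counts_a, counts_b, top=10):
    """Most frequent terms across both speakers, with each speaker's own count."""
    combined = counts_a + counts_b
    return [(term, counts_a[term], counts_b[term]) for term, _ in combined.most_common(top)]


# Made-up stand-ins for the two transcripts.
snowden = "Surveillance is the problem. Mass surveillance must stop."
alexander = "The rule of law protects national security. National security matters."

for term, a, b in compare(word_counts(snowden), word_counts(alexander)):
    print(term, a, b)
print(compare(bigram_counts(snowden), bigram_counts(alexander), top=3))
```

These are raw counts. Since the two transcripts differ in length, dividing each count by that speaker’s total word count would give a fairer comparison.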

Of course, the point of all this isn’t to show off great insights, but rather to show what’s possible if you have interest and a little time. Someone with better skills could certainly do a more compelling analysis of Snowden’s talk than what’s possible with the relatively simple tools in Many Eyes. Given enough samples, someone could create a bot that spoke with Snowden’s or Alexander’s idiosyncrasies. And if I wanted to start recreating a Google Play Music-style tool on my own library, I’d go through and tag every song by genre.

For now, though, I’m just happy knowing it’s an option.