Rapleaf’s Web: How You Are Profiled on the Web

Earlier, I posted about San Francisco-based Internet information aggregator Rapleaf, a service that collects, sorts and repackages data about many of us who spend an inordinate amount of time on the Internet. I started poking around and discovered many startups that are using data from Rapleaf, but it’s not just startups. Just take a look at this article on Rapleaf in Fast Company from last year:

By accessing its database of 378,968,953 consumer email profiles, banks, retailers, and anti-fraud firms (all of which it counts among its clients) Rapleaf can quickly confirm legitimate customers and weed out scammers, cutting verification costs and improving the user experience. “Companies spend as much as $100 getting customers to their site. The goal is to filter out the bad people and keep as many good people as possible,” (Joel) Jewitt (Rapleaf’s VP of Business Development) says. “If a customer’s email address is attached to three or four social networking sites with 300 friends, the email likely isn’t fake and the retailer can put that person in the ‘good’ pile.”

One of our readers pointed out that because Rapleaf is sending data to these companies, which may be caching your information, there’s more information leaking out about you on the web. Opting out of Rapleaf’s service isn’t going to do you any good. Let’s put it bluntly: For better or worse, the genie is out of the bottle.

How Rapleaf Works

To better understand how, exactly, Rapleaf works, I did some investigating. On a basic level, Rapleaf is like a credit card company’s database. When you’re at a store and the cashier slides your credit card through, the store checks your card information against the credit card company’s database to make sure your card hasn’t expired and you have enough credit.

Rapleaf’s database contains email addresses. Say an airline offers a discount coupon, as long as you provide your email. When you sign up for the coupon, the airline looks up your email address in Rapleaf’s database; Rapleaf confirms the email is valid by checking it against your profile in its database; and the airline knows it can send you its email newsletter.

When I contacted Rapleaf, the company said it has built its database by crawling the web, looking for connections and building profiles with its own technology. “Like Google, we crawl publicly available data on the web – as long as robots.txt allows search engines like us to crawl (we stop crawling if people disallow search engines),” CEO Auren Hoffman emailed. He added:

Rapleaf is working hard to protect consumers. We are a data company that, like 99 percent of data companies, is opt-out (rather than opt-in). But we are a white-hat data company who helps companies safely provide a more personalized experience to their customers. We try really hard to protect consumers (see) – we’ve thought a lot about consumer protection and are proud of everything we are doing. However, we are open to ideas on how we can improve and I encourage your readers to email me at [email protected] with ideas on how we can improve and better protect consumers. While we cannot commit to implementing any idea from your readers, we can commit that we will consider all thought-out suggestions.
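
The robots.txt check Hoffman describes is a standard crawler courtesy. Here is a minimal sketch of how such a check can be done with Python’s standard library. The robots.txt content, the crawler name and the URLs are made up for illustration; this is not Rapleaf’s code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site (an assumption, not real data).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "example-crawler" is a placeholder user-agent name.
print(may_crawl(ROBOTS_TXT, "example-crawler", "https://example.com/profile/jane"))
print(may_crawl(ROBOTS_TXT, "example-crawler", "https://example.com/private/jane"))
```

A crawler that honors the file would skip the second URL. Note that this only covers pages a site owner has chosen to block; it says nothing about what individual users want.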

The company argues what it does is no different from various ad networks, and that its policies are more consumer-friendly. You can opt out of Rapleaf by visiting this location, Hoffman said. Nevertheless, Rapleaf’s services are clearly much in demand, based on this response from CEO Hoffman:

Today we help hundreds of top retailers, hotels, advertising agencies, large brands, tech startups, educational organizations, and nonprofits personalize their customers’ experiences. (We sign NDAs with our customers so we cannot release their names.)

Think of Rapleaf as the provider of a FICO score for an email address. That email address comes with a Facebook ID, Flickr ID, Twitter account information and other social details. For a marketer, or even someone trying to hit you up for business, this is pretty relevant data, because it lets them target a customer and connect with that person socially. In another scenario, you can buy an email list of a million addresses for $1,000, check them against Rapleaf and end up with about 10,000 emails worth targeting. That’s a pretty good deal.
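
To make the list-buying arithmetic concrete, here is a rough sketch. The Profile record and the thresholds are assumptions: the thresholds borrow Jewitt’s rule of thumb from the Fast Company quote (three or more networks, 300 friends), and the cost figure ignores whatever Rapleaf itself charges for the look-ups, which I don’t know.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    # Hypothetical record; the field names are illustrative only.
    email: str
    networks: int  # social networks linked to the address
    friends: int   # total connections across those networks


def worth_targeting(p: Profile, min_networks: int = 3, min_friends: int = 300) -> bool:
    """Apply the rule of thumb: enough networks and enough friends."""
    return p.networks >= min_networks and p.friends >= min_friends


def cost_per_kept_address(list_price: float, kept: int) -> float:
    """List price divided by the number of addresses that survive filtering."""
    return list_price / kept


sample = [
    Profile("a@example.com", networks=4, friends=320),
    Profile("b@example.com", networks=0, friends=0),
    Profile("c@example.com", networks=3, friends=120),
]
print([p.email for p in sample if worth_targeting(p)])  # ['a@example.com']

# Figures from the scenario above: $1,000 list, 1,000,000 addresses, 10,000 kept.
print(10_000 / 1_000_000)                    # 0.01, i.e. a 1 percent yield
print(cost_per_kept_address(1000, 10_000))   # 0.1, i.e. 10 cents per kept address
```

In other words, the marketer throws away 99 percent of the list and still pays only about a dime of list cost per address worth targeting.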

A Good Email ID Is Worth Money

In order for Rapleaf to be successful, it needs to keep growing its database of good email addresses, which is why it’s giving startups such as Facebook game makers and social CRM companies liberal access to its APIs. When a social CRM company, such as Rapportive, plugs into your Gmail account, it confirms to Rapleaf that your email address is valid. Since the social CRMs create profiles of the people who email you, the services confirm to Rapleaf that your friends’ addresses are valid, too. Technically, no data is exchanged, but the sheer quantity of look-ups is enough to beef up Rapleaf’s database.

Think of it this way: Companies like Rapportive, by making simple queries, are becoming the sources of the best and highest quality emails/IDs that Rapleaf has ever obtained. I think this is the crux of the problem. Here’s a question I sent to Rapleaf and the answer I received (emphasis mine).

Does Rapportive (and others like them, such as Gist) pay for the service? If yes, how much? What happens to the queries that originate from Rapportive? Say email [email protected]. Does that data get stored in your databases?

Unfortunately we’re not able to go into details about specific relationships because of our confidentiality agreements, but all of our customers pay us for our service. We do have a free API (up to 1000 queries per month) that many companies use, but companies need to pay Rapleaf for queries above that. We only allow companies to learn more about their existing customers (and we have never given out email addresses) and when they query their customers’ email, we return the most updated information Rapleaf has associated with that email. If this is a new email we have not seen before, it may be cached to provide better user experience in the future or it can be removed via opt-out.

Given that Rapleaf’s core competency is its ability to take email addresses, map them with data on the web and build a profile, I find the argument that data is cached for better user experience hard to swallow. With nearly a billion email addresses in its database, any look-up helps Rapleaf cull out the best emails from the giant morass of addresses. There are at least two companies I spoke to who have declined to work with Rapleaf and refused its offer of free data, mostly because, in their opinion, they found the workflow unsavory, to put it mildly.

Rapleaf’s Startup Web

Regardless, here is a list of Internet startups that have access to data from Rapleaf. Clearly it is incomplete, and, for some of these companies, it is not clear if they send data back to Rapleaf (I’ve noted the companies that confirmed that they only look up data). I am going to update this post with more comments as I get them.

  • Rapportive. The CEO has confirmed that the company doesn’t pass any data back and forth.
  • eTacts. They say they are not passing information back to Rapleaf.
  • Gist. The CTO confirmed the company isn’t passing any information back to Rapleaf.
  • Flowtown. Co-founder Ethan Bloch left a comment indicating Flowtown doesn’t pass any information back to Rapleaf.
  • IntroMojo
  • SafetyWeb
  • SocialShield. Arad Rostampour denied passing any data back to Rapleaf.

As I said earlier, even if the companies aren’t passing any data, every time they do an email-based look-up against Rapleaf’s database, they are essentially helping make Rapleaf’s database more powerful.
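
To show why a look-up alone carries information, here is a toy sketch of a look-up service. Everything in it (the LookupService class, its fields, the sample addresses) is hypothetical and is not based on any knowledge of Rapleaf’s actual systems; it only models the behavior Rapleaf described, where a previously unseen email may be cached.

```python
from collections import Counter


class LookupService:
    """Toy model of an email look-up service (hypothetical, for illustration)."""

    def __init__(self, profiles):
        self.profiles = dict(profiles)  # email -> profile data
        self.query_counts = Counter()   # how often each address was queried

    def lookup(self, email):
        key = email.strip().lower()
        # The client sends nothing but the address, yet the query itself
        # signals that someone considers this address live.
        self.query_counts[key] += 1
        # An address not seen before is cached as an empty profile.
        return self.profiles.setdefault(key, {})


service = LookupService({"known@example.com": {"networks": 3}})
service.lookup("known@example.com")
service.lookup("new@example.com")
service.lookup("New@Example.com ")

print(sorted(service.profiles))                 # ['known@example.com', 'new@example.com']
print(service.query_counts["new@example.com"])  # 2
```

The client never uploaded a profile, but the service now knows about an address it had not seen before and how often it is asked about.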

Casting the Social Web

Verifying emails is one thing. But today, there is a lot more valid social information about demographics, interests, location, etc. available that a company like Rapleaf could use to fill out its profiles. I’m as concerned about startups using Rapleaf’s API as I am about how the company continues to mine data from huge data-rich social services such as LinkedIn. LinkedIn data is ending up on Rapleaf, and from there, it’s appearing on other services such as Flowtown. When I contacted LinkedIn, its spokesperson sent the following response:

As we’ve always said, our user data belongs to our users. It is provided by them and unless they have restricted it, is available on our site. We don’t share personally identifiable information with third parties without user consent. We also have teams that help protect our members’ professional profiles from scraping, spamming and any other activity that violates our terms of service. We don’t have any business relationship with Rapleaf.

However, LinkedIn data ends up at Rapleaf and, via Rapleaf, at other services through scraping of the publicly available data. Some people with knowledge of the subject believe that alternative tactics are being used to get around the API limitations of services such as LinkedIn. (If you know more, please get in touch with me.)

To be clear, I don’t have old-fashioned notions about privacy on the Internet. I know the realities of today’s Internet life. In order to enjoy the convenience of using web-based services, one has to make some sacrifices, and living socially online will eventually lead to an erosion of privacy. However, what I find egregious is how the information is surreptitiously collected all over the web, then aggregated to be sold, without us having any control or ability to look into that data. Sure we can opt out, but only if we know that we’re being profiled. (Ironically, you have to register to opt out.)

I don’t want to blame only Rapleaf; ad networks are doing this as well, giving it cutesy names like behavioral targeting. U.S. Reps. Edward Markey (D-Mass.) and Joe Barton (R-Texas) recently sent a letter to Mark Zuckerberg and Facebook, questioning him about privacy breaches at the social network. In August 2010, these same congressmen asked for information from various web services on cookies and how they use them. Maybe they should consider looking at these data collectors as well. Perhaps they will come to the conclusion that this industry needs some kind of oversight.
