Can Kaggle make data science a spectator sport?

Updated: Don’t worry if you don’t yet have a favorite data scientist, I don’t either. But maybe that’s just because we haven’t known who to root for.

Kaggle hopes to change that with a twist on its predictive-modeling competition platform. It now makes public the competitors in its invite-only private competitions. Think of it like a major golf or tennis tournament, where you can watch the best in the world battle it out, except that here the question is whose algorithms are king. Kaggle’s tagline is “We’re making data science a sport.” Maybe now it can make data science a spectator sport.

Top five in the GigaOM/Splunk competition.

Actually, Kaggle has been running private competitions since about the beginning of 2012. In these, customers generally remain anonymous and keep their challenge descriptions vague to everyone except those invited to work on the data. In the past, though, even the competitors remained a mystery. Kaggle also posts leaderboards for all public competitions, as well as a cumulative leaderboard. But as any sports fan knows, there’s nothing quite like watching a tournament where only the best of the best can play, and where the pressure is on.

Now, says Kaggle founder and CEO Anthony Goldbloom, private competitions are more like the U.S. Open, in that others can watch the leaderboard and see how the invited data scientists are faring. It’s primarily a feature for other data scientists on the Kaggle platform. It lets them gauge their relative performance and gives them a little more motivation to step up their game and earn invitations to the private competitions. But I think it could become a geek spectator sport under the right conditions.

If you’re wondering which U.S. Open he’s talking about (golf or tennis), don’t fret. Had Goldbloom been asked before Kaggle launched whether it would be more like golf or tennis, even he might have guessed wrong. He’d probably have guessed tennis, in which certain players excel on certain types of courts, like Roger Federer on the grass court at Wimbledon or Rafael Nadal on the clay court at the French Open. By that logic, someone who works in biotech might naturally prevail in biotech competitions, while a natural-language processing specialist might do best in competitions with lots of text to mine.

It turns out Kaggle is more like golf, in which a dominant player like Tiger Woods can win on pretty much any course he plays. Newcomers can still win, especially because there are plenty of good data scientists still making their way to the nascent Kaggle platform, but, Goldbloom says, the really good ones will adapt their skill sets to whatever is necessary for any given competition.

The first private competition open to public viewership began on Wednesday. It is unusual in that the sponsor is willing to share its name and its challenge. The sponsor is insurance provider Allstate, and it’s trying to predict customer churn. According to Goldbloom, Jason Tigg, an Oxford physicist turned hedge fund manager, was the prohibitive favorite, but Tigg has decided not to take part. Indianapolis actuary Shea Parkes and apparent mystery man Jonathan Peters are names to watch.

There you have it, sports fans. Place your bets accordingly.

Note: This story was updated to reflect that Oxford physicist Jason Tigg decided not to take part in the first private competition open to public viewership.

Feature image courtesy of Shutterstock user photofriday.