Notice to startups: You are doing data science wrong

When we think about data science, we think of a mythical laboratory where scientists are feverishly crunching numbers to provide a clear quantitative view of the future. We’ve been sold on the idea that these transformative findings will unlock increased performance numbers. Large companies hire teams of academics to sift through their massive repositories of data, believing in data science as it is currently described: academics doing glorified business intelligence.

Startups and small businesses must take a drastically different approach if they want to see real value.

For startups, data science should not be seen as a separate scientific initiative but as an integrated part of the product. Speed and efficiency are critical for young companies, so hiring and building out a team of data scientists, more aptly named “data product engineers,” is paramount. Once you accept that data science is about building data products, you will see that your data engineers, contrary to popular belief, do not need PhDs. Instead, they need to be able to integrate into the core of your product and engineering organization.

Approaching data science through the product lens is not a completely new idea. DJ Patil, who previously led data products at LinkedIn and is currently VP of product at RelateIQ, discussed this in his book Data Jujitsu: The Art of Turning Data into Product. His thesis runs along the same lines as ours: product-focused data science is different from the current business intelligence style of data science. BI initiatives are well understood, but integrating data into the heart of your product in real time is not.

We integrate data into our core product offering, a native advertising exchange. We combine terabytes of publisher content data with user interaction data to understand the context of a user on a publisher’s page, allowing us to deliver the most contextually relevant content.

For an example of a data-driven product outside of the advertising industry, let’s take a look at Pandora. Pandora consumes song metadata and combines it with user listening history to create customized radio stations for its users. This is essentially a gigantic recommendation system, which is an inherent data science problem. It is hard to imagine Pandora’s core product without data-driven customized radio stations, which require the tight data integration we have been discussing.
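To make the idea of combining item metadata with listening history concrete, here is a minimal sketch of a content-based recommender. It is an illustration only, not a description of how Pandora actually works: the song names, the three feature attributes and the function names are all hypothetical, and a production system would be far more involved.

```python
from math import sqrt

# Hypothetical song metadata: each song is scored on three made-up
# attributes (tempo, acousticness, vocal intensity), each between 0 and 1.
SONG_FEATURES = {
    "song_a": [0.9, 0.1, 0.8],
    "song_b": [0.8, 0.2, 0.7],
    "song_c": [0.1, 0.9, 0.2],
    "song_d": [0.2, 0.8, 0.3],
}


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def taste_profile(history, features):
    """Average the feature vectors of the songs a user has listened to."""
    vectors = [features[song] for song in history]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def recommend(history, features, k=2):
    """Rank songs the user has not heard by similarity to their profile."""
    profile = taste_profile(history, features)
    candidates = [song for song in features if song not in history]
    ranked = sorted(
        candidates,
        key=lambda song: cosine(profile, features[song]),
        reverse=True,
    )
    return ranked[:k]


# A listener who has only played song_a gets the songs closest to it.
print(recommend(["song_a"], SONG_FEATURES))
```

The point of the sketch is that the recommendation logic is ordinary product code: it reads metadata and user history, and its output is the feature the user sees, which is why it benefits from living inside the engineering organization.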

How do companies build out data product teams that are nimble enough to create such products? We approach the entire data science hiring paradigm differently.

Most current (traditional) job postings are looking for individuals with a Ph.D. and industrial research experience, but most startups don’t need bleeding-edge machine learning to drive substantial business success. Therefore, advanced academic credentials are not the right criteria to look for when hiring someone to build great data products for your startup. Someone who only feels comfortable writing non-production code in Hive, SQL, R and Matlab can’t build great data products. Hiring on credentials alone creates a data science organization that works in a dark corner and throws algorithms over the fence, hoping the engineers can implement them in the product.

What you really need is someone who understands how to take data and transform it into a product.

So who should you look for? You need data engineers with the skills to build a product from the ground up and release it to your end users, just as your traditional engineering and product teams do. These skills can be found in entrepreneurial engineers who have a passion for science, math and discovery. These people need the intellectual curiosity, entrepreneurial instincts and data engineering skills necessary to deliver results in the form of phenomenal data products.

Take me as an example. I played a key role in developing our integrated data product engineering team — from building large-scale production data-processing pipelines to machine learning algorithms for click prediction — despite the fact that I lack the traditional academic credentials, having dropped out of college as an undergraduate. There are plenty of ex-engineers from Google, Twitter, Facebook and countless startups who would call themselves software engineers or developers, rather than data scientists, and they possess all the skills your startup needs.

Finally, it is not just about hiring the right people, but also about properly weaving them into your culture and organizational structure. Our data product engineers sit with and even pair program with other engineers, creating the tight integration that facilitates quick iterations to keep the engineering team nimble and delivering. If you separate your science team, you create a transactional relationship where “science” is thrown over the wall to engineering, resulting in your science team producing output that cannot be productionalized or productized.

The goal of a startup is to develop products that change the world, and often that starts with data. To do this, you need data product engineers who integrate tightly with your engineering team and have the skills to transform data into products.

Ryan Weald is a data scientist at ShareThrough.
