Pinterest bought Kosei because recommendations are really hard

Pinterest announced Wednesday that it has acquired Kosei, a Palo Alto, California-based startup that focuses on machine learning for product recommendations. It’s a smart buy for Pinterest because the company’s path to profitability depends on its ability to connect users, products, and the companies or people selling them.

Here’s how Pinterest explains the acquisition in a blog post:

Over the past year, Kosei has been building a unique technology stack that drives commerce by making highly personalized and powerful product recommendations, as well as creating a system that contains more than 400 million relationships between products. As we build a discovery engine for all objects, Kosei is a perfect fit for our team.

. . .

As people use Pinterest to save and discover the things they want to do in the future, we have a unique and growing data set of more than 30 billion Pins that will only get more powerful over time. With the addition of the Kosei team, we can supercharge our existing graph to help brands reach people at the right moments, and improve content for Pinners.

As the post goes on to note, the Kosei team will add to several other machine-learning-based teams at Pinterest, which are responsible for everything from spam detection to deep-learning-based object recognition (via its VisualGraph acquisition in early 2014). Kosei joins an existing “Discovery” team that’s already working on recommendations and user-behavior models.

Pinterest’s guided search feature

But the bigger picture here (and something several speakers will no doubt cover at our Structure Data conference in March) is that, despite years of effort by companies such as Amazon and Netflix, recommendations — a driving factor behind the entire big data and data science movement — are far from a solved problem. Data science teams at those companies, as well as at places such as Facebook, Google, LinkedIn and Twitter, are always testing out new variables and tweaking their models in an effort to put the right content — ads, users or otherwise — in front of the right people.

And they have some of the smartest people and most-advanced systems around. For laypersons and smaller companies, recommendations can be a much more daunting task, although there are now startups, open source projects and other efforts trying to address the situation.
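To make the tuning work a little more concrete, here is a minimal, purely illustrative sketch of item-item collaborative filtering in Python. It is the textbook baseline rather than Pinterest's, Kosei's or anyone else's production system; the toy interaction matrix, the cosine-similarity scoring and the recommend helper are all assumptions made for the example.

```python
import numpy as np

# Hypothetical user-item matrix: rows are users, columns are products,
# 1.0 means the user saved or bought that product (toy data for illustration).
interactions = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Cosine similarity between item columns, a tiny "graph" of item relationships.
normalized = interactions / np.linalg.norm(interactions, axis=0, keepdims=True)
item_sim = normalized.T @ normalized
np.fill_diagonal(item_sim, 0.0)  # an item shouldn't recommend itself

def recommend(user_idx, k=2):
    """Score unseen items by their similarity to items the user already has."""
    scores = item_sim @ interactions[user_idx]
    scores[interactions[user_idx] > 0] = -np.inf  # mask items already seen
    return np.argsort(scores)[::-1][:k]

print(recommend(0))  # top-k product indices for user 0
```

Production systems layer far more onto a baseline like this: side features, freshness, business rules and constant A/B testing of the weights, which is exactly the never-finished tuning described above.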

As long as the web continues to be a hub for our shopping, education, socializing and media consumption, companies will strive to personalize it in the name of user experience and revenue. Which means they’ll also keep pumping money into the graphs, models and algorithms that make personalization possible.

Machine learning startup GraphLab raises $18.5M, becomes Dato

GraphLab, a Seattle-based startup trying to make machine learning more accessible, has raised an $18.5 million series B round of venture capital and has changed its name to Dato. The company has now raised $25.3 million, with the latest round coming from existing investors NEA and Madrona Ventures and new investors Vulcan Capital and Opus Capital Ventures.

When the company first launched in 2013, it was attempting to commercialize an open source graph-computing project called GraphLab. However, co-founder and CEO Carlos Guestrin told me, many of its customers were also interested in analyzing different types of data using different techniques — sometimes much more than they were interested in the graph part — and it became obvious a change was needed.

The release of the company’s Create service in July, which includes tools for things such as regression and deep learning models as well as graph processing, was the first step in the process. The decision to change the company name from GraphLab to Dato came about in the last six weeks, Guestrin said.

“Our name no longer matched who we were … and it wasn’t an aspirational name that could grow with us,” he added.

Carlos Guestrin. Source: Carnegie Mellon University

Early Dato customers include Adobe, Zillow, PayPal and Cisco, and many use it for recommendation engines and other data mining projects. Dato Create handles everything from building the initial models to rolling them out into production applications, and is aimed at what Guestrin calls “savvy engineers” — folks who understand how to build applications and connect them to a database, perhaps, but who haven’t necessarily taken a machine learning course.
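For a rough sense of what that workflow looked like, here is a sketch using the graphlab Python package that shipped with GraphLab/Dato Create around this time. The file name, column names and user ID are hypothetical, and the exact API may differ by version, so treat it as an illustration rather than a definitive recipe.

```python
import graphlab as gl

# Hypothetical interaction log with user_id, item_id and rating columns.
ratings = gl.SFrame.read_csv('ratings.csv')

# Create picks and trains a recommender from the data; no ML coursework required.
model = gl.recommender.create(ratings,
                              user_id='user_id',
                              item_id='item_id',
                              target='rating')

# Top-5 recommendations for one user, ready to serve from an application.
print(model.recommend(users=['user_42'], k=5))
```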

He said Dato hopes to expand its user base even further in time by making it so Create can choose and tune the right algorithms and models.

“The question for me is not keeping up with the Joneses … but trying to understand folks who don’t care about buzzwords,” he said. “What are the capabilities they need?”

When I asked Guestrin if graphs had seen their best days amid a flurry of activity a couple of years ago, he said it was just a matter of perception. The people most interested in graphs tend to be early adopters of new technologies, he suggested, and as more people got interested in machine learning, the voices talking about graphs got drowned out by people trying to solve different types of problems.

“How many people do you know who wake up in the morning and say, ‘Do you know what’s missing in my life? A graph database,'” he joked.

To learn more about the future of machine learning and some of today’s cutting-edge use cases and technologies, attend Gigaom’s Structure Data conference March 18-19 in New York.

How machine learning is taking over to make enterprise search smart

A new content-management platform called Highspot says it can make it easier for employees to find documents by employing the techniques used in web search engines. Its exact approach might be unique, but it’s not alone in trying to make enterprise search more intelligent.
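Highspot hasn't published its ranking code, but the basic web-search technique the pitch alludes to, scoring documents against a query with TF-IDF weights and cosine similarity, can be sketched in a few lines of Python. The toy documents and query below are assumptions for the example, not Highspot's actual index.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical internal documents an employee might be searching for.
docs = [
    "Q3 sales deck for the enterprise security product line",
    "Onboarding checklist for new account executives",
    "Competitive battlecard: enterprise search vendors",
]

vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(docs)

query = "enterprise search competitors"
query_vector = vectorizer.transform([query])

# Rank documents by cosine similarity to the query, highest first.
scores = cosine_similarity(query_vector, doc_vectors).ravel()
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.3f}  {docs[idx]}")
```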

An MLB team is apparently doing in-game graph analysis

A Major League Baseball team is reportedly the proud owner of a Cray Urika graph-processing appliance that helps the team make in-game decisions by analyzing lots and lots of data. It might be a first, but it’s where sports are headed.

How Lumiata wants to scale medicine with machine learning and APIs

A startup called Lumiata is taking the kind of web-scale graph analysis that Google and Facebook have perfected and turning it toward personalized health care. As we generate more digital data about research and even personal health, it’s an idea whose time has come.

Let’s build a semantic web by creating a Wikipedia for relevancy

The relevancy-defining, edge-weighting algorithms of Google’s Knowledge Graph, Facebook’s Open Graph and Gravity’s Interest Ontology are closely guarded company secrets. Imagine if that data was available to everyone — it would be as disruptive as Amazon Web Services. The internet would be a better place.

Teradata Aster now does graph processing

Teradata has upped the capabilities of its Teradata Aster big data platform by adding in a native graph-processing engine called SQL-GR. Not a bad idea considering the increased attention around graph processing lately, as well as the need for an aging Teradata to keep up with (or stay ahead of) the Joneses in the big data space. And Teradata’s SNAP Framework — which ingests a query and then decides the right processing engines and data stores to invoke — is pretty sweet in theory.

Apache Giraph at Analytics @ Web Scale

If you're interested in learning more about Facebook's version of Giraph, check out this video of a talk Avery Ching gave on the subject at the Analytics @ Web Scale event earlier this summer.


This is a good presentation about Facebook’s graph-processing engine, Giraph, from a big data event held at the company’s Menlo Park campus in early June. The PRISM story kind of took over the news cycle that week, but the event also produced some news (for big data geeks, at least): Facebook’s Presto engine for interactive queries of its 250-petabyte Hadoop data warehouse.