MyHeritage automates record-matching as genealogy wars heat up

When it comes to social networks, few are more important – and harder to pin down – than the family tree. So it’s no surprise that the fierce competition between the two leading platforms, MyHeritage and its larger rival, is getting ever more technologically advanced.

Derrick covered some of the techniques being used by MyHeritage’s rival back in June, and today we can reveal the latest weapon in MyHeritage’s arsenal: automated record matching.

Both platforms lean heavily on records as a way of augmenting the drier names and dates that make up family trees, but the Israel-based MyHeritage – which already has its own angle by explicitly treating the service like a social network – reckons it now has the edge.

According to CEO Gilad Japhet, MyHeritage has had its Record Matching tech ready for some time, but needed to set up a server farm, then clear a backlog of four billion historical records (including the world’s largest historical newspaper collection, acquired through the company’s FamilyLink buy last year), before launching it today.

“They come from original documents, birth records, marriage certificates, passenger lists going through Ellis Island, tombstones – in a few cases user contributed, as some people take snapshots of gravestones and upload them – public information, census records, newspaper articles and books. Record Matching covers both text-based and structured records, those that can be filled into a regular database,” he told me.

As an example, let’s say you don’t know the date of birth or death for your grandfather, but you do know his name. MyHeritage has a big database of wills, but again, you’re lacking dates. So the service would use its already-existing Smart Matching technology to compare the known information with that on other family trees, perhaps pinning down dates through other relatives’ connections.

Then, armed with that, it would find what it can in those historical records, using semantic analysis to deal with the free-text newspaper cuttings for example.
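To make that two-stage idea concrete, here is a minimal sketch in Python. It is not MyHeritage’s code: the company has not published how Smart Matching or Record Matching work, so every name here (`Person`, `Record`, `infer_birth_year`, `match_records`), the exact-name comparison, the 100-year lifespan window and the sample people are assumptions made up for illustration. A real system would need fuzzy name matching and the semantic analysis of free text that Japhet describes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """A hypothetical family-tree entry; birth_year may be unknown."""
    name: str
    spouse: str
    birth_year: Optional[int] = None


@dataclass
class Record:
    """A hypothetical historical record, e.g. a will or census entry."""
    name: str
    year: int
    kind: str


def normalize(name: str) -> str:
    """Lower-case a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def infer_birth_year(target: Person, other_trees: list[Person]) -> Optional[int]:
    """Stage 1: borrow a birth year from another tree's entry that
    shares both the person's name and the spouse's name."""
    for other in other_trees:
        if (
            other.birth_year is not None
            and normalize(other.name) == normalize(target.name)
            and normalize(other.spouse) == normalize(target.spouse)
        ):
            return other.birth_year
    return None


def match_records(
    target: Person, birth_year: int, records: list[Record], max_lifespan: int = 100
) -> list[Record]:
    """Stage 2: keep records with the same name that are dated
    within an assumed plausible lifetime."""
    return [
        r
        for r in records
        if normalize(r.name) == normalize(target.name)
        and birth_year <= r.year <= birth_year + max_lifespan
    ]


# Made-up sample data: a grandfather with no dates in our own tree.
grandfather = Person("Samuel Katz", "Rosa Katz")
other_trees = [
    Person("samuel  katz", "Rosa Katz", 1880),
    Person("Samuel Katz", "Anna Katz", 1925),
]
records = [
    Record("Samuel Katz", 1948, "will"),
    Record("Samuel Katz", 1790, "census"),
    Record("Sam Cohen", 1948, "will"),
]

year = infer_birth_year(grandfather, other_trees)
matches = match_records(grandfather, year, records) if year is not None else []
```

In this toy run the spouse’s name is what separates the two Samuel Katz entries in other trees, and the borrowed birth year is what rules out the 1790 census entry while keeping the 1948 will.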

The smart thing, and one that Japhet hopes will pull in more subscribers and pay-as-you-go credit users, is that Record Matching works automatically and provides snippets of information for free. If you’re a user, you’ll just get an email telling you what’s been found. If you want to see the full record, you pay, but it doesn’t require that step to prove its worth.

So why did MyHeritage decide to shun the cloud for all this?

“We found it wasn’t very efficient to run this in the cloud because the CPU power you get is typically smaller, as a lot of these servers are virtual,” Japhet said. “We wanted serious number-crunching capabilities, and found it more efficient for us to purchase high-end servers, put together a large farm, run it all and accumulate the matches. It’s an ongoing real-time system.”

Japhet also claims other advantages over its rival, an older and larger service (38 million family trees to MyHeritage’s 23 million). For one thing, he points out that MyHeritage is available in 38 languages and its rival in just half a dozen – that makes a difference when you consider the international aspect of genealogical research.

What’s more, MyHeritage intends to “launch a massive crowdsourcing based transcription system” for its users within the next year, he added. And so the battle for family history continues.