Should we trust Google when it comes to piracy and search?

As we reported earlier, Google recently announced that it will start filtering its search results based in part on the number of copyright-takedown requests that have been filed against a site: according to a blog post from the search giant, it will tweak its algorithms to rank a website lower if it has a large number of “valid copyright-removal notices.” And how does Google know whether a copyright-removal notice is valid or not? The short answer is that it doesn’t, which is part of the reason why YouTube in particular sees so many bogus takedowns. So how do we know that this filtering of search results won’t adversely affect some websites that are perfectly legitimate? Google’s response so far seems to be “trust us.” But should we?

The search company says it decided to make the change because it will help users “find legitimate, quality sources of content more easily,” but some critics of the move have a different theory: they figure Google has essentially caved in to pressure from media and content companies — the same kind of pressure that led the U.S. government to push for legislation such as SOPA and PIPA, which would have allowed copyright holders to remove offending websites from the internet completely based on just an allegation of infringement. While Google’s changes won’t do this, being pushed down in the search results of the web’s dominant search player can have a serious impact.

Google’s criteria are completely unknown

As the Electronic Frontier Foundation points out in a blog post criticizing the move, Google’s search algorithms are opaque by design, and so there is no way of knowing what kind of criteria they will be using to decide which sites to penalize and which to leave untouched. What does a “high number of copyright-removal notices” mean? We don’t know. And while Google provides a “counter-notice” process for those whose content has been removed from search altogether, it’s not clear whether there will be any method of appeal if you think your website has been downgraded in search results because of bogus copyright claims. Says the EFF:

“Without details on how Google’s process works, we have no reason to believe they won’t make similar, over-inclusive mistakes, dropping lawful, relevant speech lower in its search results.”

Danny Sullivan of Search Engine Land noted in a post that if the sheer number of copyright notices against a site is the defining factor in whether Google drops it lower in results, then YouTube will be in grave danger, since it gets a vast number of them (although the site doesn’t appear in Google’s public list of sites for which it has been asked to take down content). And there have been repeated examples of bogus claims that have led to the removal of lawful content from YouTube, including one recent incident in which several different media companies launched claims of ownership over a NASA video involving the Mars landing.

That kind of behavior isn’t likely to fill anyone with confidence in Google’s ability to differentiate between a valid copyright claim and an invalid one. And the company’s response to Sullivan’s post muddies the waters even further: Google said that YouTube — and other user-generated content sites such as Facebook, Tumblr and Twitter — won’t be penalized (or at least not very much) by the new algorithm changes, because of “nuances” in the new algorithm. What kind of nuances? The company isn’t saying. According to Sullivan:

“Google told me today that the new penalty will look beyond just the number of notices. It will also take into account other factors, specifics that Google won’t reveal, but with the end result that YouTube — as well as other popular sites beyond YouTube — aren’t expected to be hit.”

Is Google trying to curry favor with content companies?

Is it just large user-generated content sites that will get some kind of free pass? We don’t know. Is there some kind of white list of protected sites? Unknown. According to Sullivan, the company simply told him that the algorithm “automatically assesses various factors or signals to decide if a site with a high number of copyright infringement notices against it should also face a penalty.” What these various factors and signals are seems to be a secret — just as everything else about the company’s search algorithms is kept secret.

While this is presumably done to prevent people from gaming the system (or competitors from copying features), it makes it a lot harder to determine whether Google is unfairly penalizing websites for bogus copyright notices. And as the EFF points out, “false positives” are a huge problem — not just for Google but for the internet as a whole, with some websites and domains being seized by the government based merely on allegations of copyright infringement. While Google’s search penalty may not be as bad as that, it still feels like the search giant is taking action against websites that should be innocent until proven guilty.

Why would the company decide to do this? For one thing, it is being investigated by the Federal Trade Commission for possible antitrust violations, and it may see moves like the algorithm change as a way of showing that it is a beneficial force for society. Google is also trying to do more content-related deals with traditional media and entertainment players through YouTube, and that may have increased the pressure to come up with a response to piracy that provides at least a watered-down version of the penalties that those companies were pushing for with SOPA and PIPA.

The bottom line is that Google is essentially asking users to trust it to decide what to do with websites that have been accused of copyright infringement. But we have already seen that Google is prepared to engineer its search results for its own benefit rather than that of its users, with features such as “Search Plus Your World,” which was designed to promote Google’s social network. That kind of thing makes it harder to rely on blind faith in Google’s value judgments, especially when it comes to crucial questions around copyright and freedom of speech.

Post and thumbnail images courtesy of Flickr user Stefan