Does your private data really need to be that private?

When it comes to medical or genomics data, the public good outweighs the benefits of keeping information private, said two academics speaking at the Big Data Privacy Workshop at MIT on Monday.

“I think most people fear death or the death of a loved one more than a loss of privacy,” said John Guttag, professor at MIT’s Computer Science and Artificial Intelligence Lab (CSAIL). In his view, patients or would-be patients would be well served to share their medical data (about hospital stays, treatments, procedures, etc.) in service of preventing things like Clostridium difficile (C. diff) infection.

Five percent of all U.S. patients suffer an infection unrelated to their admission, and of those infections C. diff is one of the most common, affecting 200,000 people per year, he said.

U.S. Secretary of Commerce Penny Pritzker speaking at MIT Big Data Privacy Workshop.

To help figure out how patients get infected and to help avoid future problems, the anonymized information that most people talk about when trying to paint big data analytics as non-threatening won’t work, Guttag said. Instead, scientists researching C. diff need personal, identifiable information: the patient’s zip code, hospital room number, names of roommates, who treated her and where, and dates of treatment.

That information can be used to help prevent future outbreaks. And if the right auditing mechanisms are in place, anyone who uses that data for a non-authorized purpose would be punished. Note: Others have already advocated for the donation of personal medical data for the public good.

White House Counselor John Podesta, who kicked off Monday’s event by phone from Washington, D.C. (his trip north was thwarted by weather: “big snow trumped big data,” he said), had a question for the panel: “We can’t wait to get privacy perfect to get going. What few things do we need to get right right now?”

Guttag said a uniform, standard process by which patients could give informed consent would be a good start. And he thinks something should be done about the Health Insurance Portability and Accountability Act (HIPAA), which was meant to keep patient data private but also to assure secure sharing of that information between authorized parties.

“HIPAA is a problem and probably prevents useful things from happening — it would be great to pay attention to the tradeoffs. We underestimate our society — if people understood how valuable it would be to allow their data to be used for medical research, they would do it.”

Others said people who provide data should be protected from bad outcomes from non-condoned use of their information.

“My biggest concern is discrimination by algorithm, having someone make decisions about you based on your status profile — if you’re a risky driver, if you have a genetic predisposition to something,” said Sam Madden, an MIT CSAIL professor who specializes in mobile big data.

Safeguards must be put in place to either prevent or mitigate that possibility. It all boils down to transparency and an informed consumer. “We have to talk about people having visibility into what’s being collected and being able to say ‘I don’t want you to keep that data any more,’” Madden said.

Consumers beware

But for many consumers, the privacy horse is out of the barn, largely because of their own actions. Anyone who posts to Facebook or Twitter or any number of special interest websites is handing over their data to aggregators, said Michael Stonebraker, an adjunct professor at CSAIL and the database brains behind Vertica, VoltDB, and Data Tamer.

“The question is tricky. We all use Waze to navigate traffic and it knows all about us. We’re volunteering data in return for personal benefit. Any governance of what Waze can do is a legal issue, not a technology issue,” Stonebraker said.

People also have to distinguish between privacy and “the illusion of privacy,” said Manolis Kellis, an associate professor at CSAIL.

“Every time you take your coat off you’re providing DNA data to someone,” he said. Data leakage is inevitable in the physical and virtual worlds, but “laws should protect us so we don’t have to hide our genomic data because we can be discriminated against,” Kellis noted.

Case in point: People can be tested for genetic predispositions to Alzheimer’s and other ailments, but many refuse to do so for fear that their insurers will cancel their coverage or jack up their premiums.

Trust? What trust?

The notion that individuals should hand over their medical data for research purposes is not new, and good arguments can be made for doing so.

But, in Monday’s session, U.S. Commerce Secretary Penny Pritzker and other speakers stressed the need for trust between consumers and businesses. And frankly, trust is a commodity in short supply these days given the Edward Snowden revelations of NSA data gathering and data breaches at Target and other retailers.

In response to a question, Podesta, who was brought back into President Obama’s inner circle in January in part to ride herd on data and privacy issues, said this work is separate from a review of U.S. intelligence surveillance practices.

Boiling all of this down, to me this means that even if you do trust medical researchers at MIT or Harvard or Stanford with your health data, you would be justified in worrying that the data could end up with someone else and used for non-medical purposes.

There’s a ton of work to be done before Guttag’s vision of shared medical data can come to fruition.

If you want to hear more on big data and big data privacy, check out our Structure Data show in a few weeks.

Panelists (from left): Michael Stonebraker; John Guttag; Manolis Kellis; Sam Madden; Anant Agarwal.
