Society benefits from the exchange of large-scale data in a variety of ways: through medical research, economic forecasting and urban planning, to name a few. In these and other cases, there is tension between the utility of the data and the privacy of the subjects. This tension exists because more valuable insights can be derived if the data is fully transparent and no consideration is given to privacy.
Currently, the typical mechanism for addressing the privacy of data subjects is anonymization. Unfortunately, anonymization is broken.
Relying on current forms of anonymization is not working, mainly because de-identified data sets can be reverse engineered to reveal the associated identities and thus compromise privacy. Yet federal laws and industry regulations continue to rely on anonymization techniques to safeguard user anonymity in released data. Privacy statutes and industry self-regulation are inextricably tied to technology, which tends to evolve faster than either government or private regulation.
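To make the failure concrete, here is a minimal sketch of the classic linkage attack: a "de-identified" data set is joined with a public record on quasi-identifiers such as ZIP code, birth date, and sex. All names, records, and field names below are hypothetical and purely illustrative.

```python
# Hypothetical illustration of re-identification by linkage: a released
# medical data set is joined with a public voter roll on shared
# quasi-identifiers. All records here are made up.

deidentified_medical = [
    {"zip": "02139", "dob": "1954-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "94110", "dob": "1987-01-12", "sex": "M", "diagnosis": "asthma"},
]

public_voter_roll = [
    {"name": "Jane Doe", "zip": "02139", "dob": "1954-07-31", "sex": "F"},
    {"name": "John Roe", "zip": "94110", "dob": "1987-01-12", "sex": "M"},
]

def reidentify(medical, voters):
    """Link records that share the same quasi-identifier triple."""
    index = {(v["zip"], v["dob"], v["sex"]): v["name"] for v in voters}
    matches = []
    for record in medical:
        key = (record["zip"], record["dob"], record["sex"])
        if key in index:
            matches.append((index[key], record["diagnosis"]))
    return matches

print(reidentify(deidentified_medical, public_voter_roll))
# [('Jane Doe', 'hypertension'), ('John Roe', 'asthma')]
```

No field in the released data set carries a name, yet the join recovers one; the more columns a release retains, the more likely such a join succeeds.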
Perhaps it’s time to consider alternatives to safeguard the data gathered by companies that collect, retain or otherwise manage large repositories of user data. One approach might be to collect data on the data collectors, use that data to create privacy ratings and make those ratings public. These ratings could also be used to establish premiums for insurance that would reimburse users whose privacy is compromised.
There is value in continuing to analyze user-derived data. But better protections are needed to prevent the public exposure of clues that could reveal supposedly hidden individual identities. While government oversight would probably be necessary, a system that includes assessment, transparency and insurance could be implemented without excessive costs through industry self-regulation.
Independent organizations could perform assessments that rate data-holding entities on both institutional practices, such as clear and reasonable policy disclosure, and technical practices, such as the use of encryption, advanced de-identification, or differential privacy. Firms wishing to retain large amounts of user data would publish their privacy practices and receive a public rating under the assessment framework. This could be mandatory or voluntary.
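As one illustration of the technical practices an assessor might look for, the sketch below shows the core idea of differential privacy: answering a count query with calibrated Laplace noise so that the presence or absence of any single person barely changes the published answer. The data, epsilon value, and function names are assumptions made for this example, not part of any proposed standard.

```python
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon):
    """Answer 'how many records satisfy predicate?' with epsilon-differential privacy.

    A count query has sensitivity 1 (adding or removing one person changes
    the true answer by at most 1), so Laplace noise with scale 1/epsilon
    is sufficient.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical data: did each user opt in to data sharing?
records = [{"opted_in": random.random() < 0.3} for _ in range(10_000)]
print(private_count(records, lambda r: r["opted_in"], epsilon=0.5))
```

An assessor would not need to inspect a firm's raw data to verify this kind of practice, only that published statistics pass through such a mechanism with a documented privacy budget.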
The data created by the assessors could be made public through an online clearinghouse that acts as an independent repository and certifier of privacy assessments for data-holding entities. This organization could be simply a website publisher or it could take on other responsibilities such as overseeing assessment standards and mediating disagreements between data-holders and assessors. Users could consult the clearinghouse for a firm’s privacy assessment to guide their decisions on using the firm’s services.
Armed with this data, insurance companies could create privacy insurance and bonding for larger incorporated data-holding entities. Data-holding firms could then pay an insurance premium to cover privacy breaches. The premium could be based on a data-holder's compliance rating, the amount of data it has at risk and the type of data it holds. The goal of the bonding arrangement would be to give data-holders an incentive to deploy state-of-the-art privacy-preservation standards and to protect users when breaches occur.
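The article does not prescribe a pricing formula, but a hypothetical sketch shows how the three factors could combine: the compliance rating discounts a premium that scales with the number of at-risk records and the sensitivity of the data. Every rate, weight, and category below is invented purely for illustration.

```python
# Hypothetical premium model: none of these rates or weights come from the
# article; they only show how a rating, record volume, and data sensitivity
# could be combined by an underwriter.

SENSITIVITY_WEIGHT = {"public": 0.2, "behavioral": 1.0, "financial": 2.5, "medical": 4.0}

def annual_premium(compliance_rating, records_at_risk, data_type,
                   base_rate_per_record=0.05):
    """Higher compliance rating (0..1) lowers the premium; more records
    and more sensitive data raise it."""
    exposure = records_at_risk * base_rate_per_record
    sensitivity = SENSITIVITY_WEIGHT[data_type]
    discount = 1.0 - 0.6 * compliance_rating  # up to 60% off for best practices
    return exposure * sensitivity * discount

# A firm holding 2 million medical records with a 0.8 compliance rating:
print(f"${annual_premium(0.8, 2_000_000, 'medical'):,.2f}")  # $208,000.00
```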
These three proposals could be implemented with a light touch. Government input could be limited to oversight, which could be carried out by existing agencies such as the Federal Communications Commission or the Federal Trade Commission. The commercial insurance sector could play a potentially lucrative role in building the infrastructure for underwriting privacy insurance plans.
Of course there would be costs, but the cost of failing to implement effective digital privacy policies is currently being borne by users whose privacy is compromised. Relying on anonymization is not working. It’s time to begin considering alternatives.