Understanding “Differential Privacy”

The U.S. Census Bureau collects sensitive data from individuals and households, publishing demographic data that is essential for telling the story of Americans and their communities. Under Section 9 of the Census Act (Title 13, U.S. Code), the Bureau, as a “trusted curator,” is required to uphold respondent privacy in data that are released publicly. To meet this requirement, the Census Bureau typically releases aggregate-level data (e.g. census block, block group, or tract) and implements various disclosure avoidance techniques (e.g. collapsing data, variable suppression). More recently, however, modern computational methods, combined with the growing volume of publicly available data, have increased the risk of exposing individual privacy. In response, the Bureau has explored approaches for modernizing disclosure avoidance.

What is Differential Privacy?

Differential privacy comprises a set of techniques that limit what publicly released aggregate data can reveal about any individual. More specifically, differential privacy balances privacy loss against accuracy through mathematical formulas: random noise is injected into published statistics, and a numeric limit caps how much any one person’s data can influence the output. Once the limit for acceptable privacy loss is established (this limit is part of the current debate), measures including the addition of synthetic data (or “synthetic noise”), data swapping, and data imputation can be used to ensure that the database is sufficiently safeguarded against reconstruction and individual identification.
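To make the noise-injection idea concrete, below is a minimal sketch of the Laplace mechanism, the textbook construction for achieving epsilon-differential privacy on a count query. The function name, the example count, and the epsilon values are illustrative assumptions, not the Census Bureau’s actual implementation.

  import numpy as np

  def laplace_count(true_count, epsilon, sensitivity=1.0):
      # Adding or removing one respondent changes a simple count by at
      # most 1 (its "sensitivity"), so Laplace noise with scale
      # sensitivity / epsilon masks any individual's contribution.
      return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

  # Smaller epsilon means stronger privacy and noisier published counts.
  for eps in (0.1, 1.0, 10.0):
      print(eps, round(laplace_count(120, eps), 1))

The same trade-off drives the policy debate: lowering epsilon strengthens the privacy guarantee but widens the noise added to every published statistic.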

Differential Privacy and the Census

The U.S. Census Bureau has expressed interest in implementing differential privacy in the 2020 Census. In practice, the Census Bureau would need to set a privacy-loss limit that balances privacy with data utility and accuracy. Given the importance of decennial census and ACS data, it is critical to understand the impact that differential privacy will have on data availability, particularly for cross-tabulated data (e.g. poverty by race/ethnicity), microdata (e.g. the Public Use Microdata Sample, or PUMS), and small-area geographies (e.g. census blocks). To date, the Bureau has not applied differential privacy to its ACS or decennial datasets, nor has it released information about how differential privacy would be approached in the upcoming census.

Concerns about Differential Privacy

A white paper written by Census Bureau staff acknowledges that differential privacy “lacks a well-developed theory for measuring the relative impact of added noise on the utility of different data products, tuning equity trade-offs, and presenting the impact of such decisions.” By altering the perceived demographic composition of communities, differential privacy can disproportionately affect racial/ethnic minorities and underrepresented individuals. Communities where individuals of color make up a small percentage of the population, for example, may require data swapping to a different tract or block group to meet the privacy limits set under a differential privacy protocol.
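A hypothetical numeric sketch illustrates why this concern centers on small populations: noise of a fixed scale that is negligible for a large tract-level count can swamp a small subgroup count. The counts and the epsilon value below are invented for illustration and are not published Census Bureau parameters.

  import numpy as np

  rng = np.random.default_rng(0)
  epsilon = 0.5            # assumed privacy-loss budget for this query
  scale = 1.0 / epsilon    # Laplace scale for a count with sensitivity 1

  for label, true_count in [("large tract", 4000), ("small minority group", 12)]:
      noisy = true_count + rng.laplace(0.0, scale, size=10_000)
      rel_err = np.abs(noisy - true_count) / true_count
      print(label, "median relative error:", round(float(np.median(rel_err)), 3))

Because the noise scale is the same for both counts, the relative error for the small group is hundreds of times larger, which is the mechanism behind the equity concern described above.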

Dr. Steven Ruggles, Director of the Institute for Social Research and Data Innovation at the University of Minnesota, notes in a recent tweet that “the Census Bureau announced new confidentiality standards that mark a ‘sea change for the way that official statistics are produced and published.’”

The Minnesota Population Center (MPC) outlines a number of concerns regarding differential privacy. The MPC recommends the following points be addressed before differential privacy policies are implemented:

  1. More testing is needed before final decisions are made on how differential privacy will be applied to census data.

  2. Differential privacy is not appropriate or feasible for ACS microdata (e.g. PUMS).

  3. For all data products, the Census Bureau should proceed cautiously in close consultation with the data user community.

For more information

This topic is ever-evolving; as such, this post will be updated to make the most current information available.

Jason Jurjevich