Understanding “Differential Privacy”
The U.S. Census Bureau collects sensitive data from individuals and households and publishes demographic data that are essential for telling the story of Americans and their communities. Under Section 9 of the Census Act, U.S. Code Title 13, the Bureau, as a “trusted curator,” is required to uphold respondent privacy in data that are released publicly. To meet this requirement, the Census Bureau typically releases aggregate-level data (e.g. by census block, block group, or tract) and implements various disclosure avoidance techniques (e.g. collapsing data, variable suppression). More recently, however, modern computational methods, combined with more publicly available data, have increased the risk to individual privacy. In response, the Bureau has explored approaches for modernizing disclosure avoidance.
What is Differential Privacy?
Differential privacy is a mathematical framework for limiting how much published aggregate statistics can reveal about any one individual. More specifically, it uses mathematical formulas to quantify the trade-off between privacy loss and accuracy. Once the limit for acceptable privacy loss is established (this is part of the current debate), measures including adding synthetic data (or introducing “synthetic noise”), data swapping, and data imputation can be used to help safeguard the database from reconstruction and individual identification.
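To make the trade-off between privacy loss and accuracy concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a single count. It is an illustration only, not the Census Bureau's method, which the sources cited here do not describe. The function names, the count of 120, and the epsilon values are all hypothetical. Epsilon is the conventional symbol for the privacy-loss limit: a smaller epsilon means stronger privacy and more noise.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws, each with mean
    # `scale`, follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    # Assumption: adding or removing one person changes the count by at most
    # `sensitivity` (1 for a simple head count).
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon  # smaller epsilon -> larger noise
    return true_count + laplace_noise(scale, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so the demonstration is repeatable
    true_count = 120        # hypothetical block-level count
    for epsilon in (0.1, 1.0, 10.0):
        released = [noisy_count(true_count, epsilon, rng) for _ in range(5)]
        print(epsilon, [round(value, 1) for value in released])
```

Running the script prints five noisy versions of the same count for each epsilon. The values released at epsilon 0.1 scatter much more widely around 120 than those at epsilon 10, which is the privacy and accuracy trade-off in miniature.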
Differential Privacy and the Census
The U.S. Census Bureau has expressed interest in implementing differential privacy in the 2020 Census. In practice, the Census Bureau would need to set a limit for the amount of disclosure avoidance that balances privacy with data utility and accuracy. Given the importance of decennial census and ACS data, it is critical to understand the impact that differential privacy will have on data availability, particularly for cross-tabulated data (e.g. poverty by race/ethnicity), microdata (e.g. Public Use Microdata Sample or PUMS), and small-area geographies (e.g. census blocks). To date, the Bureau has not applied differential privacy to its ACS or decennial datasets, nor has it released information about how differential privacy would be approached in the upcoming census.
Concerns About Differential Privacy
A white paper written by Census Bureau staff acknowledges that differential privacy “lacks a well-developed theory for measuring the relative impact of added noise on the utility of different data products, tuning equity trade-offs, and presenting the impact of such decisions.” By adjusting the perceived demographic composition of communities, differential privacy has the capacity to disproportionately affect racial/ethnic minorities and underrepresented individuals. In communities where individuals of color make up a small percentage of the population, for example, records may need to be swapped to a different tract or block group to meet the privacy limits set under a differential privacy protocol.
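One way to see why small groups are more exposed to distortion is a back-of-the-envelope calculation. The sketch below assumes Laplace noise with scale 1/epsilon added to each count. That is a standard textbook mechanism, not a procedure confirmed by the Bureau, and the function name, group sizes, and epsilon value are hypothetical. Under that assumption the expected absolute error is the same for every count, so the error relative to the group's size grows as the group shrinks.

```python
def expected_relative_error(true_count, epsilon, sensitivity=1.0):
    # For Laplace noise with scale b = sensitivity / epsilon, the expected
    # absolute error is b, regardless of how large the true count is.
    if true_count <= 0 or epsilon <= 0:
        raise ValueError("true_count and epsilon must be positive")
    return (sensitivity / epsilon) / true_count


if __name__ == "__main__":
    epsilon = 0.5  # hypothetical privacy-loss limit
    for label, count in (("group of 10", 10), ("group of 1,000", 1000)):
        share = expected_relative_error(count, epsilon)
        print(f"{label}: expected error is {share:.1%} of the true count")
```

With these illustrative numbers, the same amount of noise is about 20% of the count for the group of 10 but only about 0.2% for the group of 1,000.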
Dr. Steven Ruggles, Director of the Institute for Social Research and Data Innovation at the University of Minnesota, notes in a recent tweet that “the Census Bureau announced new confidentiality standards that mark a ‘sea change for the way that official statistics are produced and published.’”
The Minnesota Population Center (MPC) outlines a number of concerns regarding differential privacy. The MPC recommends the following points be addressed before differential privacy policies are implemented:
More testing is needed before final decisions are made on how differential privacy will be applied to census data.
Differential privacy is not appropriate or feasible for ACS microdata (e.g. PUMS).
For all data products, the Census Bureau should proceed cautiously in close consultation with the data user community.
Finally, in a presentation at the American Community Survey (ACS) Data Users Group (DUG) meeting in May 2019, Dr. Connie Citro recommended that the Bureau consider the following points as it moves forward on differential privacy:
Dr. Citro recommends “taking the relationship between the Census Bureau and users to the next level of systematic, two-way interaction. That relationship, in my experience going back over 50 years, is not yet there” (Slide 3).
To build credibility among data users (Slide 7), Dr. Citro calls on the Bureau to “institutionalize systematic, two-way, transparent interaction—structured input, dialog, preliminary decision, [repeat], and document the final decision” (Slide 8).
Dr. Citro offers a number of “Ways and Means to Step Up” (Slides 11-13).
For More Information
This topic is evolving, so this post will be updated as new information becomes available.
Good, less-technical overviews:
“To Reduce Privacy Risks, the Census Plans to Report Less Accurate Data.” Mark Hansen, New York Times (December 2018).
“Potential privacy lapse found in Americans' 2010 census data.” NBC News (February 2019).
More in-depth, technical resources:
“Differential Privacy, An Easy Case.” Mark Hansen (January 2019).
“Innovating Data Privacy for the American Community Survey.” Rolando Rodriguez and Amy Lauger (2019).
“Challenges and New Approaches for Protecting Privacy in Federal Statistical Programs.” National Academies, Committee on National Statistics (2019).
“Changes to Census Bureau Data Products.” University of Minnesota, IPUMS webpage (2019).