What’s Changing with Census Data Availability? Differential Privacy: What is It and Why it Matters for Census 2020
This post was updated on March 18, 2020, with the addition of newly released Census Bureau resources.
CTData has traveled state to state on the conference circuit over the past few months. We recently attended the American Community Survey (ACS) Users Data Conference in Washington, DC; the National Neighborhood Indicators Partnership Meeting in Milwaukee, WI; the Census State Data Center Conference in Charlotte, NC; and the Association for Public Data Users Conference in Washington, DC. One of the most pressing topics of discussion was the 2020 Census and its new differential privacy policies. To help you gain some clarity about these contested policies, we’ll provide background about the census, discuss existing privacy protections, and describe upcoming changes to Census 2020 under differential privacy.
What is the decennial census?
It all began in 1790, when Congress ordered the decennial census to count every person living in the United States every 10 years. Two hundred and thirty years later, the count continues to evolve. The 2020 Census will ask seven basic demographic questions, including the number of people living in the household, whether the home is owned or rented, each person’s age, sex, race, and ethnicity, and their relation to the head of household. More detailed information about individuals and households actually comes from another Census Bureau product, the American Community Survey (ACS).
How are census data used?
Planners, policymakers, nonprofits, researchers, and many others use ACS data to describe communities, influence policy, and engage in strategic planning, just to name a few purposes. For example, nonprofit organizations use ACS data to justify the need for continued services in the area by providing insight into the communities in which they work. CTData also makes finding ACS data easier than ever through our data page and visualization tools.
How does the Census Bureau protect the data?
The Census Bureau is committed to protecting data and keeping responses confidential. Title 13 of the U.S. Code prohibits the Census Bureau from releasing any identifiable information about people who fill out the decennial census and the ACS. Currently, the Census Bureau applies advanced statistical methods called disclosure avoidance techniques to protect the data. These methods include data swapping and noise injection, which safeguard against linking the data back to a specific individual.
What does this mean for Census 2020?
Starting with the 2020 Census and then expanding to the ACS in 2025, the Census Bureau will protect individual-level data with a new, more stringent disclosure avoidance technique called differential privacy. The impetus for differential privacy comes from a reinterpretation of existing census law. Previously, the law was interpreted to mean that the census cannot reveal the identity of respondents (re-identification). The revised interpretation asserts that the census cannot “reveal characteristics of an individual even if the identity of that individual is effectively concealed” (reconstruction; Institute for Social Research and Data Innovation).
The Census Bureau argues that new and aggressive privacy protection techniques are necessary because database reconstruction is an emerging cyber issue that demands even more precautions as technology improves. The census is not unique in its use of differential privacy techniques; companies like Apple and Google have both adopted differential privacy in an attempt to better protect customers’ personal data. However, reports indicate that the Census Bureau plans to enforce more rigorous approaches than both of these companies, which has led to debates around the tradeoffs between data privacy, accuracy, and utility.
What is data re-identification versus data reconstruction?
To understand the new 2020 Census differential privacy policies, we first need to understand the difference between re-identification and reconstruction.
Re-identification is when you can match identifying information—such as a person’s name—from a secondary source to someone’s individual responses on the census (these individual responses are also known as microdata). For example, imagine reading an article chronicling a Connecticut resident’s battle with a rare disease that provides descriptive information about the person—name, household size, town of residence, age, race, etc. Re-identification is the ability to find that person in the detailed, publicly available, census data tables using the descriptive characteristics published in the article.
Reconstruction, on the other hand, “aims to recreate a non-public dataset from publicly available data,” as defined by Rolando Rodríguez and Amy Laguer of the Center for Enterprise Dissemination at the Census Bureau. For example, if you know that a particular census tract has three black non-Hispanic women aged 25 to 29, you can then create three microdata records with these individual-level characteristics. By repeating this process for every cell in the table, the full content of the table may be expressed in the form of microdata. In other words, you now have a file in which each row represents an individual who meets that unique grouping of characteristics. Again, reconstruction is different from re-identification: reconstruction means that you can match characteristics of a person (age, sex, race, etc.) but not necessarily determine the identity of that person.
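The cell-by-cell step described above can be sketched in a few lines of Python. This is only an illustration of turning one published table into microdata records. The tract name, the categories, and the counts are hypothetical, and a real reconstruction attack is harder because it has to reconcile many overlapping tables at once.

```python
def table_to_microdata(table):
    """Expand a table of counts into one record per counted person."""
    records = []
    for (tract, sex, race_ethnicity, age_group), count in table.items():
        # A cell with a count of 3 becomes 3 identical person records.
        for _ in range(count):
            records.append({
                "tract": tract,
                "sex": sex,
                "race_ethnicity": race_ethnicity,
                "age_group": age_group,
            })
    return records


# Hypothetical published table: each key is a cell, each value is its count.
published_table = {
    ("Tract A", "Female", "Black non-Hispanic", "25-29"): 3,
    ("Tract A", "Male", "White non-Hispanic", "30-34"): 2,
}

microdata = table_to_microdata(published_table)
for row in microdata:
    print(row)
```

The resulting file has five rows, one per person counted in the table, and none of the rows contains a name or address. That is the sense in which reconstruction recovers characteristics without, on its own, revealing identities.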
Why are data users concerned?
According to the Census Bureau, a recent data reconstruction experiment found that:
Census block and voting age (18+) were correctly reconstructed for all individuals in all 6,207,027 inhabited blocks
Block, sex, age, race, and ethnicity were reconstructed exactly for 46% of the population and within approximately one year for 71% of the population
Block, sex, and age were then linked to commercial data, which provided putative re-identification of 45% of the population
Name, block, sex, age, race, and ethnicity were then compared to the confidential data, which yielded confirmed re-identifications for 38% of the putative re-identifications
For the confirmed re-identifications, race and ethnicity were learned correctly
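Two of the percentages above have different bases: the 45% is a share of the whole population, while the 38% is a share of the putative re-identifications. A short calculation, using only the figures quoted above, combines them into a share of the whole population. The variable names are ours.

```python
# Figures quoted from the Census Bureau experiment above.
putative_share_of_population = 0.45   # putatively re-identified
confirmed_share_of_putative = 0.38    # of those, confirmed against confidential data

# Confirmed re-identifications as a share of the whole population.
confirmed_share_of_population = (
    putative_share_of_population * confirmed_share_of_putative
)

print(f"{confirmed_share_of_population:.1%}")
```

In other words, roughly 17 percent of the population, or about one person in six, was re-identified and confirmed in the experiment.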
As noted previously, differential privacy arose from the reinterpretation of existing census law; there is no legal mandate for this change. The cause for concern is the tradeoff between privacy on one side and accuracy and utility on the other: as privacy protections increase, the accuracy and utility of the data decrease. Data users fear that, without accurate microlevel data, census products will become either unusable or inaccessible.
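To make the tradeoff concrete, here is a minimal sketch of the Laplace mechanism, a textbook building block of differential privacy. It is not the Census Bureau's production algorithm, and the parameter values below are our own assumptions for illustration. The privacy-loss parameter is conventionally called epsilon: a smaller epsilon means stronger privacy and more noise added to each published count.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, rng):
    """Publish a count with Laplace noise; adding or removing one person
    changes a count by at most 1, so the noise scale is 1 / epsilon."""
    return true_count + laplace_noise(1 / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials, rng):
    """Average distance between the noisy count and the true count."""
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials


rng = random.Random(2020)  # fixed seed so the sketch is repeatable
true_count = 3             # hypothetical cell, e.g. three people in a tract
errors = {
    epsilon: mean_abs_error(true_count, epsilon, trials=5000, rng=rng)
    for epsilon in (0.1, 1.0, 10.0)  # illustrative values only
}
for epsilon, error in errors.items():
    print(f"epsilon={epsilon}: average error of about {error:.2f} people")
```

The average error shrinks as epsilon grows. For a small cell such as a count of three, a strict privacy setting can add noise larger than the count itself, which is why users of small-area data are the most concerned.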
Currently, it is unclear to what extent differential privacy will change census data. We do, however, know that under differential privacy, the Census Bureau plans to limit the types of tables and level of detail that it makes publicly available. One product that may be eliminated is IPUMS, which provides microdata from the census and ACS for geographies with a population of 100,000 or more. This presents a threat to organizations, researchers, planners, and policymakers who rely on accurate data to serve their communities. For example, the CT Association of Human Services uses publicly available microlevel data to understand racial differences in the educational attainment of mothers with children under five years old, while another organization working with CTData uses it to understand homeownership among older adults across various income levels. These data inquiries are critical to understanding community needs, seeking additional grant funding, and tracking changes over time.
What if I have concerns about differential privacy?
Data users worry that the Census Bureau’s proposed methods to discourage reconstruction would render the data unusable. Current ACS data collection and statistical methods employed to ensure privacy (injecting noise and swapping similar households that reside in different geographies) already make both re-identification and reconstruction extremely difficult.
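For readers curious about how household swapping works, the sketch below shows the general idea under simplifying assumptions. The function name, the matching rule (household size only), and the data are hypothetical; the Bureau's actual swapping procedure and swap rates are confidential.

```python
import random
from collections import Counter


def swap_households(households, match_key, swap_rate, rng):
    """Return a copy in which some pairs of households that match on
    match_key but live in different tracts have exchanged tract labels."""
    swapped = [dict(h) for h in households]  # leave the input untouched
    groups = {}
    for index, household in enumerate(swapped):
        groups.setdefault(household[match_key], []).append(index)
    for indices in groups.values():
        rng.shuffle(indices)
        # Pair up matching households and swap their geography.
        for a, b in zip(indices[::2], indices[1::2]):
            different_tracts = swapped[a]["tract"] != swapped[b]["tract"]
            if different_tracts and rng.random() < swap_rate:
                swapped[a]["tract"], swapped[b]["tract"] = (
                    swapped[b]["tract"], swapped[a]["tract"])
    return swapped


def size_counts(rows):
    """Count households by tract and household size."""
    return Counter((h["tract"], h["size"]) for h in rows)


# Hypothetical households; swap_rate=1.0 swaps every eligible pair.
households = [
    {"id": 1, "tract": "A", "size": 2, "tenure": "owner"},
    {"id": 2, "tract": "B", "size": 2, "tenure": "renter"},
    {"id": 3, "tract": "A", "size": 4, "tenure": "renter"},
    {"id": 4, "tract": "B", "size": 4, "tenure": "renter"},
]
protected = swap_households(households, "size", swap_rate=1.0,
                            rng=random.Random(13))
print(size_counts(protected) == size_counts(households))
```

Because swapped households match on size, counts by tract and household size are unchanged. Characteristics that were not matched, such as tenure in this toy data, move between tracts, so a record found in a tract may not describe a household that actually lives there.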
ACS users wrote an open letter urging the Census Bureau to be more transparent about the policies it is considering. The letter encourages the Bureau to engage more with ACS users before finalizing its decision, so that it better understands the potential impact of the proposed solutions. Advocacy was successful: the Bureau recently announced that it will delay implementing differential privacy on the ACS until 2025.
The Census Bureau has developed resources to help users understand differential privacy, including:
Differential privacy video sponsored by MinutePhysics
Census Bureau response to State Data Centers re: differential privacy (NEW!)
Modernizing Disclosure Avoidance: What We've Learned, Where We Are Now (NEW!)
We will keep data users abreast of updates from the Census Bureau. As your Census State Data Center, CTData is the official local resource for census information.