Differential Privacy: Assessing 2020 Concerns Using 2010 Census Data

Do you use census data to understand who lives in your community? To estimate the need for funding? Or to plan programs and policies?

Much of the news surrounding the 2020 Census focuses on ensuring a complete count by working with communities with low response rates. But the data users’ community, including the Connecticut Data Collaborative (CTData), has been grappling with another critical question: What would less accurate census data in 2020 mean for your program or community?

This question arose from the U.S. Census Bureau’s announcement that it plans to use a “differential privacy” framework starting with the 2020 Census. Differential privacy is a mathematical framework that protects the confidentiality of census respondents by injecting carefully calibrated statistical noise into published figures. It replaces earlier disclosure-avoidance methods such as data swapping. This is a complicated process that CTData outlined in a previous blog post.
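To make the idea of noise injection concrete, here is a minimal Python sketch of the Laplace mechanism, a textbook way to add differentially private noise to a single count. It is not the Census Bureau’s production algorithm, which is considerably more involved. The function names, the privacy parameter epsilon = 0.1, and the sample counts are all assumptions chosen for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered on 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, rng):
    # A counting query changes by at most 1 when one person is added or
    # removed, so the Laplace scale is 1 / epsilon (assumed sensitivity of 1).
    noisy = true_count + laplace_noise(1 / epsilon, rng)
    # Published counts cannot be negative or fractional, so clamp and round.
    return max(0, round(noisy))


rng = random.Random(0)   # fixed seed so the sketch is repeatable
epsilon = 0.1            # hypothetical privacy parameter, for illustration only

for true_count in (12, 854, 124775):  # illustrative small, medium, large counts
    released = noisy_count(true_count, epsilon, rng)
    pct = (released - true_count) / true_count * 100
    print(f"true={true_count:>7,}  released={released:>7,}  change={pct:+.1f}%")
```

Because the noise has the same typical size regardless of the true count, it is negligible for a city-sized population but can swamp a count of a dozen people.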

While protecting the confidentiality of respondents is critical, many members of the data users’ community are concerned about the potential impact of this approach on the scope and quality of future census data products. Data quality is especially worrisome for small geographies, of which Connecticut has many! In fact, once differential privacy is in place, we will no longer be able to present accurate data on race for Connecticut towns.

To help data users assess the impacts of differential privacy in their communities, CTData used the IPUMS joint data release to develop an interface that compares the original 2010 town-level data with what the same data would look like if differential privacy were applied.

To what extent did the original 2010 Census data change after differential privacy was applied? We’ll look at a few variables below as examples. You can see what these and other variables look like in your community by visiting CTData’s interactive online tool.

Population Estimates

Under differential privacy, variables for larger geographies (e.g., state, county) and larger sub-populations (e.g., certain racial groups or age groups) do not change much. Across all 169 Connecticut towns, the percent change in total population after differential privacy ranges from -0.9% to +5.5%. For example, the total population of Hartford in 2010 was 124,775 before differential privacy and 124,024 after, a change of -0.6%. A small town like Union, by contrast, grew 5.5% under differential privacy, from 854 to 901 people.
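The percent changes quoted here follow from simple arithmetic, sketched below in Python using the Hartford and Union figures from the paragraph above. The function name percent_change is our own label for illustration, not something taken from the IPUMS release or CTData’s tool.

```python
def percent_change(original, protected):
    # Change relative to the originally published 2010 count, in percent.
    return (protected - original) / original * 100


# (original 2010 count, count after differential privacy), from the text above
hartford = percent_change(124775, 124024)
union = percent_change(854, 901)

print(f"Hartford: {hartford:+.1f}%")  # Hartford: -0.6%
print(f"Union: {union:+.1f}%")        # Union: +5.5%
```

Hartford loses 751 people and Union gains only 47, yet Union’s percent change is far larger because its starting population is so small.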

Race

The concerns about differential privacy deepen when you disaggregate the data by subgroup or location. For example, we see drastic differences in percent changes by racial group. In fact, under differential privacy, some racial groups suddenly have no representation in certain towns (-100%) whereas others have an increase of 3,300%.

Race                                          Lowest % Change    Highest % Change
White alone                                   -2%                +4%
Black or African American alone               -100%              +733%
American Indian and Alaska Native alone       -100%              +1,500%
Asian alone                                   -100%              +667%
Native Hawaiian and Other Pacific Islander    -100%              +3,300%
Some other race alone                         -100%              +700%

Applying differential privacy to race data in Hartford produced a 23.7% decrease in the number of people who identify as American Indian and Alaska Native and a 41.9% increase in the number who identify as Native Hawaiian and Other Pacific Islander. Towns such as Roxbury could see a 100% decrease for certain groups, with no residents identified as American Indian and Alaska Native or as Black or African American.

Ethnicity

The findings are equally unsettling for ethnicity. Once differential privacy was applied, the 2010 Census counts changed by between -43% and +400% for people reporting that they are Hispanic or Latino and by between -2% and +6% for people reporting that they are not Hispanic or Latino. For example, the town of Scotland saw a 43% decrease in its Hispanic or Latino population under differential privacy (from 58 people to 33). Meanwhile, the town of Hartland had a 400% increase in its Hispanic or Latino population (from 12 people in 2010 to 60 after differential privacy). Such large changes raise concern about the accuracy of the data and their utility for developing programs and allocating funding.

Ethnicity                 Lowest % Change    Highest % Change
Hispanic or Latino        -43%               +400%
Not Hispanic or Latino    -2%                +6%
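A lowest-to-highest range like the one in this table can be derived by computing the percent change for each town and taking the extremes. The Python sketch below does this for only the two towns whose Hispanic or Latino counts are quoted above. The dictionary layout and the handling of zero counts are our own assumptions, not a description of how CTData’s tool is built.

```python
def percent_change(original, protected):
    # Percent change is undefined when the original count is zero.
    if original == 0:
        return None
    return (protected - original) / original * 100


# (original 2010 count, count after differential privacy), from the text above
towns = {"Scotland": (58, 33), "Hartland": (12, 60)}

changes = {town: percent_change(o, p) for town, (o, p) in towns.items()}
# Drop towns where the change is undefined before looking for the extremes.
defined = {town: c for town, c in changes.items() if c is not None}

lowest = min(defined, key=defined.get)
highest = max(defined, key=defined.get)

print(f"Lowest:  {lowest} {defined[lowest]:+.0f}%")    # Lowest:  Scotland -43%
print(f"Highest: {highest} {defined[highest]:+.0f}%")  # Highest: Hartland +400%
```

Run over all 169 towns, the same approach would produce one lowest and one highest value per variable, which is how each row of the table should be read.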

Occupancy Status

Another variable that raises alarm under differential privacy is housing occupancy status, particularly counts of vacant housing units. After differential privacy was applied to the 2010 Census data, the percent change in vacant housing units ranged from -100% to +120%, depending on the town. This means that towns such as Redding and Coventry, which each had over 300 vacant housing units according to the 2010 Census, would report 0 vacant housing units under the new privacy protection rules. Conversely, Bridgeport saw a 78% increase, going from 5,757 vacant units in the 2010 Census to 10,258 vacant units after differential privacy.

Occupancy Status    Lowest % Change    Highest % Change
Occupied            -10%               +41%
Vacant              -100%              +120%

So, What Now?

The examples above provide insight into potential data quality concerns once differential privacy is applied to the upcoming 2020 Census. CTData’s interactive online tool illustrates how applying a differential privacy framework to the actual 2010 Census data shifts estimates for important variables our communities rely on to understand the populations we serve.

We are thankful that advocacy efforts from data users across the country postponed the application of differential privacy to the American Community Survey until 2025. We remain concerned that this method will be applied to the 2020 decennial Census.

As Connecticut’s Census State Data Center, CTData is the official local resource for census information. We will continue to keep data users abreast of updates from the U.S. Census Bureau as we hear more developments about census data and the decennial count.