CTData 2022 Mini-Conference Recap: What You Need to Know about Using Census 2020 Data
The 2020 Decennial Census faced unprecedented challenges: the COVID-19 pandemic, political interference, natural disasters, and the implementation of a new approach to privacy protection called differential privacy. We invited two experts to share what this means for using Census 2020 data. Amy O’Hara is a Research Professor in the Massive Data Institute and Executive Director of the Federal Statistical Research Data Center at Georgetown University. Elizabeth Garner is the State Demographer at the Colorado State Demography Office.
Key Takeaways:
2020 Census challenges caused major delays in releasing data products, with most products planned for release in May 2023 or later. See here for the 2020 Census data product release schedule.
The 2020 Census is the first census to use differential privacy for disclosure avoidance (for the 2010 Census, data swapping was used for disclosure avoidance; see here for a history of census privacy protections). The differential privacy algorithm injects noise into the publicly released 2020 Census data to protect people’s privacy. The Census Bureau has been fine-tuning the algorithm based on their own analyses and several iterations of user feedback to try to ensure that the data are still accurate enough to address most high-priority use cases while preserving people’s privacy. However, questions remain about which data will be fit for use, and how users can decide if the data are accurate enough for their purposes. In particular, the relative accuracy of statistics on smaller populations and geographies is more affected by differential privacy. To learn more about differential privacy, see the Census Bureau’s site on Understanding Differential Privacy and this list of differential privacy resources compiled by Georgetown’s Massive Data Institute. Over the next year, CTData will be providing guidance to CT data users on how to use the 2020 Census data including the impacts of differential privacy on data accuracy.
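To make the point about smaller populations concrete, here is a minimal sketch of noise injection. It is an illustration only and not the Census Bureau's algorithm, which is considerably more complex. It assumes simple Laplace noise with a made-up scale (`NOISE_SCALE`) added to made-up counts, and the function names are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def mean_relative_error(true_count, scale, trials, rng):
    """Average |noisy - true| / true over repeated noisy releases of one count."""
    total = 0.0
    for _ in range(trials):
        noisy = true_count + laplace_noise(scale, rng)
        total += abs(noisy - true_count) / true_count
    return total / trials

rng = random.Random(2020)  # fixed seed so the sketch is reproducible
NOISE_SCALE = 5.0          # hypothetical noise scale, not a Census Bureau parameter

# Same noise scale applied to made-up populations of very different sizes.
errors = {
    population: mean_relative_error(population, NOISE_SCALE, 5000, rng)
    for population in (20, 200, 2000, 20000)
}
for population, err in errors.items():
    print(f"true count {population:>6}: mean relative error {err:.2%}")
```

Because the noise in this sketch has the same typical size regardless of the count, it is a much larger share of a count of 20 than of a count of 20,000.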
Although the Decennial Census is the nation’s only source of block-level population and housing data, 2020 Census data have low accuracy at the block level due to the noise injected by differential privacy. This is by design, as block-level data pose the highest privacy risk. The Census Bureau advises aggregating blocks for analysis. (Note that the Census Bureau also advises against using block-level data from the 2000 and 2010 censuses; see Disclosure Avoidance for the 2020 Census: An Introduction.)
Due to how the differentially private 2020 Census data were processed, the Census Bureau cautions against dividing across population and housing tables for small geographic areas such as block groups (for example, dividing the household population by the number of occupied housing units to get the average number of people per household). Instead, they advise waiting for the release of the Detailed Demographic and Housing Characteristics file (currently slated for release in August 2023 or later) for statistics on people per household in small geographies. For more information, see Disclosure Avoidance for the 2020 Census: An Introduction.
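A rough sketch of why aggregating blocks helps: when independent noise is added to each block, the errors partly cancel when blocks are summed, so the relative error of the total is smaller. The example below is a simplified illustration and not the Census Bureau's method. The noise scale, the block counts, and the GEOIDs are all made up, and the noisy values are left unrounded. It relies only on the fact that the first 12 characters of a 15-character block GEOID identify the block group.

```python
import random
from collections import defaultdict

rng = random.Random(94171)  # fixed seed so the sketch is reproducible
NOISE_SCALE = 3.0           # hypothetical noise scale, not a Census Bureau parameter

def laplace_noise(scale):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

# Made-up blocks in two made-up block groups. A block GEOID has 15 characters,
# and its first 12 characters identify the block group it belongs to.
true_counts = {}
for block_group in ("090010101001", "090010101002"):
    for i in range(40):
        true_counts[f"{block_group}{i:03d}"] = rng.randint(5, 60)

# Add independent noise to every block count.
noisy_counts = {g: n + laplace_noise(NOISE_SCALE) for g, n in true_counts.items()}

def aggregate_to_block_groups(block_counts):
    """Sum block-level counts up to the block group (first 12 GEOID characters)."""
    totals = defaultdict(float)
    for geoid, count in block_counts.items():
        totals[geoid[:12]] += count
    return dict(totals)

def mean_relative_error(noisy, true):
    """Average |noisy - true| / true across all geographies in `true`."""
    return sum(abs(noisy[k] - true[k]) / true[k] for k in true) / len(true)

block_error = mean_relative_error(noisy_counts, true_counts)
group_error = mean_relative_error(
    aggregate_to_block_groups(noisy_counts),
    aggregate_to_block_groups(true_counts),
)
print(f"mean relative error, blocks:       {block_error:.2%}")
print(f"mean relative error, block groups: {group_error:.2%}")
```

The same grouping pattern works for any custom area built from blocks: replace `geoid[:12]` with a lookup from block GEOID to the area of interest.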
The Census Bureau made substantial changes in how they collected and processed race and ethnicity data starting with the 2020 Census. These methodological changes have significantly increased the proportion of the population classified as multiracial and complicated comparisons to race/ethnicity data from earlier years (see this Census Bureau blog and story for more information and look out for a blog from CTData on this topic in the near future).
All of this matters! Issues with the timing and accuracy of the 2020 Census data impact people’s political representation and recognition, population counts and forecasts, funding allocations and program eligibility, health equity data, the enforcement of anti-discrimination laws, and more.
View a Recording of the Conference Session Here:
View the Presenters’ Slides Here:
Additional Resources:
2020 Census County Assessment Tool. This tool from Georgetown University’s Massive Data Institute allows users to 1) select a county, 2) identify major obstacles to the census in 2020, and 3) compare the 2020 results against expectations based on 2010 Census data.
2020 Census Impossible Blocks Viewer. This tool, developed by Georgetown University’s Massive Data Institute, allows users to view mathematically impossible census blocks (such as blocks with a non-zero household population but no occupied housing units) from the 2020 Redistricting Data Summary File (P.L. 94-171), which had differential privacy applied.
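For readers working with their own block-level extracts, the kind of check described above can be sketched in a few lines. This is not the viewer's actual code. The record layout, field names, and values below are hypothetical, and the sketch tests only the one example of impossibility mentioned here (household population greater than zero with no occupied housing units).

```python
from dataclasses import dataclass

@dataclass
class BlockRecord:
    """Hypothetical layout for one block's population and housing counts."""
    geoid: str
    household_population: int
    occupied_housing_units: int

def is_impossible(block):
    """True when people are reported in households but no housing unit is occupied."""
    return block.household_population > 0 and block.occupied_housing_units == 0

# Made-up records for illustration only.
blocks = [
    BlockRecord("090010101001000", household_population=42, occupied_housing_units=17),
    BlockRecord("090010101001001", household_population=6, occupied_housing_units=0),
    BlockRecord("090010101001002", household_population=0, occupied_housing_units=0),
]

flagged = [block.geoid for block in blocks if is_impossible(block)]
print(flagged)
```

Only the second record is flagged. A block with zero people and zero occupied units is consistent, so it passes this check.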
Census Bureau 2020 Census Operational Quality Metrics for Counties and Tracts. These metrics released by the Census Bureau provide insight on the quality of the 2020 Census operations (for example, response rates) within counties and census tracts.
Reviewing and Revising Standards for Maintaining, Collecting, and Presenting Federal Data on Race and Ethnicity | OMB | The White House. OMB is currently reviewing and revising the standards that govern how the Census Bureau collects and presents data on race and ethnicity. To schedule a public listening session with the OMB Working Group, send an email expressing interest to Statistical_Directives@omb.eop.gov (see also OMB Launches New Public Listening Sessions on Federal Race and Ethnicity Standards Revision | OMB | The White House).
For More Information:
You can watch other sessions from the CTData 2022 Mini-Conference at the conference hub.
If you are interested in learning more about CTData, check out our mission and values and the services we provide. For training and tips on how to use data to inform your personal and professional life, register for one of our CTData Academy workshops or browse our blog. You can keep up with us by subscribing to the CTData newsletter and following us on Twitter, Facebook, and LinkedIn.