The Population Estimates "Blended Base": What It Is and Why It Matters
The Census Bureau’s Population Estimates Program (PEP) has historically used the decennial census count at the beginning of each decade as the “base” population count by age, sex, race and Hispanic ethnicity. This base population count forms the basis of the population estimates for the remainder of the decade. However, challenges with the 2020 Decennial Census, including delays with census operations due to the COVID-19 pandemic and complications related to the application of a new privacy protection system, resulted in the census data by age, sex, race and Hispanic ethnicity not being ready in time to be used for the vintage 2021 population estimates.
In response, the PEP team developed an innovative solution that combined information from several different data sources to estimate the population as of April 1st, 2020. The Census Bureau refers to this solution as the “blended base.”
In this blog post, we will:
Provide an introduction to the blended base and why it matters
Compare Connecticut’s blended base population estimates by age and sex to the 2020 Decennial Census counts for the state
Discuss the future of the population estimates base
Dive into more details on the blended base methodology for those who are interested
What is the Blended Base?
The blended base is the estimate of the national, state, and county populations by age, sex, race, and Hispanic origin as of April 1st, 2020, that the Census Bureau has used in its population estimates since the 2020 Decennial Census. The blended base uses population totals from the 2020 Decennial Census, but the distribution of the population by age, sex, race, and Hispanic origin comes from other data sources:
National age and sex distributions come from the Census Bureau’s Demographic Analysis (DA) Estimates.
Sub-national age and sex distributions, and all race and Hispanic origin distributions, come from the Census Bureau’s vintage 2020 population estimates (which do not incorporate any information from the 2020 Decennial Census).
See the section titled “More Details on the Blended Base Methodology” at the bottom of this blog post for more information about how the blended base is estimated.
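The core idea described above, keeping a census total while borrowing the demographic distribution from another source, can be sketched in a few lines. This is a minimal illustration rather than the Census Bureau's actual procedure: the function name `blend`, the age groups, and all numbers are hypothetical.

```python
def blend(control_total, source_counts):
    """Spread a control total across groups in proportion to source_counts.

    control_total: population total taken from one source (e.g., a census count).
    source_counts: counts by group from a second source; only their shares are used.
    """
    source_total = sum(source_counts.values())
    return {
        group: control_total * count / source_total
        for group, count in source_counts.items()
    }


# Hypothetical inputs: a total of 1,000 people and an age distribution
# borrowed from a different (made-up) data source.
hypothetical_total = 1_000
hypothetical_source = {"0-17": 220, "18-64": 610, "65+": 170}

blended = blend(hypothetical_total, hypothetical_source)
```

The blended counts always sum to the control total, while each group's share matches the second source.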
Why Does the Blended Base Matter?
Although the “blended base” solution was born of necessity, it also brought with it the potential to address some systematic biases that have historically been present in decennial census counts. For example, decennial censuses since 1990 have undercounted young children, the Black and Hispanic/Latino population, the American Indian or Alaska Native population living on reservations, and males between the ages of 18 and 49. Additionally, the decennial census is prone to an issue called “age heaping,” wherein there are unrealistically high numbers of people reported as having an age that is a multiple of 5 (e.g., 20 or 25). Age heaping was particularly severe in the 2020 Decennial Census, due in part to higher rates of proxy reporting (i.e., when a neighbor, landlord, or other person reports for a household).
Research conducted by the Census Bureau suggests that, at the national level, the blended base appears to provide more reasonable age distributions compared to the 2020 Decennial Census. Specifically, the severe age heaping that occurred in the 2020 Decennial Census is not present in the vintage 2021 population estimates blended base, and the blended base corrects for the well-known undercount of young children in the 2020 Decennial Census.
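To make the idea of age heaping concrete, the sketch below compares the count at each age that is a multiple of 5 with the average of the two neighboring ages. This simple neighbor ratio is an illustration chosen for this post, not the measure the Census Bureau used in its research, and the single-year counts are synthetic.

```python
def heaping_ratios(counts_by_age):
    """Ratio of each multiple-of-5 age count to the mean of its two neighbors.

    A ratio well above 1.0 hints at heaping on that age.
    Ages without both neighbors in the data are skipped.
    """
    ratios = {}
    for age, count in counts_by_age.items():
        has_neighbors = (age - 1) in counts_by_age and (age + 1) in counts_by_age
        if age % 5 == 0 and has_neighbors:
            neighbor_mean = (counts_by_age[age - 1] + counts_by_age[age + 1]) / 2
            ratios[age] = count / neighbor_mean
    return ratios


# Synthetic single-year-of-age counts with visible spikes at ages 30 and 35.
synthetic_counts = {
    28: 1000, 29: 990, 30: 1150, 31: 980, 32: 975,
    33: 970, 34: 960, 35: 1100, 36: 950,
}

ratios = heaping_ratios(synthetic_counts)
```

In a smooth age distribution these ratios would sit close to 1.0; in the synthetic data both multiples of 5 come out noticeably higher.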
Unfortunately, we cannot yet compare race data between the 2020 Decennial Census and the post-2020 population estimates because these data sources use different race categories. Specifically, whereas the decennial census includes a “some other race” category, the population estimates do not. Typically, the Census Bureau creates a “modified race file” that re-allocates individuals classified as “some other race” on the decennial census to one of five specific race categories for use in the population estimates. However, the technical challenges of applying differential privacy to the 2020 Decennial Census data delayed the release of detailed race data from the decennial census until June of 2023, so these data could not be used in the vintage 2021 or 2022 population estimates. Although the detailed race data from the 2020 Decennial Census have since been publicly released, a modified race file is not yet ready and will not be used for the upcoming vintage 2023 population estimates by demographic categories. It is currently unclear when a modified race file for the 2020 Decennial Census will be released.
Although it is not yet possible to directly compare race data between the 2020 Decennial Census and the post-2020 population estimates, we can compare the age and sex distributions between these two data sources.
In the next section of this blog post, we compare the total population estimates and age and sex distributions for Connecticut in the 2020 Decennial Census versus the “blended base” estimates for April 1st, 2020, that were released with the vintage 2022 population estimates.
We conclude that the blended base estimates for Connecticut’s total population by age and sex are likely more accurate than the 2020 Decennial Census counts. However, note that this may not be true regarding the blended base estimates for race and Hispanic origin, or for demographic characteristics within each of Connecticut’s nine planning regions.
Comparing the Population Estimates “Blended Base” and 2020 Decennial Census Counts in Connecticut
Connecticut’s Total Population
As would be expected, Connecticut’s total population in the vintage 2022 blended base is nearly identical to the 2020 Decennial Census count for the state. The blended base includes just two fewer people across the entire state. This very minor difference is presumably attributable to the application of differential privacy to the census data file used for the population estimates.
Connecticut’s Population by Sex
In contrast, the vintage 2022 blended base estimates of Connecticut’s population by sex are 1% higher for the male population and 1% lower for the female population compared to the census counts (that is, 18,671 more males and 18,673 fewer females across the state). This is in line with national data from the 2020 Census Post-Enumeration Survey suggesting that the 2020 Decennial Census significantly undercounted the U.S. adult male population by 1.3% (± 0.3) and overcounted the adult female population by 1.1% (± 0.3).
Thus, the population estimates blended base likely provides a more accurate estimate of Connecticut’s total population distribution by sex compared to the 2020 Decennial Census count.
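As a quick arithmetic check, the male and female differences quoted above should net out to the two-person difference in the statewide total. The variable names below are ours; the figures are the ones reported in this post.

```python
# Differences (blended base minus 2020 Decennial Census count) quoted above.
male_difference = 18_671      # more males in the blended base
female_difference = -18_673   # fewer females in the blended base

net_difference = male_difference + female_difference
print(net_difference)  # -2, i.e., two fewer people statewide
```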
Connecticut’s Population by Age
We examined differences in estimates of the state’s total population by 5-year age bands. Compared to the 2020 Decennial Census count, the blended base provides:
a higher estimate of the number of young children under 5 years old and youth between the ages of 10 and 19 years
The blended base includes 6,116 more children under 5 years old than were counted in the 2020 Decennial Census (3.5% higher), and a total of 13,265 more children and youth under the age of 20 (1.6% higher than the census count). This is roughly consistent with national estimates from the 2020 Post-Enumeration Survey suggesting that the 2020 Decennial Census undercounted all children in the U.S. by about 0.8% (± 0.4), and in particular undercounted children under 5 years old by about 2.8% (± 0.6).
a lower estimate of the number of residents ages 60 and older, particularly for those over the age of 80 years
The blended base includes 4,684 fewer adults over the age of 85 than were counted in the 2020 Decennial Census (5.4% lower), and a total of 22,483 fewer adults ages 60 and over (2.5% lower than the census count). This is roughly consistent with national estimates from the 2020 Post-Enumeration Survey suggesting that the 2020 Decennial Census overcounted the U.S. adult population ages 50 and older by about 1.7% (± 0.3), although results were not broken out for age groups within this range.
slightly higher estimates of the number of residents in each 5-year age bracket from 20 to 59 years old, with the exception of residents ages 30 to 34 years old
Although the blended base includes 3% fewer adults ages 30-34 (a difference of 6,888 residents in this 5-year age range), across the entire 20- to 59-year-old age range the blended base includes 9,216 more residents (0.5% higher than the census count). Nationally, estimates from the 2020 Post-Enumeration Survey suggest that the 2020 Decennial Census undercounted residents ages 18-29 by about 1.6% (± 0.6) and undercounted residents ages 30-49 by about 1.5% (± 0.4).
Overall, the alignment of differences between the blended base and 2020 Decennial Census counts with estimates of 2020 census net coverage error from the 2020 Post-Enumeration Survey suggests that the statewide age distribution of the population estimates blended base is likely more accurate than that of the 2020 Decennial Census.
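The age figures reported in this section can be cross-checked with simple arithmetic. The sketch below sums the differences for the three broad age ranges that together cover the whole population, and backs out the census count implied by each reported difference and percentage. Because the percentages are rounded, the implied counts are only rough approximations, not published figures. The names `reported` and `implied_census_count` are ours.

```python
# (difference in residents, rounded percent difference) as quoted in this section.
# Negative values mean the blended base is lower than the census count.
reported = {
    "under 5": (6_116, 3.5),
    "under 20": (13_265, 1.6),
    "20 to 59": (9_216, 0.5),
    "60 and over": (-22_483, -2.5),
    "over 85": (-4_684, -5.4),
}


def implied_census_count(difference, percent):
    """Census count implied by a difference and its (rounded) percent of that count."""
    return difference / (percent / 100)


# These three ranges do not overlap and together cover all ages.
covering_ranges = ["under 20", "20 to 59", "60 and over"]
net_difference = sum(reported[name][0] for name in covering_ranges)
print(net_difference)  # -2, matching the two-person difference in the state total

# Rough census counts implied by the rounded percentages (approximate only).
implied = {
    name: round(implied_census_count(diff, pct))
    for name, (diff, pct) in reported.items()
}
```

The three broad ranges net to the same two-person gap seen in the statewide total, which is what we would expect if the blended base only redistributes the census total across age groups.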
Connecticut’s Population by Age and Sex
We next examine the differences in Connecticut’s population by age as estimated by the blended base versus the 2020 Decennial Census separately for males and females. We find that:
For children and youth between the ages of 0 and 19, the differences between the blended base estimates and the 2020 Decennial Census counts are of a similar magnitude for males and females.
Across this age range, the blended base estimates are 1.6% higher than the 2020 Decennial Census count for males and 1.5% higher for females.
For adults between the ages of 20 and 84, the differences between the blended base estimates and the 2020 Decennial Census counts are more negative (or less positive) for females than for males.
Across this entire age range, the blended base estimates are 1% higher than the 2020 Decennial Census counts for males and 1.6% lower than the 2020 Decennial Census counts for females.
Thus, it appears that the overall difference in the sex distribution between the blended base and the 2020 Decennial Census is concentrated among adults of all ages, but not among children and youth.
The findings for sex distribution differences for adults are also in line with the results of the 2020 Post-Enumeration Survey, suggesting again that the age by sex distribution of Connecticut’s total population in the blended base estimates is likely more accurate than that of the 2020 Decennial Census counts.
The Future of the Population Estimates Base
The Census Bureau recognizes the potential advantages of a blended base approach for improving the quality of the population estimates. However, much research still needs to be done to identify the best approach. To this end, the Census Bureau has established the Base Evaluation and Research Team (BERT) to inform decisions about which 2020 Decennial Census data are incorporated into the postcensal population estimates and to explore ways to improve the quality of the population estimates base.
Guided by BERT’s recommendations, the vintage 2023 population estimates blended base (slated to be released in June 2024) will incorporate Hispanic origin data from the 2020 Decennial Census, but will not use any age, sex, or race data from the 2020 Decennial Census. BERT’s ongoing research will inform the evolution of the population estimates blended base for future vintages.
More Details on the Blended Base Methodology
The Population Estimates Program’s blended base methodology estimates the population by age, sex, race, and Hispanic ethnicity as of April 1st, 2020, at the national, state, and county levels by combining information from three data sources:
2020 Decennial Census counts, infused with some differentially private noise, are used as total population controls at all geographic levels.
The Census Bureau’s Demographic Analysis (DA) Estimates are used to estimate the distribution of the national population by sex and single year of age. These estimates of the national population by age and sex are developed from vital records (i.e., records of births and deaths), estimates of international migration, and Medicare records. The DA estimates were created for the purpose of estimating net coverage error in the decennial census.
The Census Bureau’s vintage 2020 population estimates are used to estimate the distributions of race and Hispanic origin at the national level and of all demographic characteristics (age, sex, race, and Hispanic origin) at the state and county levels. State and county estimates for age and sex are then adjusted to sum to the national age and sex distribution based on the DA.
Notably, the demographic distributions in the vintage 2020 population estimates are based on the 2010 Decennial Census counts, with additions and subtractions for estimated population change over the decade. They do not incorporate any information from the 2020 Decennial Census.
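The three inputs described above impose two sets of constraints at once: each geography must match its census total, and the geographies together must match the national age and sex distribution. One common way to satisfy both is iterative proportional fitting (often called raking). The source does not say which procedure the Census Bureau uses, so the sketch below is an assumption for illustration only; the function name `rake` and all numbers are hypothetical.

```python
def rake(table, row_targets, col_targets, iterations=100):
    """Iterative proportional fitting on a table of positive counts.

    table: rows are geographies, columns are age groups (starting estimates).
    row_targets: control total for each geography.
    col_targets: control total for each age group.
    The two sets of targets must share the same grand total.
    """
    cells = [row[:] for row in table]  # copy so the input is not modified
    for _ in range(iterations):
        # Scale each row to its geography total.
        for i, row in enumerate(cells):
            row_sum = sum(row)
            cells[i] = [value * row_targets[i] / row_sum for value in row]
        # Scale each column to its age-group total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in cells)
            for row in cells:
                row[j] *= target / col_sum
    return cells


# Hypothetical starting estimates for two geographies and three age groups.
starting_estimates = [[30.0, 50.0, 20.0], [20.0, 60.0, 20.0]]
geography_totals = [110.0, 90.0]        # stand-in for census total controls
age_group_totals = [56.0, 104.0, 40.0]  # stand-in for a national age distribution

adjusted = rake(starting_estimates, geography_totals, age_group_totals)
```

After raking, the rows sum to the geography totals and the columns sum to the age-group totals, while the relative patterns within the starting estimates are preserved as far as the controls allow.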
The graphic below, created by the Census Bureau, illustrates how the three different data sources are integrated to form the demographic distributions in the vintage 2021 blended base. A very similar process was used to create the vintage 2022 blended base, although for vintage 2022 the group quarters population was estimated separately from the household population.
For More Information
You can read the Census Bureau’s documentation on their population estimates methodology here.
To learn more about the Census and resources provided by CTData, head to our Census Data portal, where you can learn about the different sources of population data on our Population Statistics Hub and access Census Tools and Resources for your census data project. Explore other data sets and analysis at data by topic and data projects. You can stay up-to-date on the latest data and tools by subscribing to our newsletter and following CTData on Facebook, Twitter, and LinkedIn.