Navigating Immigration and Language Datasets: A Step-by-Step Guide

Through our Data Helpline, CTData often receives questions about accessing data on immigration and languages spoken in Connecticut from organizations that want to better understand their communities and tailor their approaches to outreach and service provision. A number of Census Bureau surveys include questions about immigration status, including the American Community Survey and the Current Population Survey, but each has different strengths and limitations, so finding the right dataset can be difficult. In this blog post, we introduce different datasets that include local immigration and language data and discuss how to access them.

 

Data Source 1: American Community Survey Tables

The American Community Survey (ACS) is one of the most robust ongoing nationwide surveys and includes demographic, social, economic, and housing questions. The Census Bureau publishes tables from the ACS every year in two different forms: the one-year ACS and the five-year ACS. The former is available for geographies with populations of 65,000 or more, while the latter is available for both large and small geographies, including census tracts and towns.

Screenshot of data.census.gov geographic filters

Data from the ACS can be downloaded from data.census.gov, and the Census Bureau creates helpful resources and guidance for using this website on an ongoing basis. ACS data can also be accessed via the Census Bureau’s API. To find data for Connecticut towns, head to data.census.gov and click on “Advanced Search”. From there, open the Geography filter on the left, click on “County Subdivision” and then “Connecticut”, and select either “All County Subdivisions within Connecticut” to access data for all towns, or a planning region to get data for all or some towns within that planning region. If you open a data table and do not see data for all selected towns, click on the table’s title and select 5-year estimates.
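For readers who prefer the API route, here is a minimal Python sketch of how a town-level request could be put together. It only builds the request URL and reshapes a response, so it runs without a network connection. The year, the variable codes (B05002_001E for total population and B05002_013E for the foreign-born population), and the function names are assumptions chosen for illustration; please confirm variable codes and the geography syntax against the Census Bureau’s API documentation before relying on them. The sample response values are made up.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.census.gov/data"


def build_acs5_town_url(year, variables, state_fips="09"):
    """Build an ACS 5-year API URL for all county subdivisions (towns) in a state.

    "09" is Connecticut's state FIPS code. The variable codes passed in are
    the caller's responsibility; the ones used below are illustrative.
    """
    params = {
        "get": ",".join(["NAME"] + list(variables)),
        "for": "county subdivision:*",
        "in": f"state:{state_fips} county:*",
    }
    # Keep ":", "*" and "," readable; encode spaces as %20.
    query = urlencode(params, safe=":*,", quote_via=quote)
    return f"{BASE_URL}/{year}/acs/acs5?{query}"


def rows_to_records(payload):
    """Turn the API's list-of-lists JSON (header row first) into dictionaries."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


url = build_acs5_town_url(2022, ["B05002_001E", "B05002_013E"])
print(url)

# Made-up response in the shape the API returns (values are strings).
sample_payload = [
    ["NAME", "B05002_001E", "B05002_013E", "state", "county", "county subdivision"],
    ["Example town, Example Planning Region, Connecticut", "1000", "150", "09", "000", "00000"],
]
for record in rows_to_records(sample_payload):
    share = int(record["B05002_013E"]) / int(record["B05002_001E"])
    print(record["NAME"], f"{share:.1%}")
```

To fetch real data, the URL can be opened with any HTTP client (for example, urllib.request.urlopen from the standard library) and the JSON body passed to rows_to_records. Each estimate variable ending in “E” typically has a companion margin-of-error variable ending in “M” that is worth requesting at the same time.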

The ACS also publishes tables on languages spoken at home, including detailed language tables with data on speakers of 13 unique language groups. Data tables that include social and economic characteristics by language are more limited; they generally include Spanish, other Indo-European languages, Asian and Pacific Island languages, and other languages. Data on age, nativity, poverty status, and educational attainment is available for people speaking English at home, people speaking a language other than English at home, and for those who speak Spanish at home. To find language data tables, type “Language Spoken at Home” in the search bar at the top of the page, then select “View All Tables” to scroll through available data tables on languages spoken. 

Similarly, to find data tables on the immigrant population, type “Native and Foreign-Born” in the search bar. Data on age, sex, and place of birth is available by nativity and citizenship status. Data on economic and social characteristics is available by period of entry into the United States or by region of birth, which includes Europe, Asia, and Latin America. 

It is important to note that the ACS is a survey of a sample of residents, not all residents. ACS estimates have margins of error, which indicate the precision of the estimate. Roughly speaking, there is a 90% certainty that the “real” value lies within the estimated value plus or minus the margin of error. These margins of error can sometimes be higher than the estimate itself. More information about using data from the ACS and considering margins of error is available in the Census Bureau’s Best Practices webinar.
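To make the margin of error concrete, the short Python sketch below works through three common calculations: the range implied by an estimate and its margin of error, the coefficient of variation (a rough reliability measure), and the approximate margin of error when several estimates are added together. The formulas follow the Census Bureau’s general guidance for published ACS margins of error, which use a 90% confidence level (z = 1.645). The function names and example numbers are ours and purely illustrative.

```python
import math

Z_90 = 1.645  # z-score for the 90% confidence level used in published ACS tables


def confidence_interval(estimate, moe):
    """Return the (lower, upper) range implied by an estimate and its MOE."""
    return estimate - moe, estimate + moe


def coefficient_of_variation(estimate, moe):
    """Return the CV as a percentage, or None when the estimate is zero."""
    if estimate == 0:
        return None
    standard_error = moe / Z_90
    return standard_error / estimate * 100


def moe_of_sum(moes):
    """Approximate the MOE of a sum of estimates (root sum of squares)."""
    return math.sqrt(sum(m ** 2 for m in moes))


# Illustrative numbers only: a small-town estimate whose MOE exceeds the estimate.
low, high = confidence_interval(120, 150)
print(low, high)

# Combining two towns' estimates: add the estimates, combine the MOEs.
combined_estimate = 300 + 500
combined_moe = moe_of_sum([30, 40])
print(combined_estimate, combined_moe)
```

A lower bound below zero, as in the first example, is a signal that the estimate is too imprecise to use on its own; combining geographies or categories is one way to get a more reliable number.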

 

Data Source 2: ACS Public Use Microdata Sample (PUMS) 

To fill some of the data gaps in the published ACS tables discussed above, the ACS Public Use Microdata Sample (PUMS) allows users to create custom data tables and includes variables that are not always available in the published ACS tables, including detailed languages spoken. PUMS data is a sub-sample of ACS data and is available for states and Public Use Microdata Areas (PUMAs), geographic areas with populations of at least 100,000. At the town level, only Hartford, Bridgeport, New Haven, and Waterbury are large enough to be the only town in their PUMA.

PUMS data can be accessed through the Census Bureau’s Microdata Access Tool (MDAT), the Census Bureau’s API, and IPUMS USA, the last of which requires users to create a free account.

To use MDAT to create tables with PUMS data, first select a dataset, either the ACS 5-Year Estimates Public Use Microdata Sample or the ACS 1-Year Estimates Public Use Microdata Sample, and a vintage year. On the next page, users can see the available variables and narrow the list by filtering by topic or by searching within a variable label. Searching “language” returns four variables, including “Detailed household language,” which includes 130 different languages. The Census Bureau also publishes resources to help users navigate MDAT.

While MDAT does not include margins of error in the resulting tables, that does not mean the data is exact or error-free. It is important to calculate the margins of error for any data tables used. Margins of error can be calculated in two ways: with a generalized variance function (GVF) that uses design factors for each variable, or with the successive difference replicate (SDR) method, which uses replicate weights. These methods are explained in the PUMS accuracy documentation.
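As a rough illustration of the replicate-weight approach, the Python sketch below estimates a weighted count and its 90% margin of error from person-level PUMS records. It assumes each record is a dictionary carrying the main person weight (PWGTP) and the 80 replicate weights (PWGTP1 through PWGTP80), and it uses the standard error formula given in the PUMS accuracy documentation: the square root of 4/80 times the sum of squared differences between each replicate estimate and the full-sample estimate. The function names and the predicate in the usage note are hypothetical; treat this as a sketch under those assumptions rather than an official implementation.

```python
import math

Z_90 = 1.645  # 90% confidence level, matching published ACS margins of error


def sdr_margin_of_error(estimate, replicate_estimates):
    """MOE from replicate estimates using the successive difference formula."""
    n = len(replicate_estimates)
    variance = (4.0 / n) * sum((rep - estimate) ** 2 for rep in replicate_estimates)
    return Z_90 * math.sqrt(variance)


def weighted_count(records, weight_key, predicate):
    """Sum one weight column over the records that satisfy the predicate."""
    return sum(rec[weight_key] for rec in records if predicate(rec))


def count_with_moe(records, predicate, n_replicates=80):
    """Return (estimate, margin of error) for a weighted person count."""
    estimate = weighted_count(records, "PWGTP", predicate)
    replicates = [
        weighted_count(records, f"PWGTP{i}", predicate)
        for i in range(1, n_replicates + 1)
    ]
    return estimate, sdr_margin_of_error(estimate, replicates)
```

With records loaded from a PUMS file, a call might look like count_with_moe(records, lambda rec: rec["LANP"] == some_language_code), where LANP is the person-level language variable and some_language_code is a placeholder for a code taken from the PUMS data dictionary. Household-level variables would use the household weights (WGTP and WGTP1 through WGTP80) instead.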

 

Data Source 3: Current Population Survey

The Current Population Survey (CPS) is a monthly survey administered by the Census Bureau for the Bureau of Labor Statistics that includes rotating supplemental questions. The Census Bureau releases CPS microdata and publishes ready-to-use tables, both of which can be accessed through MDAT and IPUMS.

A set of CPS tables with national-level data on the foreign-born population from the Annual Social and Economic Supplement (ASEC) was released in 2023. It includes characteristics broken out by nativity and U.S. citizenship status, year of entry to the United States, world region of birth (Asia, Europe, Latin America, and other areas), and generation (first, second, and third).

To access state-level data, users can create custom tables through MDAT. Start by selecting a CPS dataset and a vintage month and year. Note that not all CPS datasets are available for all time periods. For example, the CPS Immigration/Emigration Supplement dataset is only available for August 2008. Some are available monthly while others are updated annually or biannually. Many of the supplements, including the ASEC, have a variable on the native country of the surveyed person and the native country of their mother and father, as well as a variable on citizenship that distinguishes non-citizens from foreign-born naturalized citizens. To find these variables, search for “citizenship” or “native country” in the variable label search bar. These variables can be combined with other CPS variables, including extensive economic, housing, demographic, migration, education, and voting participation variables, to create detailed tables.

The CPS is also subject to sampling error, and the Bureau of Labor Statistics publishes instructions for calculating standard errors for CPS data.

 

Which dataset should you use? 

Each of these three datasets has unique strengths and challenges. The American Community Survey tables are the easiest to access, have published margins of error, and include a variety of data tables down to the census tract level. However, the data tables do not always include the level of specificity that data users need to help with decision-making.

The ACS PUMS and the CPS allow for complete customization of tables, including the ability to recode variables, so that data users can answer very specific data questions with a single table. However, these datasets are harder to access than the ACS tables, margins of error must be calculated, and data is not always available for smaller geographies. ACS tables are a good place to start exploring a data question, and when more information is needed, PUMS and CPS data can fill in the gaps.