January 30, 2019

Data and methodology

This page describes the data and methodology behind the data essay “An early look at the 2020 electorate.”

Estimates for 2000-2016 are based on Census Bureau microdata provided through IPUMS-USA. For 2000, the 5% sample of the decennial census was used; for all other years (2008, 2012 and 2016), the 1-year American Community Survey (ACS) sample was used. Projections for 2020 use the Census Bureau’s 2017 National Population Projections as a base. The adjustments described below were calculated from the Census Bureau’s vintage 2016 National Population Estimates of the U.S. resident population as of July 1, 2016.

The data sources in this analysis define race and Hispanic origin differently. To keep the 2020 projections comparable with the 2000-2016 trend, we made several adjustments to the 2020 resident population projections so that their race and Hispanic-origin categories match those used in the decennial census and ACS microdata. We also made assumptions about naturalization trends in order to project the number of naturalized foreign-born citizens eligible to vote.

The decennial census and ACS allow respondents to identify as “some other race,” but this category does not appear in the population projections. Instead, the Census Bureau produces modified race data that assign a specific race to respondents who choose “some other race” on the decennial census or ACS. To maintain comparability, we calculate an adjustment from the 2016 population estimates, based on the population totals for each of the major race and Hispanic-origin groups (other races and multiracial are excluded).
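The text does not spell out the functional form of this adjustment. One plausible form, sketched below purely as an assumption, is a ratio: for each major group, divide the 2016 total under ACS-style definitions (where “some other race” stays separate) by the 2016 total under modified race, then scale that group’s 2020 projection by the ratio. The function names, group labels and numbers are hypothetical toy values, not published figures.

```python
# Hypothetical sketch of a ratio-style adjustment. The ratio form, function
# names, group labels and numbers are illustrative assumptions only.


def adjustment_factors(acs_totals, estimate_totals):
    """Ratio of each group's 2016 total under ACS-style definitions to its
    2016 total under modified race (as used by the estimates/projections)."""
    return {group: acs_totals[group] / estimate_totals[group] for group in acs_totals}


def adjust_projection(projection, factors):
    """Rescale each group's 2020 projected count by that group's factor."""
    return {group: count * factors[group] for group, count in projection.items()}


acs_2016 = {"White": 90.0, "Black": 20.0}   # toy totals, "some other race" kept separate
est_2016 = {"White": 100.0, "Black": 25.0}  # toy totals using modified race
factors = adjustment_factors(acs_2016, est_2016)
print(adjust_projection({"White": 110.0, "Black": 30.0}, factors))
```

Because modified race folds “some other race” respondents into the major groups, a ratio below 1 in this sketch would shrink a group’s projection back toward its ACS-style size.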

The projections report blacks and Asians/Pacific Islanders without distinguishing Hispanic origin. A second adjustment, calculated by 10-year age group from the 2016 population estimates, accounts for the portions of these populations that report Hispanic origin.
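A minimal sketch of this kind of split follows, assuming the adjustment takes the Hispanic share of each 10-year age group in the 2016 estimates and removes that share from the corresponding 2020 projection. The age-group labels, function names and counts are hypothetical.

```python
# Hypothetical sketch: remove the Hispanic portion of a race group that the
# projections report without a Hispanic-origin split. Shares are computed per
# 10-year age group from 2016 estimates. All names and numbers are toy values.


def hispanic_shares(hispanic_2016, total_2016):
    """Share of each 10-year age group reporting Hispanic origin in 2016."""
    return {age: hispanic_2016[age] / total_2016[age] for age in total_2016}


def non_hispanic(projection_2020, shares):
    """Remove the Hispanic share from each age group's 2020 projection."""
    return {age: count * (1.0 - shares[age]) for age, count in projection_2020.items()}


black_total_2016 = {"20-29": 50.0, "30-39": 40.0}   # toy counts
black_hispanic_2016 = {"20-29": 5.0, "30-39": 2.0}  # toy counts
shares = hispanic_shares(black_hispanic_2016, black_total_2016)
print(non_hispanic({"20-29": 60.0, "30-39": 44.0}, shares))
```

Holding the 2016 shares fixed within each age group is the simplifying assumption here; the shares themselves could change by 2020.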

In our 2020 projections, the racial and ethnic category “Non-Hispanic other/Multiracial” is calculated as a residual: the total population minus the four major racial and ethnic groups.
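The residual step amounts to a single subtraction, shown below for one population cell (for example, one sex-by-age combination). The group labels are placeholders rather than the essay’s category names, and the non-negativity check is an added assumption, not a documented step.

```python
# Sketch of the residual calculation for one population cell. Group labels are
# placeholders; the negative-residual check is an illustrative safeguard.


def other_multiracial(total, major_groups):
    """Total population minus the four major racial and ethnic groups."""
    if len(major_groups) != 4:
        raise ValueError("expected exactly four major groups")
    residual = total - sum(major_groups.values())
    if residual < 0:
        raise ValueError("major groups exceed the total; check the adjustments")
    return residual


cell = {"group_a": 60.0, "group_b": 12.0, "group_c": 18.0, "group_d": 6.0}  # toy counts
print(other_multiracial(100.0, cell))
```

Since the residual absorbs whatever the adjustments leave over, a negative value would be an early sign that the earlier adjustments overshot.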

Applying these adjustments to the 2020 population projections yields a trend for the total resident population by sex, age, race, Hispanic origin and nativity that is consistent from 2000 through 2020. Projecting the electorate requires additional assumptions about who will be eligible to vote. In the decennial census and ACS data, eligible voters are all U.S.-born and naturalized citizens ages 18 and older. For each racial and ethnic group separately, we forecast 2020 naturalization rates by 5-year age group using a simple linear model fit to data from 2008, 2012 and 2016. These rates are applied to the resident population dataset, and everyone under 18 is dropped to create the electorate dataset.

The data do not include U.S. citizens living abroad or members of the armed forces stationed abroad; both are relatively small populations that are potentially eligible to vote. In addition, the data represent only people eligible to vote, not those who are registered or who will actually vote. Significant changes between 2016 and 2020 in the naturalization rates of foreign-born people living in the U.S. would also cause the actual electorate to differ from these projections.