April 2, 2012

Sample Surveys and the 1940 Census

After a 72-year wait required by law, the National Archives has released individual records from the 1940 Census, opening a gold mine for people researching their family histories. But the 1940 Census also played a notable role in the history of census-taking: It helped usher in the modern era of sample surveys.

With the nation deep in the Great Depression, government officials planning for the 1940 Census hoped to expand the topics they asked about in order to guide federal policy-making in an era of expanded government. They wanted to add questions to the census about people’s incomes, migration histories and housing conditions.

But adding these new questions would have made the census form too long and the count too costly. After lobbying by some of its younger statisticians, the Census Bureau turned to the new science of survey sampling: to make room for new topics, it decided for the first time to ask some questions of only a sample of respondents, not of everyone. Those responses were then extrapolated to the total population.

New Questions

The 1940 Census included 34 population questions and 31 housing questions for the general population, and an additional 16 questions (on six topics) that were asked of a sample of respondents. Some questions asked of the sample had been asked in previous counts, and others were asked for the first time in 1940. The housing questions were listed on a separate form, but in practice enumerators asked them along with the population questions.

Census Bureau officials knew that some of the new questions would be controversial, and that some people would find them intrusive. There has been debate since the first census, in 1790, about how much it is appropriate for the government to ask beyond a basic count.

One question that stoked controversy concerned income. It asked respondents the “amount of money wages or salary received (including commissions)”; a follow-up question asked whether the person had received $50 or more in income from other sources. A U.S. senator from New Hampshire even lobbied, unsuccessfully, to have the question dropped.

In 1940, not many Americans had ever been asked to give financial information to the federal government, according to historian Margo J. Anderson’s book “The American Census: A Social History.” Only a minority of Americans then filed income tax forms: just 15 million were filed in 1940, when the national population was counted at 132 million.

Census officials knew that the strongest opposition to the income question came from high-income Americans, but the information they needed most was from low- and middle-income Americans. So census-takers were told to ask for actual wage or salary amounts only up to $5,000 a year; at the time, three-quarters of American families made no more than that. Incomes of $5,000 or more would be reported as “$5,000+.” The instructions to census-takers said: “Some persons who might otherwise be reluctant to report wages or salary would be quite willing to do so if they learn that the amount above $5,000 need not be specified.”

If people were reluctant to give their income information to the enumerator who knocked on their door, they could choose to fill out a card that would be sealed in an envelope and mailed to the Census Bureau. The income question was placed near the end of the questionnaire, “because if the enumerator got kicked out of a household when that question was asked, the interviewer would have already obtained the answers to the previous questions,” recalled Census Bureau official Edwin D. Goldfield, in an interview for a bureau oral history archive.

The housing questions amounted to the first national inventory of housing, according to Anderson. They included questions about the home’s water supply and toilet, if any. There also was a series of questions about the home’s value and cost. The housing forms were destroyed, and are not part of the 1940 Census release of records.

Sample Survey

Since the first U.S. census, the same set of questions had been asked of everyone. But in 1940, the desire for additional data coincided with improvements in survey methodology and theory that allowed the Census Bureau to add more questions without burdening the entire population. Statisticians in the 1930s had made great advances in designing and implementing sample surveys, in which a randomly selected subset of respondents supplies data that are extrapolated to represent the entire population.

The importance of drawing a random sample was made clear after the Literary Digest, a well-known magazine, conducted a straw poll of respondents whose names were obtained mainly from automobile registration lists and telephone books. The poll, which drew a gigantic response of two million postcards (about 1,000 responses is now considered adequate for a national poll), indicated a landslide victory for Republican Alf Landon over Democrat Franklin D. Roosevelt in the 1936 presidential election. Although factors other than the flawed sample also played a role in that failed forecast, the Literary Digest debacle was a strong force in the rise of surveys based on scientific probability samples, in which every American adult has a known chance of being asked for a response. Other 1936 polls that used far smaller but more carefully constructed samples were much more accurate than the Literary Digest poll.
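Why can about 1,000 responses be enough while two million were not? A minimal sketch, assuming a simple random sample and the textbook 95 percent margin of error for a proportion, 1.96 × √(p(1 − p)/n), with p = 0.5 as the worst case. The function name is illustrative. The formula measures only random sampling error; it says nothing about bias from a skewed list of names, which is what sank the Literary Digest.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a simple random sample.

    Assumes simple random sampling. It captures random sampling error only,
    not bias from an unrepresentative sampling frame.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (1_000, 2_000_000):
    # Express the margin in percentage points.
    print(f"n = {n:>9,}: +/- {100 * margin_of_error(n):.2f} percentage points")
```

For n = 1,000 this gives roughly ±3.1 points, and for two million well under 0.1 point. The Digest's enormous sample made its random error negligible, but a large sample drawn from a biased list is still biased.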

Some long-time Census Bureau officials had resisted incorporating scientific sampling into the decennial census, believing it would “downgrade the validity of census information, because you had to say that this is based on a sample,” recalled Ross Eckler, a former Census Bureau director who was interviewed for a Census Bureau oral history project.

Armed with information from some smaller government sample surveys, a younger generation of statisticians persuaded their skeptical elders at the Census Bureau that such an approach should be part of the decennial census. The introduction of sampling did not cause any notable public controversy or challenge from members of Congress, according to Eckler and other sources.

In the 1940 Census, the sampled population consisted of every 20th person interviewed by any given enumerator, a 5 percent sample. In practice, enumerators filled out forms that had space for information on 40 people on each side, and were told to ask the supplemental questions of the people whose names fell on two designated lines per side. According to the instructions for enumerators, the questions should be asked about anyone whose name was listed on a designated line, “whether this be the head, his wife, a son or daughter, an infant, a lodger, or any other member of the household.”
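The 1940 procedure is what statisticians call a systematic sample, and the simplest way to extrapolate from it is to let each sampled person stand in for 20 people. The Python sketch below assumes people are held in a list in the order an enumerator recorded them. The starting position, the synthetic population and the `veteran` field are hypothetical; the real 1940 sheets used designated line numbers rather than a list index, and the bureau's actual estimation procedures may have involved adjustments this sketch omits.

```python
import random

SAMPLING_INTERVAL = 20  # every 20th person, i.e. a 5 percent sample


def systematic_sample(people, start, interval=SAMPLING_INTERVAL):
    """Pick every `interval`-th person, beginning at 0-based index `start`."""
    return people[start::interval]


def estimate_total(sample, has_trait, interval=SAMPLING_INTERVAL):
    """Extrapolate a count: each sampled person represents `interval` people."""
    return interval * sum(1 for person in sample if has_trait(person))


# Hypothetical population of 4,000 people, about 10 percent of them veterans.
random.seed(1940)
people = [{"veteran": random.random() < 0.1} for _ in range(4_000)]

sample = systematic_sample(people, start=13)
print(len(sample))  # 200 people, a 1-in-20 sample
print("estimated veterans:", estimate_total(sample, lambda p: p["veteran"]))
print("actual veterans:   ", sum(p["veteran"] for p in people))
```

The estimate will not match the true count exactly; the gap is the sampling error that the margin-of-error formula quantifies for random samples.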

The sample questions included place of birth of the person’s mother and father; “mother tongue” in the household during early childhood; three questions about military service and veteran or veteran-family status; three questions about Social Security receipt; and questions about occupation, industry and class of worker. Women who were or had been married were asked whether they had been married more than once, how old they were when first married and the number of children they had ever had.

Sampling Since 1940

Every decennial census from 1950 to 2000 made broader use of sampling to ask additional questions of part of the population; those questions eventually moved to a separate questionnaire known as the “long form.” Since the 2000 Census, the American Community Survey, a sample survey, has asked on a continuous basis the same questions that the census had asked once a decade on the long form. The Census Bureau also uses sample surveys to evaluate the quality of the census and to test new questions and survey methods.

The techniques for drawing a sample and extrapolating its results have become increasingly complex. Over the years, the Census Bureau moved from a person-based sample to a household-based sample: instead of being drawn from the people interviewed by one enumerator, samples came to be drawn from lists of addresses.

The reliability of sample surveys is now taken for granted within the Census Bureau and the wider research community, where sample surveys are the basis for measuring public opinion, political attitudes, employment levels and much else.

Some uses of sampling can still generate controversy, however. Since 1950, the Census Bureau has conducted a post-enumeration sample survey to check the quality of the full count. For 2000, the bureau planned to use a sample survey not only to check the accuracy of its original count but also to amend that official count if the survey data were found to improve the quality of the enumeration. The agency planned to conduct a post-enumeration survey, match the results against actual census records and apply a statistical technique known as dual-systems estimation to correct flaws in the original enumeration.
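At its core, dual-systems estimation is the capture-recapture idea: if an independent survey finds n₂ people in an area and m of them match census records, the census caught roughly a fraction m/n₂ of everyone, so the true population is about (census count × n₂) / m. The sketch below shows only that textbook ratio, assuming the two counts are independent and matching is exact; the Census Bureau's production methods are considerably more elaborate. All numbers and names are hypothetical.

```python
def dual_system_estimate(census_count: int, survey_count: int, matched: int) -> float:
    """Textbook dual-system (capture-recapture) estimate of the true population.

    Assumes the census and the survey are independent, matching is exact,
    and neither list contains erroneous or duplicate records.
    """
    if matched <= 0:
        raise ValueError("need at least one person found in both lists")
    return census_count * survey_count / matched


census_count = 950  # hypothetical: people the census counted in an area
survey_count = 100  # hypothetical: people the post-enumeration survey found there
matched = 95        # hypothetical: survey respondents matched to census records

estimate = dual_system_estimate(census_count, survey_count, matched)
net_undercount = (estimate - census_count) / estimate
print(f"estimated population: {estimate:.0f}")         # 1000
print(f"net undercount: {100 * net_undercount:.1f}%")  # 5.0%
```

Here the survey suggests the census caught 95 of every 100 people, implying a population of about 1,000 and a net undercount of 5 percent.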

However, the application of survey sampling to produce the official counts used to apportion congressional seats among the states met opposition, mainly from Republicans who expressed concern that results could be manipulated for partisan purposes. The use of survey sampling for apportionment purposes was successfully challenged in the U.S. Supreme Court before the 2000 Census was taken.

The Census Bureau did take a post-enumeration survey in 2000 and considered using its results for other purposes, such as producing data for redistricting within states or allocating federal funds among states and localities. But problems were discovered in the survey results, so they were not used for any of those purposes.

The bureau also conducted a post-enumeration survey after the 2010 Census, called the Census Coverage Measurement program. It is intended to measure undercounts, duplicate counts and other errors, but not to amend the count. Results are expected sometime this year.