Proposal preview

New research using linked census data: Scandinavia and the U.S.

In recent years there has been a revolution in data digitization and record linkage. Massive data sources have become available thanks to projects such as IPUMS and NAPP. More recently, increased efforts have been made to make full-count data public through the digitization, harmonization and dissemination of historical census data from all over the world. This development has also led to major advances in record linkage, making it possible to follow individuals between censuses and individual-level birth, migration and death records. These new data open completely new frontiers for research in economic history and historical demography. It is now possible to study the complete life course of individuals in different periods and geographic contexts, following individuals both within and between countries. The time has come to move beyond methodological issues and to discuss actual research. The aim of this session is to showcase the potential of these data for advancing knowledge on major issues in economic history. More specifically, the session will present research on demographic outcomes and social mobility, using linked records from full-count individual-level census data. The papers cover different contexts and different research questions, but all are based on new linked individual-level data.

Organizer(s)

  • Martin Dribe, Lund University, Sweden (martin.dribe@ekh.lu.se)
  • Björn Eriksson, Lund University, Sweden (bjorn.eriksson@ekh.lu.se)

Session members

  • Anna Aizer, Brown University
  • Martin Dribe, Lund University
  • Shari Eli, University of Toronto
  • Björn Eriksson, Lund University
  • Katherine Eriksson, University of California, Davis
  • Jonas Helgertz, Lund University and University of Minnesota
  • Adriana Lleras-Muney, University of California, Los Angeles
  • Ran Abramitzky, Stanford University
  • Evan Roberts, University of Minnesota
  • Andrew Halpern-Manners, Indiana University
  • Robert Warren, University of Minnesota
  • Hui Ren Tan, Boston University
  • James Feigenbaum, Boston University

Discussant(s)

  • Björn Eriksson, Lund University
  • Lionel Kesztenbaum, Paris School of Economics

Papers

Panel abstract

In recent years there has been a revolution in data digitization and record linkage. Massive data sources have become available thanks to projects such as IPUMS and NAPP. More recently, efforts have been made to make full-count data public through the digitization, harmonization and dissemination of historical census data from all over the world. This development has led to major advances in record linkage, making it possible to follow individuals between censuses and individual-level birth, migration and death records. These new data open completely new research frontiers in economic history and historical demography. It is now possible to study the complete life course of individuals both within and between countries. The aim of this session is to showcase the potential of these data for advancing knowledge on major issues in economic history.

1st half

Shifting the Landscape of Mobility: The Role of Local Labor Markets, Human Capital, and Societal Change

Hui Ren Tan

Intergenerational mobility varies substantially across the U.S. today, with the non-industrial Midwest exhibiting some of the highest rates of upward mobility. Has the landscape of mobility always been this way? Using a large historical linked sample, I show that the geography of mobility was significantly different in the early 20th century, with the coastal areas and industrial Midwest providing the most opportunities for upward mobility. I provide evidence that these historical patterns were driven by the types of jobs available in local labor markets rather than the childhood environment that one was exposed to, in contrast to the present. Over time, childhood exposure effects grew in relative importance as labor market structures converged, the returns to human capital increased, and societal problems evolved. This, in turn, shifted the landscape of mobility in favor of places with more conducive environments for childhood development.

The Effects of Education on Mortality: Evidence from a Large Representative Sample of American Twins

Robert Warren, Andrew Halpern-Manners, Evan Roberts, Jonas Helgertz

We are producing U.S.-based estimates of the effects of education on mortality (and any variation in those effects that exists across sub-groups), using a large and representative panel of twins drawn from linked complete-count Census records. For comparison purposes, and to shed additional light on the specific roles that neighborhood, family, and genetic factors play in confounding associations between education and mortality, we will also produce parallel estimates of the education-mortality relationship using data on (1) non-twin pairs who lived in different neighborhoods during childhood; (2) non-twin pairs who shared the same neighborhood growing up; and (3) non-twin siblings who shared the same family environment but whose genetic endowments vary to a greater degree. Our findings will have major implications for our understanding of educational gradients in mortality and should provide a useful foundation for future work examining the etiology of human survival.

Holding Out for Mr. Right: Women's Income, Marital Status and Child Well-Being

Shari Eli, Anna Aizer, Adriana Lleras-Muney

The objective of welfare provision is to improve child well-being, but welfare receipt can disincentivize mothers from remarrying, which may negatively affect children in the long run. Using administrative data for over 10,000 women from the first welfare program in the U.S., the Mothers' Pension Program, we find that welfare receipt does result in delayed remarriage, but does not affect lifetime remarriage rates. Importantly, when women on welfare do marry, they marry men of higher socio-economic and health status. This is consistent with a model of search in the marriage market in which welfare receipt allows women to search for longer in order to match with higher-status men, thereby improving children’s long-run outcomes. These results also highlight the importance of paternal inputs into children’s development.

2nd half

Selection among Swedish migrants to America during the era of mass migration

Martin Dribe, Björn Eriksson

Between 1850 and 1930 over 30 million people left Europe for North America, with a majority ending up in the United States. In relative terms Sweden was one of the most important sending countries. In total 1.1 million Swedes left for the U.S., out of a population of about 5 million. The paper examines the selection mechanisms of migration from Sweden to the U.S. during the age of mass migration. We rely on digitized complete censuses with individual-level data for the complete Swedish population in 1880, 1890, 1900 and 1910. We complement the censuses with Swedish emigration registers, which enables us to accurately identify emigrants to the U.S. To address the research question regarding selection we study the individual, temporal and contextual determinants of migration taking into account falling travel costs and the development of the Swedish and U.S. economies during the age of mass migration.

The Role of Ethnic Enclaves in Immigrant Assimilation: Evidence from Scandinavian Migrants during the Age of Mass Migration

Katherine Eriksson

This paper examines the economic outcomes of Norwegian immigrants in 1910 and 1920, the later part of the Age of Mass Migration. Using different identification strategies, including county fixed effects and an instrumental variables strategy based on chain migration, I consistently find that Norwegians living in larger enclaves in the United States had lower occupational earnings, were more likely to be in farming occupations, and were less likely to be in white-collar occupations. Results are robust to matching method and choice of occupational score. This earnings disadvantage is passed on to the second generation.

Best Practices for Automated Linking Using Historical Data: A Progress Report

Ran Abramitzky, Leah Boustan, Katherine Eriksson, James Feigenbaum, Santiago Perez

The recent digitization of complete-count census data is an extraordinary opportunity for economic historians to create large individual-level longitudinal datasets by linking records from one census to another or from other sources to the census. However, linking is complicated by a steep learning curve and the existence of many different methods. More recently, questions have been raised about how algorithmic record linking compares with hand linking. In this paper, we develop a practitioner’s guide to record linkage. Our goals are twofold. First, we discuss a series of choices about how to prepare and analyze the data, including the value of adding additional linking variables, of name standardization and data cleaning, of blocking the data, and of string distance metrics and sound encodings. Second, we perform a series of comparisons across various algorithmic (and hand) methods, including new methods using machine learning and Bayesian algorithms.
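
The preparation choices named in this abstract (name standardization, blocking, string distance metrics and sound encodings) can be illustrated with a minimal Python sketch. It is not the authors' procedure. The record layout, the function names, the 0.85 threshold and the two-year birth-year tolerance are all assumptions made for illustration, and the similarity ratio from the standard library's difflib stands in for the string distance metrics usually applied in this literature, such as Jaro-Winkler.

```python
import difflib
from collections import defaultdict


def soundex(name: str) -> str:
    """Return the four-character American Soundex code of a name."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    letters_only = [c for c in name.upper() if c.isalpha()]
    if not letters_only:
        return ""
    out = letters_only[0]
    prev = codes.get(letters_only[0], "")
    for ch in letters_only[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            out += digit
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = digit
    return (out + "000")[:4]


def standardize(name: str) -> str:
    """Lowercase, drop periods and collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def link_records(records_a, records_b, threshold=0.85, max_year_gap=2):
    """Link records that have exactly one sufficiently similar candidate.

    Hypothetical record layout: dicts with keys id, first, last, birth_year.
    """
    # Blocking: only compare records whose surnames share a Soundex code.
    blocks = defaultdict(list)
    for rec in records_b:
        blocks[soundex(rec["last"])].append(rec)

    links = {}
    for rec in records_a:
        name_a = standardize(rec["first"] + " " + rec["last"])
        candidates = []
        for cand in blocks.get(soundex(rec["last"]), []):
            if abs(cand["birth_year"] - rec["birth_year"]) > max_year_gap:
                continue
            name_b = standardize(cand["first"] + " " + cand["last"])
            # difflib's ratio is a stand-in for a dedicated string distance.
            score = difflib.SequenceMatcher(None, name_a, name_b).ratio()
            if score >= threshold:
                candidates.append(cand["id"])
        # Conservative rule: keep a link only when the match is unique.
        if len(candidates) == 1:
            links[rec["id"]] = candidates[0]
    return links


# Invented example records, not drawn from any census.
census_early = [
    {"id": "a1", "first": "Karl", "last": "Eriksson", "birth_year": 1872},
    {"id": "a2", "first": "Anna", "last": "Lindqvist", "birth_year": 1880},
]
census_late = [
    {"id": "b1", "first": "Karl", "last": "Ericson", "birth_year": 1873},
    {"id": "b2", "first": "Karl", "last": "Eriksson", "birth_year": 1850},
    {"id": "b3", "first": "Anne", "last": "Lindquist", "birth_year": 1880},
]
links = link_records(census_early, census_late)
print(links)
```

In this toy example the sound encoding places "Eriksson" and "Ericson" in the same block, the birth-year tolerance rules out the namesake born in 1850, and the uniqueness rule would drop any record with two plausible candidates. Each of these choices trades off the number of links against the risk of false links, and comparing such trade-offs is what the paper describes as its aim.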
