Missing Data, Imputation Accuracy, and Endogeneity

Tuesday, June 14, 2016: 10:55 AM
B26 (Stiteler Hall)

Author(s): Ian K McDonough; Daniel Millimet

Discussant: Charles Courtemanche

Empirical researchers often confront missing data on covariates, particularly when using individual-level survey data. If the data are not missing at random, then it is well known that estimation using only observations with complete data results in sample selection bias. In this case, it is common for researchers to impute the missing data. Many imputation techniques have been proposed in the literature, but little guidance is available for choosing among them. In this paper, we build on the insights in Millimet (2015) and investigate the practical performance of a host of imputation techniques. Our conjecture is that, regardless of the imputation technique used, the resulting data contain measurement error (unless the imputation is perfectly accurate), and therefore Instrumental Variable (IV) estimation must be used to obtain consistent estimates. Because the finite sample bias of IV (in absolute value) is not monotonically decreasing in the degree of measurement accuracy, the most accurate imputation method is not necessarily the method that minimizes the bias. Instead, we recommend that researchers choose the imputation method that maximizes the first-stage strength of the instrument(s), even if this entails using a less accurate imputation method. We illustrate this recommendation via simulations and with an application on the causal effect of birthweight on subsequent child outcomes using the Early Childhood Longitudinal Study, Kindergarten Cohort (ECLS-K), where birthweight is missing in more than 11% of the sample. Specifically, we explore alternative imputation techniques and instrument for (imputed) birthweight using state-level SNAP rules, which have been shown to affect SNAP participation (Meyerhoefer & Pylypchuk 2008); participation in turn reduces the likelihood that low-income expectant mothers gain insufficient weight (Baum 2012).
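The recommendation to compare imputation methods by first-stage strength rather than by accuracy alone can be sketched in a small simulation. The following is an illustrative sketch only, not the authors' simulation design: the data-generating process, the auxiliary covariate `w`, the two candidate imputations (mean and regression), and all function names are assumptions introduced here. For each candidate it reports the imputation error on the missing entries, the first-stage F statistic of the instrument, and the just-identified IV estimate, then picks the candidate with the largest first-stage F.

```python
import numpy as np


def first_stage_f(x_imputed, z):
    """F statistic on the single instrument z in the regression x_imputed ~ 1 + z."""
    n = len(z)
    Z = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(Z, x_imputed, rcond=None)
    resid = x_imputed - Z @ coef
    sigma2 = resid @ resid / (n - 2)                 # residual variance
    var_slope = sigma2 / np.sum((z - z.mean()) ** 2)  # variance of the slope on z
    return coef[1] ** 2 / var_slope                   # squared t statistic = F


def iv_estimate(y, x_imputed, z):
    """Just-identified IV estimate: cov(z, y) / cov(z, x_imputed)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x_imputed)[0, 1]


def simulate(n=5000, beta=1.0, seed=0):
    """Hypothetical design: x is missing more often when it is large."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                       # instrument
    w = rng.normal(size=n)                       # auxiliary covariate, used only to impute
    x = 0.5 * z + 0.8 * w + rng.normal(size=n)   # covariate subject to missingness
    y = beta * x + rng.normal(size=n)            # outcome

    # Missingness depends on x itself, so the data are not missing at random.
    missing = rng.random(n) < 1.0 / (1.0 + np.exp(-(x - 2.0)))
    obs = ~missing

    # Candidate 1: fill with the complete-case mean.
    x_mean = np.where(missing, x[obs].mean(), x)

    # Candidate 2: fill with predictions from a complete-case regression of x on w.
    W = np.column_stack([np.ones(n), w])
    gamma, *_ = np.linalg.lstsq(W[obs], x[obs], rcond=None)
    x_reg = np.where(missing, W @ gamma, x)

    results = {}
    for name, x_tilde in {"mean": x_mean, "regression": x_reg}.items():
        results[name] = {
            # Accuracy is computable here only because the true x is simulated.
            "rmse_on_missing": float(np.sqrt(np.mean((x_tilde[missing] - x[missing]) ** 2))),
            "first_stage_F": float(first_stage_f(x_tilde, z)),
            "iv_estimate": float(iv_estimate(y, x_tilde, z)),
        }
    return results


def pick_by_first_stage(results):
    """Return the name of the imputation method with the largest first-stage F."""
    return max(results, key=lambda name: results[name]["first_stage_F"])


if __name__ == "__main__":
    res = simulate()
    for name, stats in res.items():
        print(name, stats)
    print("selected by first-stage F:", pick_by_first_stage(res))
```

In applied work the true values are unobserved, so the accuracy column is unavailable and only the first-stage statistic can be compared across candidates. Repeating `simulate` over many seeds would give a rough picture of how the IV estimates vary across imputation methods under this assumed design.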