Identifying Endogenous Treatment Effects Using Latent Factor Models

Wednesday, June 15, 2016: 8:30 AM
419 (Fisher-Bennett Hall)

Author(s): Souvik Banerjee; Anirban Basu

Discussant: Edward C. Norton

In this work, we compare two causal inference methods in the presence of multiple outcomes; to our knowledge, they have not been directly compared before. The first is the traditional instrumental variables (IV) method, which is widely used in empirical research when one of the covariates of interest is endogenous. With appropriate instrument(s), one can obtain consistent estimates of the treatment effects, although the IV estimator is less efficient than the OLS estimator. The second is the latent factor model, which, in the absence of an instrumental variable, can be used to obtain a causal treatment effect as long as the scale of the factor and the factor loadings can be estimated through multiple outcome measurements. We compare the bias, efficiency and coverage probability of these alternative causal estimators of the endogenous treatment effects by performing an extensive set of simulations. We consider the case of continuous outcomes (y) with a binary endogenous treatment variable. The model specifications include a naïve OLS model, an instrumental variables model, a model with a latent factor shared between the multiple outcome equations and the treatment equation (without an IV), and a similar latent factor model with an IV. We present results for a sample size of 2,000 with 1,000 iterations and the number of outcomes varying from 4 to 6. As expected, the naïve OLS estimates are severely biased, whereas the IV estimates are very close to their true values (coverage probability ~ 0.95) for each of the outcomes. The bias of the shared latent factor model without an IV is much lower than that of the naïve OLS model (coverage probability ~ 0.85), although it is somewhat higher than that of the IV model, and its magnitude increases with the number of outcomes. For the same model, the standard errors are much larger than those of the IV model for all the outcomes.
The model with the shared latent factor + IV performs quite well: the treatment effects are very close to their true values (coverage probability ~ 0.95), and the standard errors are lower than those obtained in the IV model for all the outcomes. Our results suggest that a shared latent factor model produces much less bias than naïve OLS, but is not as efficient as the IV model. Such an estimator would be the preferred model for causal inference in the absence of an IV, especially if one has large sample sizes. In addition, we find that a model with a shared latent factor + IV has bias and coverage probability comparable to those of an IV model, but is more efficient. This should be the preferred model when investigators have a series of outcomes on which treatment effects are estimated. We illustrate this intuition using an example of the effect of long-term care on various health outcomes for the near-elderly.
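
As a rough illustration of the simulation setting described in the abstract, the following Python sketch generates several continuous outcomes and a binary treatment that share one latent factor, then contrasts the naïve OLS estimate with a simple IV (Wald) estimate. Everything specific here is an assumption made for illustration and is not taken from the authors' design: the binary instrument, the coefficient values, the factor loadings, the common true effect of 1.0, and the function names. The latent factor estimators themselves are not implemented; this sketch only shows why naïve OLS is biased and why the IV estimate is not when the treatment and the outcomes share an unobserved factor.

```python
import numpy as np


def simulate(n=2000, n_outcomes=4, true_effect=1.0, rng=None):
    """Draw one sample from a hypothetical shared-latent-factor design."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.binomial(1, 0.5, size=n).astype(float)   # assumed binary instrument
    f = rng.normal(size=n)                           # unobserved shared factor
    # Binary treatment depends on the instrument and on the latent factor,
    # which is what makes it endogenous in the outcome equations.
    d = (2.0 * z - 1.0 + f + rng.normal(size=n) > 0).astype(float)
    loadings = np.linspace(0.5, 1.0, n_outcomes)     # assumed factor loadings
    noise = rng.normal(size=(n, n_outcomes))
    y = true_effect * d[:, None] + f[:, None] * loadings[None, :] + noise
    return y, d, z


def ols_effect(y, d):
    """Naive OLS slope of each outcome on the treatment (with intercept)."""
    X = np.column_stack([np.ones_like(d), d])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def iv_effect(y, d, z):
    """Wald/IV estimate cov(z, y) / cov(z, d), one value per outcome."""
    zc = z - z.mean()
    return (zc @ y) / (zc @ d)


def run_monte_carlo(reps=200, n=2000, n_outcomes=4, true_effect=1.0, seed=0):
    """Average the OLS and IV estimates over repeated simulated samples."""
    rng = np.random.default_rng(seed)
    ols = np.empty((reps, n_outcomes))
    iv = np.empty((reps, n_outcomes))
    for r in range(reps):
        y, d, z = simulate(n, n_outcomes, true_effect, rng)
        ols[r] = ols_effect(y, d)
        iv[r] = iv_effect(y, d, z)
    return ols.mean(axis=0), iv.mean(axis=0)


if __name__ == "__main__":
    ols_mean, iv_mean = run_monte_carlo()
    print("mean naive OLS estimates:", np.round(ols_mean, 3))
    print("mean IV estimates:       ", np.round(iv_mean, 3))
```

With a single binary instrument and no covariates, the Wald ratio coincides with two-stage least squares, which keeps the sketch free of any dependency beyond NumPy. Under these assumed parameter values, the OLS averages should sit well above the true effect, while the IV averages should be close to it.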