Practical Considerations for Estimating Treatment Effects with Machine Learning

Tuesday, June 25, 2019: 10:00 AM
Madison A (Marriott Wardman Park Hotel)

Presenter: Kenneth John McConnell

Co-Author: Stephan Lindner

Discussant: Eric Roberts

Background: Machine Learning (ML) applications have a growing impact on the field of health economics. Until recently, most ML applications have focused on problems of prediction. However, recent methodological developments extend ML beyond predictive models into the realm of statistical inference, particularly the estimation of treatment effects in cross-sectional data. This paper compares the performance of several ML-based approaches for estimating treatment effects under exogeneity, including Targeted Maximum Likelihood Estimation, Bayesian Additive Regression Trees, Causal Random Forests, Double Machine Learning, and Bayesian Causal Forests.

Study Design: We performed Monte Carlo simulations to assess the performance of different ML estimators. We constructed a flexible data generating process that allowed us to systematically vary the amount of confounding between covariates, outcome, and treatment; the number of observations; the number of correlated covariates; and the number of noise covariates. We assessed estimator bias, root mean squared error, mean absolute deviation, and 95% CI coverage rates. We also illustrated the performance of alternative methods using data on right heart catheterization.
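
The evaluation loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation code: the data generating process (one confounder, a logistic treatment model, a linear outcome), the true effect of 1.0, the function names, and the two simple estimators (an unadjusted difference in means and an OLS regression adjustment) are all hypothetical stand-ins chosen so that the example needs only NumPy. "Mean absolute deviation" is read here as the mean absolute error of the estimates around the true effect, which is one possible interpretation.

```python
import numpy as np

TRUE_EFFECT = 1.0  # hypothetical true treatment effect, assumed for this sketch


def simulate(n, n_noise, confounding, rng):
    """Draw one dataset: x confounds treatment d and outcome y; noise columns are irrelevant."""
    x = rng.normal(size=n)
    noise = rng.normal(size=(n, n_noise))
    propensity = 1.0 / (1.0 + np.exp(-confounding * x))
    d = rng.binomial(1, propensity)
    y = TRUE_EFFECT * d + confounding * x + rng.normal(size=n)
    return y, d, np.column_stack([x, noise])


def naive_effect(y, d, X):
    """Unadjusted difference in means (ignores X) with its standard error."""
    y1, y0 = y[d == 1], y[d == 0]
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return y1.mean() - y0.mean(), se


def ols_effect(y, d, X):
    """OLS of y on [1, d, X]; returns the coefficient on d and its classical standard error."""
    Z = np.column_stack([np.ones(len(y)), d, X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / (len(y) - Z.shape[1])
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return beta[1], np.sqrt(cov[1, 1])


def monte_carlo(estimator, n_reps=300, n=500, n_noise=5, confounding=1.0, seed=0):
    """Repeat simulate -> estimate, then summarize bias, RMSE, MAD, and 95% CI coverage."""
    rng = np.random.default_rng(seed)
    estimates, covered = [], []
    for _ in range(n_reps):
        y, d, X = simulate(n, n_noise, confounding, rng)
        est, se = estimator(y, d, X)
        estimates.append(est)
        # Normal-approximation 95% interval; check whether it contains the truth.
        covered.append(est - 1.96 * se <= TRUE_EFFECT <= est + 1.96 * se)
    errors = np.array(estimates) - TRUE_EFFECT
    return {
        "bias": errors.mean(),
        "rmse": np.sqrt(np.mean(errors ** 2)),
        "mad": np.mean(np.abs(errors)),
        "coverage": np.mean(covered),
    }


if __name__ == "__main__":
    for name, estimator in [("naive", naive_effect), ("ols", ols_effect)]:
        print(name, monte_carlo(estimator))
```

Varying `n`, `n_noise`, and `confounding` in `monte_carlo` mirrors the design dimensions listed above; an ML-based estimator could be dropped in by supplying any function with the same `(y, d, X) -> (estimate, standard_error)` signature.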

Results: In Monte Carlo studies, ML-based estimators generated estimates with smaller bias than traditional regression approaches and achieved substantial (40%-98%) bias reduction in some scenarios. With fewer covariates, Bayesian Causal Forests was often a top performer, while Double Machine Learning fared better with large sets of covariates (>150).
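
For readers who want to compute a comparable summary from their own simulations, a percentage bias reduction can be obtained as below. This is a sketch of one common definition (the relative reduction in absolute bias against a baseline estimator); the abstract does not state the exact formula used, and the function name and the input values in the usage note are hypothetical.

```python
def bias_reduction(bias_baseline, bias_ml):
    """Percent reduction in absolute bias of an ML estimator relative to a baseline.

    Assumed definition: 100 * (1 - |bias_ml| / |bias_baseline|).
    """
    if bias_baseline == 0:
        raise ValueError("baseline bias must be nonzero")
    return 100.0 * (1.0 - abs(bias_ml) / abs(bias_baseline))
```

For instance, with a made-up baseline bias of 0.50 and an ML estimator bias of 0.05, `bias_reduction(0.50, 0.05)` gives a reduction of about 90 percent. A negative result would indicate that the ML estimator had larger absolute bias than the baseline.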

Conclusions: New ML-based estimators give researchers an opportunity to improve the estimation of treatment effects by incorporating ML algorithms, which allow for the inclusion of many covariates and automate the search for nonlinearities and interactions among variables. In many cases, the bias reduction relative to more traditional statistical approaches is substantial. We provide guidance and sample code for researchers interested in implementing these tools in their own empirical work. Overall, the extension of ML tools to treatment effect estimation and statistical inference should be relevant to a wide range of health economics research.