Using Natural Disasters to Test the Performance of Machine Learning Algorithms in Predicting Hospital Demand
Discussant: Christopher J. Garmon
The ability of firms to collect large amounts of information about individual consumers has led to different approaches to prediction. In one approach, often labeled machine learning, analysts predict choices using algorithms that draw on all available variables while remaining agnostic about which variables are likely to be good predictors. This model-agnostic approach to prediction stands in sharp contrast to the approach used by health economists, who typically estimate a multinomial logit choice model parameterized by the variables the researcher thinks are most relevant ex ante.
In this paper, we use a set of natural experiments to compare the performance of machine learning algorithms to that of frequently used econometric models after a major structural change in the choice environment. The setting for our experiments is local hospital markets that were “shocked” by natural disasters that closed one or more hospitals but left the majority of the surrounding area undisturbed. These natural disasters exogenously altered consumers' choice sets, creating a benchmark against which to assess the performance of different predictive models.
We use the pre-disaster data to estimate consumers' preferences for different hospital characteristics. We then predict consumers' decisions after the disaster has changed their choice sets. By comparing each model's predictions to actual post-disaster choices, we are able to evaluate performance in a setting where the choice environment has changed.
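To make the exercise concrete, the following is a minimal sketch, not the paper's actual code, of how a fitted conditional logit predicts choices once a hospital is removed from the choice set. The data are synthetic, the coefficients stand in for values one would estimate on pre-disaster data, and all names are hypothetical; the point is that logit probabilities renormalize over the surviving alternatives.

```python
import numpy as np

# Synthetic stand-in for one patient's hospital attributes
# (e.g., distance and size); values and names are hypothetical.
rng = np.random.default_rng(0)
n_hospitals = 5
X = rng.normal(size=(n_hospitals, 2))
beta = np.array([-0.1, 0.5])  # illustrative coefficients, as if
                              # estimated from pre-disaster data

def choice_probs(X, beta, available):
    """Conditional logit choice probabilities over available hospitals."""
    u = X @ beta
    u = np.where(available, u, -np.inf)      # destroyed hospitals drop out
    expu = np.exp(u - u[available].max())    # stabilized softmax
    return expu / expu.sum()

# Pre-disaster: all hospitals available.
pre = choice_probs(X, beta, np.ones(n_hospitals, dtype=bool))

# Post-disaster: hospital 0 is destroyed; probability mass
# reallocates across the remaining hospitals.
avail = np.ones(n_hospitals, dtype=bool)
avail[0] = False
post = choice_probs(X, beta, avail)
```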
We make no claims to exhaustively consider how all possible machine learning models perform. Instead, we focus on a set of approaches that are considered highly accurate and have already been implemented in existing software packages for multinomial choice problems. In particular, we examine decision trees, random forests, gradient boosted trees, and elastic net regularized conditional logit models.
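As a hedged illustration of this model set, the sketch below fits the four classes of models with scikit-learn on synthetic data. The dataset and feature names are placeholders, and the multinomial logistic regression with an elastic-net penalty is a stand-in for the elastic net regularized conditional logit, which would condition on alternative-specific attributes rather than patient features alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for patient-level data; in the application the
# features would encode items like distance, diagnosis, and hospital
# characteristics, and the classes would be hospitals.
X, y = make_classification(n_samples=2000, n_features=10,
                           n_informative=6, n_classes=4, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "gradient_boosted_trees": GradientBoostingClassifier(),
    # stand-in for the elastic-net conditional logit
    "elastic_net_logit": LogisticRegression(penalty="elasticnet",
                                            solver="saga",
                                            l1_ratio=0.5,
                                            max_iter=5000),
}
fitted = {name: m.fit(X, y) for name, m in models.items()}
```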
Across all of our natural experiments, we find that the gradient boosted tree model does particularly well and is usually one of the best models at predicting individuals' choices post-disaster. When we employ a model combination approach to compare the performance of all the models, the gradient boosted tree model receives about 60% of the model weight on average.
We find, however, that machine learning models do not always dominate parametric models. For example, the performance of machine learning models deteriorates for patients who were more likely to have gone to the destroyed hospital, and who were therefore more likely to have to change their preferred hospital post-disaster. In addition, parametric logit models perform better at individual prediction in the service area where the destroyed hospital had the largest market share. These results indicate that parametric logit models may still have an important role to play in counterfactual prediction when there are large changes in the environment or comparatively little data on which to train the model.
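The paper does not spell out its combination scheme here, so the following is only a sketch of one common approach (stacking): choosing non-negative weights that sum to one so as to maximize the combined out-of-sample log-likelihood of observed post-disaster choices. The matrix of predicted probabilities and all numbers are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def stacking_weights(P):
    """P has shape (n_models, n_patients): each row holds a model's
    predicted probability of the hospital each patient actually chose.
    Returns weights maximizing the combined log-likelihood."""
    n_models = P.shape[0]

    def neg_loglik(theta):
        w = np.exp(theta) / np.exp(theta).sum()  # softmax keeps weights
                                                 # positive, summing to one
        return -np.log(np.clip(w @ P, 1e-12, None)).sum()

    res = minimize(neg_loglik, np.zeros(n_models), method="BFGS")
    return np.exp(res.x) / np.exp(res.x).sum()

# Toy example: three models, five patients (numbers illustrative).
P = np.array([[0.6, 0.5, 0.7, 0.4, 0.8],
              [0.3, 0.4, 0.2, 0.5, 0.3],
              [0.5, 0.5, 0.5, 0.5, 0.5]])
print(stacking_weights(P))  # larger weight on better-predicting models
```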