Using Machine Learning to Model Information Complexity in Risk Adjustment
We use machine learning techniques to explore three issues related to clinical risk adjustment. First, we compare the accuracy of machine learning techniques with that of traditional algorithms for both in-sample and out-of-sample risk adjustment. Second, we compare the relative performance of these models across clinical conditions with more or less information complexity. Acute surgical conditions or acute myocardial infarctions (AMIs), for example, may carry a high mortality risk, but their outcomes may depend on a fairly concise set of clinical information. Other conditions that require medical management, such as pneumonia, may depend on a broader set of clinical information, and modeling them may require more flexible specifications. Machine learning techniques may be especially beneficial for conditions with greater information complexity. Third, we use the risk adjustment algorithms in a model of hospital quality with endogenous hospital choice. We measure the effect of Critical Access Hospitals (CAHs) on quality while using machine learning techniques to allow for heterogeneous treatment effects. The models are identified using the differential distance between the closest CAH and the closest non-CAH hospital.
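As an illustration of the differential-distance idea, the following minimal sketch computes, for one patient, the distance to the nearest CAH minus the distance to the nearest non-CAH hospital. The record layout (`lat`, `lon`, `is_cah`), the use of great-circle distance, and the sign convention are assumptions made for this example; the abstract does not specify how distances are measured.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def differential_distance(patient, hospitals):
    """Distance to the nearest CAH minus distance to the nearest non-CAH.

    patient:   (lat, lon) tuple.
    hospitals: list of dicts with keys 'lat', 'lon', 'is_cah'
               (a hypothetical schema chosen for this sketch).
    Negative values mean the nearest CAH is closer than the nearest non-CAH.
    """
    lat, lon = patient
    cah = [haversine_km(lat, lon, h["lat"], h["lon"])
           for h in hospitals if h["is_cah"]]
    non_cah = [haversine_km(lat, lon, h["lat"], h["lon"])
               for h in hospitals if not h["is_cah"]]
    if not cah or not non_cah:
        raise ValueError("need at least one CAH and one non-CAH hospital")
    return min(cah) - min(non_cah)


# Made-up coordinates: the CAH is closer to the patient than the non-CAH.
example_hospitals = [
    {"lat": 0.0, "lon": 0.1, "is_cah": True},
    {"lat": 0.0, "lon": 0.5, "is_cah": False},
    {"lat": 0.0, "lon": 2.0, "is_cah": True},
]
example_dd = differential_distance((0.0, 0.0), example_hospitals)
```

In an instrumental-variables setting, a quantity like `example_dd` would serve as the excluded variable that shifts hospital choice without directly affecting outcomes, which is the role the differential distance plays in the identification strategy described above.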
These models are estimated using Medicare Fee-for-Service (FFS) administrative claims data, which describe all hospitalizations and all emergency room (ER) visits for the 2011 FFS Medicare population. We compare traditional risk adjustment algorithms with a variety of machine learning techniques. We first explore penalized regressions such as the least absolute shrinkage and selection operator (Lasso). These models employ a penalty function to select relevant covariates from a high-dimensional covariate vector. We then explore a variety of regression-tree-based models, such as boosting, bagging, and random forests. Tree-based models may be more adept at handling non-linear relationships between inputs and outcomes. The hospital selection and outcome models will be estimated by recursive bivariate probit, using a double-selection procedure to correct for bias (Belloni et al., 2012, 2014).
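To make the covariate-selection role of the penalty concrete, the sketch below fits a Lasso by cyclic coordinate descent on simulated data. It is an illustration under simplifying assumptions, not the estimation procedure used in the study: the outcome is continuous rather than binary, the data are synthetic, and the penalty level `lam`, the function names, and the simulation design are all chosen for this example. The objective minimized is (1/(2n))·||y − Xb||² + lam·||b||₁.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of rows (lists of floats); y is a list of floats.
    No intercept is fitted, so the data are assumed to be centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, which equals y at b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # skip all-zero columns
            # Correlation of column j with the partial residual (excluding j).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def simulate_and_fit(seed=0, n=200, p=20, lam=0.2):
    """Simulate data where only the first two covariates matter, then fit."""
    rng = random.Random(seed)
    true_beta = [2.0, -1.5] + [0.0] * (p - 2)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0.0, 0.5)
         for row in X]
    return lasso_coordinate_descent(X, y, lam)


beta_hat = simulate_and_fit()
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
```

Here `selected` lists the covariates whose coefficients the penalty leaves nonzero. The retained coefficients are shrunk toward zero relative to their true values, which is one reason post-selection or double-selection corrections of the kind cited above are applied before drawing inference.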
Preliminary results suggest that machine learning techniques provide more accurate risk adjustment for all clinical conditions. However, the gains are small (e.g., a 1 to 2 percentage point improvement) for conditions such as AMI but substantially larger (e.g., about 8 percentage points) for conditions such as pneumonia.