Using Machine Learning to Examine Adverse Selection vs. Cream Skimming
Discussant: Timothy J. Layton
At the same time, many insurers collect sophisticated data on individual health care decisions. Insurers use these data to augment medical claims and demographic information and to predict patient spending at a more granular level. These detailed data, combined with “big data” and machine learning techniques, raise the possibility that insurers may identify and select the healthiest consumers. If these data allow insurers to forecast patient spending more accurately, insurers may be able to favorably select (“cream skim”) consumers.
However, little is known about the potential impacts of efforts to collect novel patient data and analyze them using recently developed statistical techniques. In addition, little is known about which types of data yield the largest increases in predictive power. This study examines these questions using data from a nationally representative survey, the Medical Expenditure Panel Survey (MEPS). Within the MEPS data, we use several machine learning-based statistical models to identify the relative contribution to predictive power of information that is likely to be known only by patients and information that is known by insurers, and potentially also by patients. We find that medical claims data are a much stronger predictor than data on patient demographics, socioeconomic status, or health status.
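The comparison of predictive power across information sets can be illustrated with a minimal sketch. It does not use MEPS data or the study's models: the data are synthetic, a one-variable least-squares fit stands in for the machine learning models, and the names `prior_claims`, `self_report`, `simulate`, and `holdout_r2` are hypothetical. The noise levels are assumptions chosen for illustration, so the ranking it produces is built in by construction and is not evidence for the study's finding.

```python
import random

def fit_line(x, y):
    """One-predictor ordinary least squares: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    my = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

def simulate(n, seed=0):
    """Synthetic people. ASSUMPTION: claims are a less noisy signal of health."""
    rng = random.Random(seed)
    data = {"prior_claims": [], "self_report": [], "spending": []}
    for _ in range(n):
        health = rng.gauss(0, 1)  # latent health need, not observed directly
        data["prior_claims"].append(health + rng.gauss(0, 0.5))  # insurer-side signal
        data["self_report"].append(health + rng.gauss(0, 1.5))   # patient-side signal
        data["spending"].append(2.0 * health + rng.gauss(0, 1.0))
    return data

def holdout_r2(data, feature, train_share=0.75):
    """Fit on the first part of the sample, score R^2 on the held-out rest."""
    cut = int(len(data["spending"]) * train_share)
    x, y = data[feature], data["spending"]
    slope, intercept = fit_line(x[:cut], y[:cut])
    return r_squared(x[cut:], y[cut:], slope, intercept)

data = simulate(4000)
scores = {name: holdout_r2(data, name) for name in ("prior_claims", "self_report")}
for name, score in scores.items():
    print(f"{name}: held-out R^2 = {score:.2f}")
```

Scoring each information set on held-out observations, rather than on the data used for fitting, is what makes the comparison one of predictive power and not of in-sample fit.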
Our results imply that access to medical claims data improves risk prediction accuracy more than access to more nuanced data. In addition, with access to medical claims data, insurers can predict risk more accurately than patients can, which suggests that advantageous selection may be more prevalent than adverse selection.