
Using Machine Learning to Examine Adverse Selection vs. Cream Skimming

Tuesday, June 25, 2019: 4:30 PM
Hoover - Mezzanine Level (Marriott Wardman Park Hotel)

Presenter: Christopher Whaley

Co-Author: Armando Franco

Discussant: Timothy J. Layton


The wide variation in costs makes accurately predicting risk an important aspect of health insurance markets. One challenge in predicting risk is adverse selection by patients. Patients may know more about their expected health care utilization than insurers do. To avoid being charged a higher premium, patients may not disclose their expected utilization until they are enrolled in a plan and the premium is already locked in. If enough individuals act on this hidden information when selecting insurance plans or deciding when to purchase insurance, health insurance markets may unravel.

At the same time, many insurers collect sophisticated data on individual health care decisions. Insurers use these data elements to augment medical claims and demographic information and so predict patient spending at a more granular level. These detailed data, combined with “big data” and machine learning techniques, raise the possibility that insurers may identify and select the healthiest consumers. If these data allow insurers to forecast patient spending more accurately, insurers may be able to favorably select (“cream skim”) consumers.

However, little is known about the potential impacts of efforts to collect novel patient data and analyze it using recently developed statistical techniques. In addition, little is known about which types of data lead to larger increases in predictive power than other data inputs. This study examines these questions using data from a nationally representative survey, the Medical Expenditure Panel Survey (MEPS). Within the MEPS data, we use several machine learning-based statistical models to identify the relative contribution to predictive power of two kinds of information: information that is likely to be known only by patients, and information that is known by insurers and potentially also by patients. We find that medical claims data are a much stronger predictor than data on patient demographics, socioeconomic status, or health status.
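The idea of comparing predictive power across feature groups can be illustrated with a minimal sketch. This is not the study's method or data: it swaps in a one-variable least-squares fit for the machine learning models, and it uses synthetic values rather than MEPS. The variable names (`prior_claims`, `self_report`) and the weights that generate `spending` are assumptions chosen purely for illustration, so the ranking it produces is true by construction and says nothing about the actual results.

```python
import random


def fit_ols_1d(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


def holdout_r2(x, y, n_train):
    """Fit on the first n_train rows, then score on the remaining rows."""
    slope, intercept = fit_ols_1d(x[:n_train], y[:n_train])
    return r_squared(x[n_train:], y[n_train:], slope, intercept)


# Synthetic data only: the weights 2.0 and 0.5 are illustrative assumptions.
rng = random.Random(0)
n = 2000
prior_claims = [rng.gauss(0, 1) for _ in range(n)]  # stand-in for insurer-held data
self_report = [rng.gauss(0, 1) for _ in range(n)]   # stand-in for patient-held data
spending = [
    2.0 * c + 0.5 * s + rng.gauss(0, 1)
    for c, s in zip(prior_claims, self_report)
]

r2_claims = holdout_r2(prior_claims, spending, n_train=1500)
r2_self = holdout_r2(self_report, spending, n_train=1500)
print(f"hold-out R^2, claims only:      {r2_claims:.3f}")
print(f"hold-out R^2, self-report only: {r2_self:.3f}")
```

Scoring each feature group on held-out rows, rather than on the rows used for fitting, keeps the comparison about forecasting accuracy rather than in-sample fit.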

Our results imply that access to medical claims data improves risk prediction accuracy more than access to more nuanced data. In addition, with access to medical claims data, insurers can predict risk more accurately than patients can, which suggests that advantageous selection may be more prevalent than adverse selection.