Using Machine Learning to Predict Opioid Involvement in the U.S. Death Record

Hill, Elaine

Current estimates suggest that approximately 350,000 people died from prescription opioid and heroin overdoses between 2000 and 2016 (Henry J. Kaiser Family Foundation 2018). The opioid epidemic has had a measurable impact on U.S. life expectancy (Case and Deaton 2015). Researchers have attributed the epidemic to multiple factors, including a more liberal approach to pain management introduced in the early 1990s (Baker 2017); increased marketing and promotion of prescription opioids for pain management, especially by Purdue Pharma, the maker of OxyContin (Van Zee 2009); and rising economic malaise and insecurity (Case and Deaton 2015). In a rare sign of bipartisanship, both parties in Congress support federal action on the opioid epidemic (Blendon et al. 2016), and in October 2017, President Donald Trump declared the opioid epidemic a public health emergency (The White House 2017).

Recent research (e.g., Buchanich et al. 2018; Ruhm 2017; Ruhm 2018) suggests that the size of the opioid epidemic has likely been underestimated. The drug involved in an overdose is not always specified in the death records compiled in the National Vital Statistics System (NVSS) by the National Center for Health Statistics (NCHS). This is not a trivial issue; “other and unspecified drugs” (ICD-10 code T50.9) are implicated in ~25% of all drug overdoses in the death record from 2000 to 2016. Additionally, drug overdoses may be misclassified as deaths from causes unrelated to drug use (Ruhm 2018). The drivers of the undercounting of opioid-caused deaths are unclear but are potentially related to inadequate training for coroners without medical experience and the high cost of a toxicology report.

The extent to which the opioid epidemic has been underestimated due to either of these issues is an open question. Using data from drug overdoses with a classified drug, Ruhm (2017) and Ruhm (2018) generate year-specific models of opioid involvement as functions of demographic characteristics, location, and time of death; these models are then used to predict specific drug involvement for those overdoses where no drug is specified. However, Ruhm (2017) and Ruhm (2018) neither report the models’ predictive performance nor project opioid involvement in deaths not classified as overdoses.
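The imputation logic described above (fit on overdoses with a specified drug, then predict for those without one) can be illustrated with a minimal sketch. It uses a simple cell-frequency estimator as a stand-in and is not the specification used in Ruhm (2017) or Ruhm (2018); the function names, the demographic cells, and the toy records are all hypothetical.

```python
from collections import defaultdict


def fit_cell_rates(labeled):
    """Estimate the opioid share within each demographic cell.

    labeled: list of (cell, opioid_flag) pairs from overdoses where the
    drug was specified. Returns ({cell: share}, overall_share).
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [opioid deaths, all deaths]
    for cell, flag in labeled:
        counts[cell][0] += flag
        counts[cell][1] += 1
    total_opioid = sum(k for k, _ in counts.values())
    total_deaths = sum(n for _, n in counts.values())
    rates = {cell: k / n for cell, (k, n) in counts.items()}
    return rates, total_opioid / total_deaths


def corrected_opioid_count(labeled, unspecified):
    """Observed opioid deaths plus the expected number among unspecified-drug overdoses."""
    rates, overall = fit_cell_rates(labeled)
    observed = sum(flag for _, flag in labeled)
    # Cells never seen in the labeled data fall back to the overall share.
    imputed = sum(rates.get(cell, overall) for cell in unspecified)
    return observed + imputed


# Toy records, invented purely for illustration: (age group, sex), opioid flag.
labeled = [
    (("25-44", "M"), 1), (("25-44", "M"), 1), (("25-44", "M"), 0), (("25-44", "M"), 1),
    (("65+", "F"), 0), (("65+", "F"), 1), (("65+", "F"), 0), (("65+", "F"), 0),
]
unspecified = [("25-44", "M"), ("25-44", "M"), ("65+", "F"), ("45-64", "F")]

total = corrected_opioid_count(labeled, unspecified)
print(f"corrected opioid-involved deaths: {total:.2f}")
```

Summing predicted probabilities rather than thresholded labels gives an expected count, which is the quantity of interest when the goal is to correct an aggregate mortality figure.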

In this paper, we use machine learning techniques, such as random forests and various boosting algorithms, to predict opioid involvement in both (1) unclassified-drug overdoses and (2) deaths not classified as overdoses. We follow Ruhm (2017) and Ruhm (2018) by predicting opioid involvement using a wide assortment of socio-economic and demographic characteristics, death timing, and location information, while also adding a flexible assortment of additional causes of death that may be associated with opioid use. To our knowledge, this is the first study to use machine learning to enhance our understanding of the scale of the opioid epidemic. This information is critical for policy-making. Without a full understanding of the extent of the opioid epidemic, policy-makers are unlikely to appreciate the potential benefits of policies designed to reduce opioid mortality (Council of Economic Advisers 2017).
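As a rough illustration of this kind of workflow, the sketch below trains a random forest on records with a known label, reports held-out predictive performance, and then scores records with no specified drug. It assumes scikit-learn and NumPy are available and runs on synthetic data; the features, sample sizes, and hyperparameters are placeholders and do not reflect the paper's actual data or model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for death-record features (demographics, timing, location, ...).
n_records, n_features = 2000, 5
X = rng.normal(size=(n_records, n_features))
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(n_records) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Assumption for illustration: the last 500 records have no specified drug,
# so their labels are treated as unobserved.
X_known, y_known = X[:1500], y[:1500]
X_unspecified = X[1500:]

# Hold out part of the labeled data to measure predictive performance.
X_train, X_test, y_train, y_test = train_test_split(
    X_known, y_known, test_size=0.25, random_state=0, stratify=y_known
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Out-of-sample discrimination on the held-out labeled records.
holdout_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Predicted probability of opioid involvement for unspecified-drug records;
# the sum is the expected number of additional opioid-involved deaths.
p_unspecified = model.predict_proba(X_unspecified)[:, 1]
expected_extra = p_unspecified.sum()

print(f"held-out AUC: {holdout_auc:.3f}")
print(f"expected additional opioid-involved deaths: {expected_extra:.1f}")
```

Evaluating on a held-out split is what makes it possible to report predictive capability, the piece the text notes is missing from earlier work. A boosting model could be swapped in for the random forest without changing the rest of the sketch.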
