Risk adjustment or Risky Business: Measuring Provider Judgement and Skill in Personalized Medicine
Discussant: Maria A. Polyakova
Many conventional approaches to provider quality measurement implicitly assume that providers differ in the skill with which they treat patients, but agree on which patients should be treated. This framework is problematic when severity and treatment effects are correlated. These issues should be especially important for measuring surgical quality, where patients may be too healthy to need surgery or too sick to survive surgery. We address this issue using a novel combination of machine learning and econometrics. We employ a generalized random forest (Athey et al., 2017) to estimate patient-specific treatment effects. This approach uses conventional econometric identification strategies, but yields a completely nonparametric set of treatment effects while allowing for nonparametric risk adjustment. We use these measures to decompose quality into treatment skill – outcomes conditional upon patient characteristics and treatment choice – and clinical judgment – the rate at which patients receive appropriate treatment.
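The two quality components can be illustrated with a minimal sketch. It assumes that patient-level effect estimates `tau_hat` (for example, from a causal forest) and outcome predictions `mu_hat` (given patient characteristics and the treatment actually received) are already available. It also assumes a simple appropriateness rule: treat if and only if the estimated effect is positive. The function names, the rule, and the toy data are hypothetical and are not the authors' implementation.

```python
from collections import defaultdict

def judgment_rates(tau_hat, treated, provider):
    """Per-provider share of patients whose treatment choice matches the sign
    of their estimated treatment effect (assumed rule: treat iff tau_hat > 0)."""
    hits, counts = defaultdict(int), defaultdict(int)
    for tau, d, p in zip(tau_hat, treated, provider):
        hits[p] += int((tau > 0) == bool(d))
        counts[p] += 1
    return {p: hits[p] / counts[p] for p in counts}

def skill_scores(outcome, mu_hat, provider):
    """Per-provider mean of observed minus predicted outcome. mu_hat is assumed
    to be a prediction conditional on patient characteristics AND the
    treatment actually received, so choice of treatment is held fixed."""
    resid, counts = defaultdict(float), defaultdict(int)
    for y, mu, p in zip(outcome, mu_hat, provider):
        resid[p] += y - mu
        counts[p] += 1
    return {p: resid[p] / counts[p] for p in counts}

# Illustrative toy data, not estimates from the study.
tau_hat  = [0.20, -0.05, 0.10, 0.15, -0.10, 0.02]  # estimated effect on one-year survival
treated  = [1, 1, 0, 1, 0, 1]                       # 1 = received a VAD
survived = [1, 0, 1, 1, 1, 0]                       # observed one-year survival
mu_hat   = [0.8, 0.6, 0.7, 0.9, 0.5, 0.6]           # predicted survival given x and d
provider = ["A", "A", "A", "B", "B", "B"]

print(judgment_rates(tau_hat, treated, provider))
print(skill_scores(survived, mu_hat, provider))
```

In this toy example, provider A treats one patient with a negative estimated effect and withholds treatment from one with a positive effect, so its judgment rate is 1/3. Provider B's choices all match the sign rule. The two scores are computed separately so that a provider can rate well on one dimension and poorly on the other.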
We study treatment decisions and outcomes for advanced heart failure. This condition affects more than 250,000 US patients and has a 60% one-year survival rate under medical management. Clinical trials have found that ventricular assist devices (VADs) increase one-year survival by about 20 percentage points. However, clinical trial populations are typically small and do not reflect the variety of patients actually receiving VAD treatment.
We estimate that the average treatment effect on the treated is a 12 percentage point increase in one-year survival, well below the effects observed in clinical trials. There is a large mass of patients with a 20 percentage point treatment effect. However, many patients receive only modest benefits, and more than 1 in 6 may be harmed by VAD treatment. Approximately 10% of judgment errors (e.g., treatment when the treatment effect is negative) result in excess one-year mortality.
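The following sketch shows how summary figures such as the average treatment effect on the treated (ATT) and the share of treated patients who may be harmed can be computed from patient-level effect estimates. It treats "harmed" as a negative estimated effect, which is an assumption. The inputs are illustrative toy values, not the study's estimates.

```python
def summarize_treated_effects(tau_hat, treated):
    """Return (ATT, share of treated patients with a negative estimated effect)."""
    tau_t = [t for t, d in zip(tau_hat, treated) if d]  # effects among the treated only
    if not tau_t:
        raise ValueError("no treated patients")
    att = sum(tau_t) / len(tau_t)
    share_harmed = sum(t < 0 for t in tau_t) / len(tau_t)
    return att, share_harmed

# Illustrative toy values only.
tau_hat = [0.20, 0.20, 0.05, -0.03, 0.10, 0.25]
treated = [1, 1, 1, 1, 1, 0]
att, harmed = summarize_treated_effects(tau_hat, treated)
print(f"ATT = {att:.3f}, share harmed = {harmed:.2f}")
```

The untreated patient's effect (0.25) does not enter the ATT. This is why the ATT can fall well below the effect for patients who would benefit most when treatment is extended to patients with smaller or negative effects.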
The procedure appears to be particularly harmful for high-severity patients: the correlation coefficient between the treatment effect and severity is -0.46. Conventional risk adjustment methods attribute much of this risk to patient severity, so poor judgment is, in effect, adjusted for as if it were a form of observed severity. Correcting for judgment errors drastically changes the rank ordering of provider quality. We find that judgment errors are particularly prevalent at relatively new VAD centers (i.e., fewer than 2 years of operation) and low-volume VAD centers (i.e., those performing fewer than 20 procedures per year). Inference regarding individual providers is imprecise because most centers contribute few observations.
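The two quantities in this paragraph can be sketched as follows. The sketch assumes that the reported correlation is a Pearson coefficient computed across patients between estimated effects and a scalar severity index. It also shows the center classification using the cut-offs quoted above. The severity index, the effect values, and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def new_or_low_volume(years_operating, procedures_per_year):
    """Flag a center using the cut-offs quoted in the text:
    under 2 years of operation or fewer than 20 procedures per year."""
    return years_operating < 2 or procedures_per_year < 20

severity = [1, 2, 3, 4]          # hypothetical severity index
tau_hat = [0.3, 0.2, 0.1, 0.0]   # hypothetical estimated effects, falling with severity
print(pearson(severity, tau_hat))
print(new_or_low_volume(1.5, 40), new_or_low_volume(5, 12), new_or_low_volume(5, 40))
```

A negative coefficient means that the patients a risk-adjustment model labels "sickest" are also those least likely to benefit. A center that treats them therefore looks merely unlucky after adjustment, rather than as having made a questionable treatment choice.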
Our current model is identified by selection on observables. We are in the process of extending the generalized random forest software to estimate local nonlinear instrumental variables models. We plan to identify treatment effects using distances, the timing of VAD center approvals, and variation in outlier payment rates as instruments for VAD utilization. We will also use more detailed clinical data from the INTERMACS registry to validate our identification strategy.
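As a point of reference for the planned instrumental-variables strategy, the sketch below computes the simplest IV estimator: a global Wald estimator with a single binary instrument. This is a deliberately simpler, swapped-in technique, not the local nonlinear IV forest described above. The binary instrument (an indicator for living near an approved VAD center, a hypothetical dichotomization of distance) and the data are illustrative assumptions.

```python
def wald_estimate(y, d, z):
    """Wald (binary-instrument IV) estimate of the effect of d on y:
    the reduced-form difference in mean outcomes divided by the
    first-stage difference in treatment rates between z = 1 and z = 0."""
    def mean_where(v, flag):
        sel = [vi for vi, zi in zip(v, z) if zi == flag]
        return sum(sel) / len(sel)
    first_stage = mean_where(d, 1) - mean_where(d, 0)
    if first_stage == 0:
        raise ValueError("instrument does not shift treatment")
    return (mean_where(y, 1) - mean_where(y, 0)) / first_stage

z = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical: 1 = lives near an approved VAD center
d = [1, 1, 1, 0, 1, 0, 0, 0]  # 1 = received a VAD
y = [1, 1, 0, 1, 1, 0, 1, 0]  # 1 = survived one year
print(wald_estimate(y, d, z))
```

Under the standard IV assumptions (instrument relevance, exclusion, and monotonicity), this ratio identifies an average effect for compliers, that is, patients whose treatment status is shifted by the instrument. A local forest-based IV estimator instead aims to let that effect vary with patient characteristics.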