Evaluating pay-for-performance programs in health care: a comparison of synthetic control and difference-in-differences approaches

Kreif, Noemi

Policy-makers worldwide are introducing pay-for-performance (P4P) schemes in health care without adequate evaluation. In 2008 the Advancing Quality (AQ) initiative, based on the US Hospital Quality Incentive Demonstration, was introduced for all hospitals in the North West region of England. Published evaluations of the AQ program used difference-in-differences (DiD) regression to compare 30-day risk-adjusted hospital mortality in the 18 months before and after the program’s introduction, in the North West and in the rest of England, for patients admitted with three of the incentivised conditions: pneumonia, heart failure and acute myocardial infarction. They concluded that the AQ program led to a significant reduction in risk-adjusted mortality for patients admitted with pneumonia, and that it was cost-effective. However, this approach assumed that, without the AQ scheme, the two groups of hospitals would have followed parallel trends in risk-adjusted mortality.
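The parallel-trends logic behind DiD can be illustrated with a minimal two-group, two-period sketch. It is not the regression specification used in the published evaluations; the function name `did_estimate` and all mortality figures are hypothetical and chosen only for illustration.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences estimate.

    The control group's change stands in for what the treated group's
    change would have been without the program (parallel trends).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical mortality rates (%), NOT data from the AQ evaluations.
effect = did_estimate(
    treated_pre=[20.0, 22.0], treated_post=[18.0, 19.0],
    control_pre=[19.0, 21.0], control_post=[18.5, 20.5],
)
print(effect)  # treated change (-2.5) minus control change (-0.5)
```

If the control group's change is a poor guide to what would have happened in the treated group, that is, if trends are not parallel, this estimate is biased.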

We contrast DiD regression with the synthetic control method, developed by Abadie and colleagues, which generalises DiD by allowing heterogeneous responses over time to unobserved common factors. A synthetic control group is defined as a weighted average of control units, with weights chosen to minimise differences in changes over time in the pre-intervention outcomes and covariates between the comparison groups. This approach has not previously been considered in evaluating P4P programs for health care providers. This setting requires that the method, originally developed for evaluating treatment effects for a single aggregated treated unit, be extended to multiple treated units (e.g. hospitals). This extension can provide policy-makers with evidence on the effects of P4P for different subgroups of providers. This paper estimates the effects of the AQ scheme on risk-adjusted mortality overall, and according to hospital type (teaching, large, medium, small), for patients admitted with pneumonia.
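A minimal sketch of how such weights might be found for one treated unit, assuming, as in the standard formulation, that the weights are non-negative and sum to one, and that only pre-intervention outcomes are matched (covariates are omitted). The solver is a simple pairwise Frank-Wolfe iteration chosen to keep the example dependency-free; it is not the optimiser used in this paper, and `synthetic_control_weights` and the data are hypothetical.

```python
def synthetic_control_weights(treated, controls, n_iter=1000):
    """Weights w (w >= 0, sum(w) == 1) minimising the squared gap between
    the treated unit's pre-period outcomes and the weighted control average.

    treated:  list of pre-period outcomes for the treated unit
    controls: list of lists, one pre-period series per control unit
    """
    n, T = len(controls), len(treated)
    w = [1.0 / n] * n  # start from equal weights
    for _ in range(n_iter):
        synth = [sum(w[j] * controls[j][t] for j in range(n)) for t in range(T)]
        resid = [synth[t] - treated[t] for t in range(T)]
        # Gradient of the squared gap w.r.t. each weight (factor 2 dropped).
        grad = [sum(resid[t] * controls[j][t] for t in range(T)) for j in range(n)]
        k = min(range(n), key=lambda j: grad[j])          # most helpful control
        active = [j for j in range(n) if w[j] > 0.0]
        a = max(active, key=lambda j: grad[j])            # least helpful active control
        d = [controls[k][t] - controls[a][t] for t in range(T)]
        denom = sum(x * x for x in d)
        if denom == 0.0:
            break
        # Exact line search, capped so that w[a] never goes negative.
        step = -sum(resid[t] * d[t] for t in range(T)) / denom
        step = min(max(step, 0.0), w[a])
        if step == 0.0:
            break  # no active control can be improved upon
        w[k] += step
        w[a] -= step
    return w

# Hypothetical data: the treated series is built as an equal mix of the
# first two controls, so the weights should come out near [0.5, 0.5, 0.0].
controls = [[24.0, 18.0, 20.0], [16.0, 20.0, 18.0], [20.0, 25.0, 15.0]]
treated = [20.0, 19.0, 19.0]
weights = synthetic_control_weights(treated, controls)
print([round(x, 3) for x in weights])
```

With multiple treated units, one option is to repeat this step for each treated hospital in turn and then average the resulting treated-minus-synthetic gaps.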

For each hospital in the North West (n=23), we generated a synthetic control by weighting control hospitals (n=122) to balance the patterns of risk-adjusted mortality prior to the introduction of AQ. The algorithm also used information on pre-intervention covariates, such as hospital quality and aggregate-level case-mix variables. We estimated the effect of the AQ program by contrasting mortality between the North West and synthetic control groups over the 18-month period after AQ was introduced. Uncertainty in the quality of the synthetic controls was assessed with placebo tests undertaken at regional and subgroup level. These tests compared the magnitude of the estimated treatment effects with the corresponding effects estimated by applying the same procedure to control hospitals only.
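One common way to summarise a placebo comparison of this kind is the share of placebo effects at least as large in magnitude as the estimated effect. The sketch below assumes that summary; whether the p-values in this paper were computed exactly this way is not stated, and `placebo_p_value` and all numbers are hypothetical.

```python
def placebo_p_value(estimated_effect, placebo_effects):
    """Share of placebo effects whose magnitude is at least that of the
    estimated effect (two-sided comparison on absolute values)."""
    extreme = sum(1 for e in placebo_effects if abs(e) >= abs(estimated_effect))
    return extreme / len(placebo_effects)

# Hypothetical effects obtained by re-running the procedure on control
# hospitals only; NOT results from the study.
placebo_effects = [-1.5, 0.8, -0.1, 2.0, -0.6, 0.3, -0.9, 0.4]
p_value = placebo_p_value(-0.5, placebo_effects)
print(p_value)  # 5 of the 8 placebo effects are at least 0.5 in magnitude
```

A large share means the estimated effect is no more extreme than what the procedure produces for hospitals that were never exposed to the program.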

The synthetic control groups had pre-intervention trajectories for risk-adjusted mortality similar to those of the North West hospitals. The synthetic control approach indicated that the effect of the AQ scheme on risk-adjusted hospital mortality at 30 days was small and not statistically significant, both overall (average difference of -0.2, p=0.98) and for each subgroup.

The synthetic control method is an attractive approach for evaluating P4P schemes. By minimising differences between the comparison groups in pre-intervention outcomes, the synthetic control method provides estimates of program impacts that are more robust than traditional DiD approaches to time-varying unobservable heterogeneity between the intervention group and the potential controls.

Health & Healthcare in America: From Economics to Policy

June 22 - 25, 2014

American Society of Health Economists (ASHEcon)