Now trending: Coping with non-parallel trends in difference-in-differences analysis

Tuesday, June 14, 2016: 3:00 PM
419 (Fisher-Bennett Hall)

Author(s): Andy Ryan; James F. Burgess

Discussant: Matt Sutton

Difference-in-differences (DID) methods are extremely popular in health services and health policy research. DID methods rely on the “parallel trends” assumption for identification of causal effects. To date, researchers have had little guidance about how to estimate causal effects using DID when the parallel trends assumption is violated.
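
To fix notation, the sketch below shows the canonical two-group, two-period DID regression, in which the coefficient on the treatment-by-post interaction identifies the policy effect only if the comparison group's trend mirrors what the treated group's trend would have been absent treatment. The variable names and simulated data are illustrative assumptions, not material from the study.

    # Minimal sketch of a canonical DID regression (hypothetical variable
    # names and simulated data; not the authors' specification).
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 500
    df = pd.DataFrame({
        "treat": rng.integers(0, 2, n),   # 1 = treated group
        "post": rng.integers(0, 2, n),    # 1 = post-intervention period
    })
    # Outcome with a true treatment effect of 2.0 for treated units post-intervention
    df["y"] = (1.0 + 0.5 * df["treat"] + 1.0 * df["post"]
               + 2.0 * df["treat"] * df["post"] + rng.normal(0, 1, n))

    # The coefficient on treat:post is the DID estimate of the policy effect;
    # it is unbiased only under parallel trends.
    fit = smf.ols("y ~ treat + post + treat:post", data=df).fit()
    print(fit.params["treat:post"])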

We performed a Monte Carlo simulation experiment to address this question. The simulation experiment was designed to test the performance of alternative DID estimators when estimating the effect of a hypothetical policy on quality. Hospitals were assigned to treatment under three different scenarios: 1) random assignment; 2) assignment in which treatment probability was positively correlated with pre-intervention levels of the outcome; 3) assignment in which treatment probability was positively correlated with pre-intervention trends in the outcome. For the scenarios in which assignment was based on pre-intervention levels or trends, assignment probabilities were based on the cumulative distribution function of the respective empirical distributions. After assignment, alternative program effects were applied: no effect, a “small” effect (2%), or a “large” effect (10%). We then estimated alternative DID models that varied in how they treated pre-intervention trends. Data were available for three pre-intervention periods (years 1, 2, and 3) and three post-intervention periods.
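
A minimal sketch of how such an assignment mechanism might be implemented is shown below: treatment probability equals the empirical CDF rank of each hospital's pre-intervention level (scenario 2) or pre-intervention trend (scenario 3), and a proportional program effect is then applied to treated hospitals. The variable names, sample sizes, and data-generating process are illustrative assumptions, not the study's simulation code.

    # Hypothetical sketch of the three assignment scenarios and effect injection.
    import numpy as np

    rng = np.random.default_rng(1)
    n_hosp = 1000
    # Simulated pre-intervention quality for years 1-3
    pre = rng.normal(0.80, 0.05, size=(n_hosp, 3))
    level = pre.mean(axis=1)          # pre-intervention level
    trend = pre[:, 2] - pre[:, 0]     # crude pre-intervention trend

    def ecdf_prob(x):
        """Assignment probability = empirical CDF rank of x."""
        return (np.argsort(np.argsort(x)) + 1) / len(x)

    probs = {
        "random": np.full(n_hosp, 0.5),
        "level-based": ecdf_prob(level),
        "trend-based": ecdf_prob(trend),
    }
    treated = {k: rng.random(n_hosp) < p for k, p in probs.items()}

    # Apply a null (0%), "small" (2%), or "large" (10%) proportional effect to
    # post-intervention outcomes of treated hospitals (level-based scenario shown).
    effect = 0.02
    post = rng.normal(0.80, 0.05, size=(n_hosp, 3))
    post[treated["level-based"]] *= (1 + effect)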

Our simulation experiment was run for 200 iterations for each of the three assignment scenarios and each of the two datasets (200 iterations × 3 scenarios × 2 datasets = 1,200 simulation iterations). For each simulation iteration, we captured the rate of false rejection of the null hypothesis of no effect (i.e., type I error), the mean absolute error, and the mean squared error between estimated program effects and their true values.
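
The sketch below shows one way these per-iteration performance metrics could be computed, assuming each iteration yields a point estimate, a p-value, and the true injected effect; the function and its inputs are hypothetical rather than taken from the study.

    # Hypothetical helper for summarizing estimator performance across iterations.
    import numpy as np

    def summarize(estimates, p_values, true_effect, alpha=0.05):
        estimates = np.asarray(estimates, dtype=float)
        p_values = np.asarray(p_values, dtype=float)
        errors = estimates - true_effect
        return {
            # Share of iterations rejecting a true null of no effect (type I error);
            # only meaningful when the true effect is zero.
            "false_rejection_rate": float(np.mean(p_values < alpha)) if true_effect == 0 else float("nan"),
            "mean_absolute_error": float(np.mean(np.abs(errors))),
            "mean_squared_error": float(np.mean(errors ** 2)),
        }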

Our findings support two main conclusions when the parallel trends assumption is violated: 1) the use of propensity score matching estimators results in much lower estimator bias and type II error than specifications that adjust for differences in pre-intervention trends; 2) the performance of estimators that adjust for differences in pre-intervention trends is worse than that of the standard DID estimator. Supplemental analyses examining DID estimators that controlled for group-specific trends found that these estimators were very sensitive to differences in pre-intervention trends between treatment and comparison groups. When treatment and comparison groups have divergent trends in the pre-intervention period, those trends are more likely to converge in the post-intervention period than to continue diverging. As a result, DID estimators that assume divergent pre-intervention trends will continue into the post-intervention period tend to produce incorrect counterfactuals.
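
To make the last point concrete, the sketch below contrasts a standard DID specification with one that adds a treatment-group-specific linear trend, using a stylized data-generating process in which the true policy effect is zero and the pre-intervention gap stops widening after year 3. It is a generic illustration, not the authors' exact models: the trend-adjusted model extrapolates the divergent pre-trend and therefore misreads the post-intervention convergence as a spurious negative effect.

    # Generic illustration: standard DID versus DID with a treatment-group-specific
    # linear trend, under converging post-intervention trends (true effect = 0).
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    n_hosp, years = 300, np.arange(1, 7)      # years 1-3 pre, 4-6 post
    treat = rng.integers(0, 2, n_hosp)
    panel = pd.DataFrame(
        [(h, t, y) for h, t in zip(range(n_hosp), treat) for y in years],
        columns=["hospital", "treat", "year"],
    )
    panel["post"] = (panel["year"] >= 4).astype(int)
    # Treated hospitals drift upward relative to comparisons in the pre-period,
    # then the gap stops widening (no true policy effect).
    panel["y"] = (0.80 + 0.01 * panel["year"]
                  + 0.02 * panel["treat"] * np.minimum(panel["year"], 3)
                  + rng.normal(0, 0.02, len(panel)))

    standard = smf.ols("y ~ treat * post", data=panel).fit()
    trend_adj = smf.ols("y ~ treat * post + year + treat:year", data=panel).fit()
    print(standard.params["treat:post"], trend_adj.params["treat:post"])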

We use these results to develop practical advice for researchers who seek to apply DID methods to problems in health services research and health policy.