A Sequence of Two Studies to Learn & Test Heterogeneous Treatment Sub-groups: Effects of Cost Exposure on Use of Outpatient Care

Haviland, Amelia

There is strong interest in estimating how the magnitude of an intervention's treatment effect varies across sub-groups of the population of interest. Without a priori knowledge of the sub-groups, one option is to identify them in experimental settings. However, drawing inference about heterogeneous treatment effects that is robust to noise requires cross-validation, which is problematic when the experimental sample is not large. We develop a two-stage approach that first proposes and then tests heterogeneous treatment effects of an intervention. In Stage 1, we use a large observational dataset to learn the sub-groups with the most distinctive treatment-outcome relationships ('high/low-impact sub-groups'). We adopt a model-based recursive partitioning approach to propose the high/low-impact sub-groups, and validate them using sample-splitting. While sample-splitting in the first stage guards against noise, the estimated heterogeneous treatment effects may still be biased because the data are observational. Stage 2 therefore uses an experimental design, in which we classify the sample units into the sub-groups learned in Stage 1. We then estimate treatment effects within each sub-group, thereby testing the causal hypotheses proposed in Stage 1. Using patient claims data from the NBER MarketScan database, we apply our approach to estimate heterogeneous effects of a switch to a high-deductible health insurance plan on use of outpatient care by non-Medicare patients with a common chronic condition (high cholesterol). We also extend our Stage 1 approach to incorporate a non-parametric machine learning method for estimating the sub-groups when the treatment variable is continuous.
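The two-stage logic described above can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps a single-covariate threshold search in for model-based recursive partitioning, and every name, parameter, and number in it (the covariate `x`, the simulated outcomes, the effect sizes, `min_size`) is a hypothetical assumption chosen only to make the example run on synthetic data.

```python
import random
from statistics import mean

# Each row is (x, t, y): a hypothetical covariate (e.g. age), a binary
# treatment (e.g. switch to a high-deductible plan), and an outcome
# (e.g. outpatient visits). All data below are synthetic.

def effect(rows):
    """Mean outcome of treated minus untreated; None if either arm is empty."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    if not treated or not control:
        return None
    return mean(treated) - mean(control)

def subgroup_effects(rows, threshold):
    """Estimate the effect separately for x <= threshold and x > threshold."""
    low = [r for r in rows if r[0] <= threshold]
    high = [r for r in rows if r[0] > threshold]
    return effect(low), effect(high)

def learn_split(rows, min_size=100):
    """Stand-in for recursive partitioning: choose the threshold whose two
    sides have the most different treated-vs-untreated outcome gaps."""
    best_gap, best_threshold = -1.0, None
    for threshold in sorted({x for x, _, _ in rows}):
        n_low = sum(1 for r in rows if r[0] <= threshold)
        if n_low < min_size or len(rows) - n_low < min_size:
            continue  # skip splits that leave a tiny, noisy sub-group
        e_low, e_high = subgroup_effects(rows, threshold)
        if e_low is None or e_high is None:
            continue
        gap = abs(e_low - e_high)
        if gap > best_gap:
            best_gap, best_threshold = gap, threshold
    return best_threshold

def simulate(n, rng, randomized):
    """Synthetic data: the assumed true effect is -4 for x > 45, else -1."""
    rows = []
    for _ in range(n):
        x = rng.randint(20, 64)
        if randomized:
            p = 0.5                        # experimental assignment
        else:
            p = 0.7 if x > 45 else 0.3     # uptake depends on x (observational)
        t = 1 if rng.random() < p else 0
        true_effect = -4.0 if x > 45 else -1.0
        y = 10.0 + true_effect * t + rng.gauss(0, 1)
        rows.append((x, t, y))
    return rows

rng = random.Random(0)

# Stage 1: learn sub-groups on one half of the observational data,
# then check them on the held-out half (sample-splitting).
observational = simulate(4000, rng, randomized=False)
learn_half, holdout_half = observational[:2000], observational[2000:]
threshold = learn_split(learn_half)
holdout_low, holdout_high = subgroup_effects(holdout_half, threshold)

# Stage 2: classify experimental units with the Stage 1 rule and
# estimate the treatment effect within each sub-group.
experimental = simulate(2000, rng, randomized=True)
rct_low, rct_high = subgroup_effects(experimental, threshold)

print("learned threshold:", threshold)
print("holdout (observational) effects:", holdout_low, holdout_high)
print("experimental effects:", rct_low, rct_high)
```

The key design point is that the sub-group rule is fixed before the experimental data are touched, so the Stage 2 estimates test a pre-specified hypothesis rather than one selected on the same sample.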
