An Iterative Approach to Estimation with Multiple High-Dimensional Fixed Effects: Controlling Simultaneously for Patients, Providers and Counties

Monday, June 13, 2016: 10:15 AM
B26 (Stiteler Hall)

Author(s): Siyi Luo; Randall P Ellis; Wenjia Zhu

Discussant: Coady Wing

Linear models of health care utilization commonly control for multiple high-dimensional fixed effects in order to estimate the effects of specific policy or treatment variables. Traditional approaches to controlling for fixed effects are often infeasible when there are multiple fixed effects, each of high dimension. Additional challenges arise when sample sizes are large, data are unbalanced, and instrumental variables and clustered standard error corrections are also needed. In this paper, we develop a new estimation algorithm, implemented in SAS, that accommodates all of these practical data challenges.

In contrast with most previous studies, which absorb multiple fixed effects simultaneously, our algorithm absorbs fixed effects sequentially and iterates until the fixed effects are asymptotically eliminated. The implementation involves three steps: (1) absorb fixed effects sequentially from all dependent and explanatory (including instrumental) variables; (2) estimate the model using the standardized variables; (3) repeat steps (1) and (2), calculating the maximum absolute percentage difference in the parameters of interest between adjacent iterations, and report the estimates once this difference falls below a pre-specified threshold.
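The three steps can be illustrated with a minimal sketch. It is written in Python with NumPy rather than SAS, covers only the plain least-squares case (no instrumental variables, clustering, or bootstrapping), and is not the authors' implementation. The names `demean_by`, `iterative_fe_ols`, `tol_pct`, and `max_iter` are hypothetical, and the sketch assumes that "absorbing" a fixed effect means subtracting group means and that no coefficient of interest is exactly zero.

```python
import numpy as np


def demean_by(values, codes):
    """Subtract within-group means from every column of `values`.

    `codes` holds integer group labels 0..G-1, with every label present.
    """
    counts = np.bincount(codes)
    out = np.empty_like(values, dtype=float)
    for j in range(values.shape[1]):
        sums = np.bincount(codes, weights=values[:, j])
        out[:, j] = values[:, j] - (sums / counts)[codes]
    return out


def iterative_fe_ols(y, X, group_ids, tol_pct=1e-8, max_iter=1000):
    """Illustrative sketch: sequential absorption of fixed effects plus OLS.

    y         : (n,) outcome
    X         : (n, k) regressors of interest
    group_ids : list of (n,) label arrays, one per fixed effect
    tol_pct   : stop when the largest absolute percentage change in any
                coefficient between adjacent iterations is below this value
    """
    data = np.column_stack([y, X]).astype(float)
    # Recode arbitrary labels to consecutive integers 0..G-1.
    codes = [np.unique(g, return_inverse=True)[1] for g in group_ids]

    beta_prev = None
    for iteration in range(1, max_iter + 1):
        # Step 1: absorb each fixed effect in turn from y and X.
        for c in codes:
            data = demean_by(data, c)
        # Step 2: estimate the model on the transformed variables.
        beta, *_ = np.linalg.lstsq(data[:, 1:], data[:, 0], rcond=None)
        # Step 3: compare with the previous iteration's estimates.
        if beta_prev is not None:
            pct_change = 100.0 * np.max(np.abs((beta - beta_prev) / beta_prev))
            if pct_change < tol_pct:
                return beta, iteration
        beta_prev = beta
    raise RuntimeError("coefficients did not converge within max_iter")
```

Calling `iterative_fe_ols(y, X, [patient_id, provider_id, county_id])` (hypothetical label arrays) returns the coefficient vector and the number of iterations used. Because each pass reads the data one fixed effect at a time, the same idea can be carried out without holding a full dummy-variable design matrix in memory.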

We perform Monte Carlo simulations to evaluate the performance of our algorithm. We consider a variety of datasets with 95,000 to 100,000 observations that vary in the number of fixed effects, sample balance, correlations between fixed effects and control variables, extent of endogeneity, and interdependence between fixed effects. Standard errors are bootstrapped for inference. We show that our approach exactly matches the estimation results of Stata’s REGHDFE in all models, results which are themselves identical to those from estimating with fixed effect dummies. The main advantage of our approach over these other algorithms is that SAS can accommodate extremely large datasets because it does not require actively storing the estimation data in memory; the main disadvantage is that we do not directly provide parameter estimates of each fixed effect, only of the policy variables of interest.

We apply the proposed algorithm to real data from the US employer-based health insurance market to examine how health plan types affect health care utilization. Our analysis sample contains about 63 million observations, from which we remove fixed effects for 1.4 million individuals, 3,000 counties, 150,000 distinct primary care doctors, and 47 monthly spells to predict plan type effects on monthly health care visit probabilities and spending. Because we simultaneously control for individual, doctor, county, time, and employer fixed effects, our identification comes from consumers’ movement between health plan types. Our iterative algorithm not only controls for these multiple, high-dimensional fixed effects, but also uses instrumental variables to control for endogenous plan choice and standard error corrections to adjust for clustering at the employer level. Our estimates show that the breadth of provider networks dominates cost sharing in influencing consumers’ decisions to seek care. Specifically, narrow network plans (exclusive provider organizations, health maintenance organizations and point-of-service plans) reduce the probabilities of monthly provider contacts (by 36.6%, 18.7% and 11.9%, respectively) relative to preferred provider organization plans, while comprehensive plans and consumer-driven/high-deductible plans increase them (by 16.8% and 1.7%, respectively).