Interpretation, Estimation, and Inference for Semilogarithmic Difference-in-Differences Models
Discussant: Andrew Goodman-Bacon
With the rise of difference-in-differences (DD) models in the decades since Halvorsen and Palmquist (1980), the interpretation of dummy variables in semilogarithmic models again merits attention. In every case I have seen, analysts using these models interpret the coefficient on the interaction term as the DD parameter in percentage-change terms, by analogy with a model whose outcome variable is not in logs. In the semilogarithmic model, however, the DD parameter of interest is a nonlinear function of the coefficients on the interaction term and on both main-effect dummies. As a result, misinterpreting a semilogarithmic DD model can produce answers that are wrong in either direction and can even have the wrong sign. Moreover, there are no simple, common scenarios in which the incorrect and correct interpretations give similar answers. The potential problems are therefore much more severe than in the single-dummy case. And yet this problem is prevalent in the profession, even in top journals. For example, I reviewed the papers published in the American Economic Review in 2015 and found at least five that erroneously interpreted semilogarithmic DD models.
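To make the contrast concrete, the block below first states the single-dummy result of Halvorsen and Palmquist (1980) and then sketches one way a DD parameter can depend on all three dummy coefficients. The notation (\beta_j, T for the treatment-group dummy, P for the post-period dummy, \bar{y}_{tp} for the error-free cell value) is assumed for illustration, and the abstract does not define the DD parameter, so \delta here is only one candidate consistent with its description: the difference-in-differences of cell values, scaled by the baseline cell.

```latex
\begin{align*}
% Single-dummy case: the proportional effect of D is not c itself
\ln y &= a + cD + \varepsilon, & g &= e^{c} - 1 \\[4pt]
% DD specification (assumed notation)
\ln y &= \beta_0 + \beta_1 T + \beta_2 P + \beta_3 (T \times P) + \varepsilon,
& \bar{y}_{tp} &= e^{\beta_0 + \beta_1 t + \beta_2 p + \beta_3 tp} \\[4pt]
% Candidate DD parameter (an assumption, not taken from the paper)
\delta &= \frac{(\bar{y}_{11} - \bar{y}_{10}) - (\bar{y}_{01} - \bar{y}_{00})}{\bar{y}_{00}}
& &= e^{\beta_1 + \beta_2 + \beta_3} - e^{\beta_1} - e^{\beta_2} + 1
\end{align*}
```

Under this assumed definition, the made-up values \beta_1 = -1, \beta_2 = 0.5, \beta_3 = 0.1 give \delta \approx -0.35, while reading the interaction coefficient as a percentage change gives e^{0.1} - 1 \approx 0.105, so the two readings disagree in sign.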
In addition to pointing out this important interpretation issue, this paper discusses estimation of the correct parameter of interest. I use a simulation study to show that a method suggested by Kennedy (1981) can reduce finite-sample bias relative to the naïve approach of simply substituting coefficient estimates for parameters in the expression of interest. I also introduce a method for constructing confidence intervals that can be used for inference in both DD and non-DD semilogarithmic models with dummies. For both the estimator and the confidence intervals, computation is straightforward, requiring only regression coefficient estimates, standard errors, and coefficient estimator covariances, all of which are calculated automatically by standard, canned regression commands. Thus, despite the prevalence of this problem in the profession, only minimal effort is required to significantly improve the credibility of empirical studies using semilogarithmic DD models.
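As a rough illustration of how little computation is involved, the sketch below compares the naïve plug-in estimate with the Kennedy (1981) adjustment for the single-dummy case only. The function names and the input numbers are hypothetical. The DD extension, which also uses coefficient covariances, and the confidence-interval method are not specified in this abstract, so neither is attempted here.

```python
import math


def naive_effect(coef):
    """Plug-in estimate of the proportional effect of a dummy: exp(c_hat) - 1."""
    return math.exp(coef) - 1.0


def kennedy_effect(coef, se):
    """Kennedy (1981) estimate: exp(c_hat - 0.5 * Var(c_hat)) - 1.

    With a normally distributed c_hat, E[exp(c_hat)] = exp(c + Var/2), so
    subtracting half the estimated variance offsets the upward bias of the
    plug-in estimate.
    """
    return math.exp(coef - 0.5 * se ** 2) - 1.0


# Made-up inputs standing in for a regression coefficient and its standard error.
coef_hat, se_hat = 0.25, 0.10

print("naive:  ", naive_effect(coef_hat))
print("kennedy:", kennedy_effect(coef_hat, se_hat))
```

Both functions take only a coefficient estimate and its standard error, which is the sense in which standard regression output suffices. The adjustment always pulls the estimate below the plug-in value and vanishes as the standard error goes to zero.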