Mis-reporting and heterogeneous reporting behaviours in models of self-reported health

Tuesday, June 24, 2014: 8:50 AM
Waite Phillips 103 (Waite Phillips Hall)

Author(s): Mark Harris

Discussant: John Wildman

There is an extensive literature concerned with 'correcting' measures of self-reported health for heterogeneous reporting scales across individuals. That is, faced with standard Likert-scale responses, individuals with similar levels of true underlying health may well tick different boxes on this scale. The bulk of this literature uses responses to hypothetical vignette questions, which are then used to appropriately scale the individual's responses to the primary health question of interest.
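
As a minimal sketch of what reporting heterogeneity means, the following maps the same latent health level to a response category through two different sets of cut-points. The cut-point values, the names `optimist` and `pessimist`, and the helper `report` are hypothetical and chosen purely for illustration; they are not taken from the paper.

```python
from bisect import bisect_right

CATEGORIES = ["poor", "fair", "good", "very good", "excellent"]


def report(latent_health, thresholds):
    """Return the category whose interval contains latent_health.

    thresholds is an increasing list of four cut-points that splits the
    latent health scale into the five response categories.
    """
    return CATEGORIES[bisect_right(thresholds, latent_health)]


# Hypothetical cut-points: the second respondent needs a higher level of
# latent health before ticking each successive box.
optimist = [-1.5, -0.5, 0.2, 1.0]
pessimist = [-1.0, 0.0, 0.8, 1.6]

same_health = 0.5  # identical true underlying health for both respondents
optimist_report = report(same_health, optimist)
pessimist_report = report(same_health, pessimist)
```

Under these assumed values the two respondents tick different boxes despite identical latent health, which is the pattern that vignette responses are used to correct.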

There has also been some recent literature suggesting that measures of self-reported health are typically subject to 'over-inflation': a certain group of individuals select themselves into the middle and/or neighboring categories. An analysis of the raw self-reported health data in most developed countries would suggest that the bulk of the population is in either 'good' or 'very good' health. However, this is often at odds with other available data, such as obesity and diabetes rates, which might suggest a far less healthy society. Although approaches in this literature will undoubtedly correct for such 'over-inflation' if it is present in the data, they will still be adversely affected by any heterogeneous reporting behavior.

In light of the above arguments, this paper combines these two complementary approaches to handling the nuances of reporting behavior in self-reported health, in order to uncover more accurate measures of true underlying health.

In this new approach, we use anchoring vignettes to account for reporting heterogeneity. We then nest this in a multi-equation structural probabilistic model of true underlying health, in which the population is split into accurate and inaccurate responders. The former face the full choice set, from 'poor' up to 'excellent', whereas the latter face only the restricted choice set of 'good' and 'very good'. Ex post, we then estimate the relative proportions in each 'class'. The proxies we use to identify the inaccurate reporters are aligned closely to the factors noted above; in addition, we use some of the vignette responses, as these will be orthogonal to an individual’s true health but clearly related to their reporting behavior.
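
The two-class structure described above can be sketched as a finite mixture of response probabilities. This is only an illustration under stated assumptions, not the paper's specification: it assumes an ordered probit over the full five-category set for accurate responders, a binary probit between 'good' and 'very good' for inaccurate responders, and a probit class-membership equation. All function names, indices, and cut-point values are hypothetical.

```python
from math import erf, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def ordered_probit_probs(index, cuts):
    """Category probabilities for an ordered probit with increasing cuts."""
    cdf = [0.0] + [norm_cdf(c - index) for c in cuts] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cdf) - 1)]


def mixture_probs(health_index, class_index, full_cuts, restricted_cut):
    """Observed response probabilities over poor..excellent.

    Assumption: accurate responders follow the full ordered probit, while
    inaccurate responders choose only between 'good' and 'very good'.
    """
    p_accurate = norm_cdf(class_index)  # assumed probit class equation
    full = ordered_probit_probs(health_index, full_cuts)
    p_good = norm_cdf(restricted_cut - health_index)
    restricted = [0.0, 0.0, p_good, 1.0 - p_good, 0.0]
    return [p_accurate * f + (1.0 - p_accurate) * r
            for f, r in zip(full, restricted)]


# Hypothetical parameter values, for illustration only.
full_cuts = [-1.5, -0.5, 0.5, 1.5]
probs = mixture_probs(0.3, 0.5, full_cuts, 0.4)
full_only = ordered_probit_probs(0.3, full_cuts)
all_inaccurate = mixture_probs(0.3, -10.0, full_cuts, 0.4)
```

In this sketch the mixture places more probability on 'good' and 'very good' than the full ordered probit alone, which is how over-reporting into those categories would show up in the observed data. The estimated class shares would correspond to averaging `p_accurate` over the sample.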

Across a wide range of robustness checks, the results suggest that there is indeed both significant reporting heterogeneity and over-reporting into these favorable health categories. This generic approach to modeling self-reported health is likely to be widely applicable, not only in empirical models of health but also across the medical and social sciences: indeed, wherever the researcher has self-reported data to hand.