
Using Health Histories to Improve Prediction of Mortality and Morbidity

Wednesday, June 26, 2019: 12:00 PM
Taft - Mezzanine Level (Marriott Wardman Park Hotel)

Presenter: Dongyue Ying

Co-Author: Dean Lillard

Discussant: Alice Zulkarnain


We explore how to use available data to better predict future mortality and morbidity. Most studies use self-reported health (SRH) as the predictor. However, several longitudinal and cross-sectional surveys ask respondents not only to report SRH but also to report retrospectively whether and when they experienced diseases and health conditions/events. For example, the Panel Study of Income Dynamics (PSID), Health and Retirement Study (HRS), and National Health Interview Surveys (NHIS) collect both types of data. The current literature does not fully exploit these data, in part because researchers have not established whether they reliably measure past health.

We use PSID data to generate health histories for a set of diseases, conditions, and health events respondents said they have, had, or experienced. Respondents report whether a doctor ever diagnosed them with events (e.g. stroke), diseases (e.g. diabetes) or conditions (e.g. hypertension) and their age at diagnosis. We create variables that indicate, for each year of a respondent’s life, whether a person has the condition/disease or experienced the event in that year. We use data on health in childhood to create an index of early life health.
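
The construction of year-by-year indicators from a reported age at diagnosis can be illustrated with a minimal sketch. The function name, its arguments, and the persistence rule are assumptions made for illustration, not the authors' coding algorithm: here a disease or condition is assumed to persist from the diagnosis year onward, while an event (such as a stroke) is flagged only in the year it is reported to have occurred.

```python
def health_history(birth_year, age_at_diagnosis, last_year, chronic=True):
    """Build a {calendar_year: 0/1} indicator from birth through last_year.

    Hypothetical helper for illustration only.
    age_at_diagnosis is None when the respondent never reported a diagnosis.
    chronic=True  -> assume the condition persists after diagnosis.
    chronic=False -> flag only the year in which the event occurred.
    """
    history = {}
    for year in range(birth_year, last_year + 1):
        age = year - birth_year
        if age_at_diagnosis is None:
            has = 0
        elif chronic:
            has = int(age >= age_at_diagnosis)
        else:
            has = int(age == age_at_diagnosis)
        history[year] = has
    return history


# Example: a respondent born in 1950 who reports a diabetes diagnosis at age 40.
diabetes = health_history(1950, 40, 1995)
stroke = health_history(1950, 40, 1995, chronic=False)
```

Whether a condition should be treated as persistent after diagnosis is a substantive coding choice; the sketch simply makes that choice explicit through the `chronic` flag.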

Before we can use these data, we must establish that they are valid and reliable. We gauge validity with NHIS data. For each of five birth cohorts, we compare the retrospectively constructed prevalence of each condition/disease in every calendar year to prevalence rates (for the same cohort) measured contemporaneously in that year’s NHIS. We explore the discrepancies between the retrospective and contemporaneous rates to investigate whether and how much recall bias, selective mortality, and our coding algorithm explain the differences. To check our coding algorithm “out-of-sample,” we will apply the algorithm to comparable data from HRS and then gauge the constructed prevalence rates against the contemporaneously measured NHIS prevalence rates for the same birth cohorts.
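
The cohort-by-year comparison described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and the input layout are hypothetical, the prevalence is unweighted (survey weights are ignored), and the benchmark value in the usage example is made up rather than taken from NHIS.

```python
from collections import defaultdict


def prevalence_by_cohort_year(person_years):
    """Unweighted prevalence for each (cohort, year) cell.

    person_years: iterable of (cohort, year, has_condition) tuples,
    where has_condition is 0 or 1. Hypothetical input layout.
    """
    counts = defaultdict(lambda: [0, 0])  # [cases, persons]
    for cohort, year, has in person_years:
        counts[(cohort, year)][0] += has
        counts[(cohort, year)][1] += 1
    return {cell: cases / n for cell, (cases, n) in counts.items()}


def discrepancies(retrospective, benchmark):
    """Retrospective minus contemporaneous rate, for cells present in both."""
    return {
        cell: retrospective[cell] - benchmark[cell]
        for cell in retrospective
        if cell in benchmark
    }


# Illustrative (made-up) records for one cohort in one calendar year.
records = [("1940-49", 1990, 1), ("1940-49", 1990, 0),
           ("1940-49", 1990, 0), ("1940-49", 1990, 0)]
retro = prevalence_by_cohort_year(records)
gaps = discrepancies(retro, {("1940-49", 1990): 0.30})
```

A negative gap in a cell would be consistent with under-reporting in the retrospective data, although recall bias, selective mortality, and coding choices cannot be separated from a single difference.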

To investigate whether the health histories improve prediction, we select the retrospective histories that we are able to validate. We use them with the PSID data to explore the association between the health histories, SRH, morbidity, and mortality. We exploit the longitudinal structure of the PSID to predict both death and the onset of conditions/diseases/events we know respondents will experience. For the mortality analysis we will use data from the PSID “Death” File, which identifies the cause and date of death of all deceased PSID respondents. We control for demographic and socio-economic factors and for our index of each respondent’s health during childhood.

Preliminary results based on unvalidated data show that short- and long-run health histories explain cross-sectional variation in SRH. The health histories and SRH independently predict the onset of cancer. These (admittedly) preliminary analyses suggest that including data on validated health histories in models of future health will improve prediction.

