Machine learning reveals key markers for healthy aging, separate from chronic disease risks
In a study published in the journal Nature Aging, researchers applied machine learning to analyze the health trajectories of healthy individuals over time and to distinguish inherent aging factors from chronic disease risks. They found that the model could consistently identify early indicators of healthy aging, such as neutrophil counts and alkaline phosphatase levels, across individuals from Israel, the United Kingdom (UK), and the United States of America (USA).
Study: Longitudinal machine learning uncouples healthy aging factors from chronic disease risks
Background
The “geroscience hypothesis” suggests that targeting universal aging processes may promote healthy aging, improve lifespan, and reduce the prevalence of age-related diseases, including type 2 diabetes mellitus (T2D), cardiovascular disease (CVD), chronic kidney disease (CKD), liver disease (LD), and chronic obstructive pulmonary disease (COPD). The co-occurrence and correlation of age-related diseases with aging pose challenges in modeling causality. This calls for unbiased approaches to investigate the interplay between healthy aging and age-related diseases.
Although electronic health records (EHRs) offer significant potential for capturing patients’ health trajectories, the available data span a limited period (up to 20 years), which hinders our understanding of the relationship between aging, disease, and disease risk. Additionally, previous studies that modeled mortality and age using clinical markers did not use a longitudinal model. To address this gap, the researchers in the present study developed a machine learning-based model to identify predictive clinical markers for disease-free healthy aging. They also revisited the heritability and genetic associations of phenotypes linked to longevity.
About the study
Medical history data of 4.57 million individuals aged 30 to 85 years were obtained from the Clalit Healthcare Services database, with a median follow-up of 16.6 years. First, a machine learning model was developed using the three-year history of patients aged above 80 years, and laboratory tests correlating with longevity were analyzed. Next, longevity potential was assessed across ages by implementing a machine-learning model that could infer longitudinal trajectories from partial patient histories. A longevity potential score was determined for each age, predicting five-year mortality or a change in longevity potential.
Further, to understand how lifelong disease predisposition potentially affected the longevity score, the researchers implemented an extended disease risk Markov model using disease-onset data for T2D, CVD, LD, CKD, and COPD. The physiological processes underlying longevity potential were investigated in very healthy individuals using clinical markers over a >10-year follow-up.
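To make the idea of a disease-risk Markov model concrete, here is a minimal sketch of how a lifelong disease probability can be accumulated from per-interval transition probabilities. It is an illustration only, not the study's implementation: the three-state structure (disease-free, diseased, dead without disease), the function name lifetime_disease_risk, and all the numbers are assumptions made up for the example.

```python
def lifetime_disease_risk(onset_probs, death_probs):
    """Probability of ever developing a disease in a simple discrete-time Markov chain.

    Hypothetical illustration: each list holds one probability per age interval.
    onset_probs[i] is the chance that a disease-free survivor develops the disease
    in interval i; death_probs[i] is the chance of dying disease-free in interval i.
    """
    p_disease_free = 1.0  # probability of being alive and disease-free so far
    p_ever_diseased = 0.0  # accumulated probability of disease onset
    for p_onset, p_death in zip(onset_probs, death_probs):
        p_survive = 1.0 - p_death
        # Onset can only happen to people who are still alive and disease-free.
        p_ever_diseased += p_disease_free * p_survive * p_onset
        p_disease_free *= p_survive * (1.0 - p_onset)
    return p_ever_diseased


# Made-up per-interval probabilities, for illustration only.
example_risk = lifetime_disease_risk(
    onset_probs=[0.01, 0.02, 0.04, 0.06],
    death_probs=[0.005, 0.01, 0.03, 0.08],
)
print(f"Illustrative lifelong disease risk: {example_risk:.3f}")
```

In this sketch, death acts as a competing risk: a person who dies disease-free can no longer contribute to the disease total, which is why a lifelong predisposition cannot be read directly off the per-interval onset probabilities.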
The model was then tested on the UKBB (short for UK Biobank) and NHANES (short for National Health and Nutrition Examination Survey) population databases. Patients aged 50 were classified into 15 groups, and their disease predisposition, allele frequencies, and parental mortality were analyzed.
Results and discussion
The three-year history model could discern a detailed spectrum of risk levels, highlighting significant prognostic differences even within the top 4% of healthy patients. Analysis of laboratory tests identified red blood cell distribution width (RDW), C-reactive protein, and albumin as markers continually associated with prognosis. The model provided a generalizable metric for health that could classify patients as healthy or unhealthy, encouraging the use of models that quantitatively track changes in health potential. Even at age 30, the model accurately distinguished individuals’ probabilities of surviving beyond 85 years.
Clinical markers contributing to the longevity score were found to vary across ages. While alkaline phosphatase was found to impact younger adults, glucose and cholesterol seemed to affect mid-adulthood, and albumin and RDW were found to impact older ages. Key features such as being overweight, blood sugar, and cholesterol were observed to play a significant role in predicting lifelong disease risk. Markers of chronic disease risk were found to be consistently low in very healthy individuals. A high longevity score was indicated by low levels of neutrophils, alkaline phosphatase, and the ratio of microcytic and hypochromatic red blood cells, as well as medium levels of body mass index, creatinine, and liver enzymes.
The models’ predictive power was shown to increase with age, particularly in identifying individuals at high risk of diseases like T2D at ages 50–60, due to improved sensitivity from routine tracking. The estimated lifelong disease predispositions were found to be strongly associated with each other and correlated with the longevity score. However, a subset of individuals exhibited variation in longevity potential despite low disease risk.
The longevity scores were found to be robust across Israeli, US, and UK populations, demonstrating significant predictive power for longevity in individuals without known predisposition to diseases. Furthermore, the degree of disease predisposition was found to vary between populations at age 50. Parents of the individuals with the highest longevity scores were found to have a one-year increase in lifespan. As per the study, genetic variation may also contribute to longevity. The researchers recommend using a multivariate disease risk model to interpret genome-wide association studies.
Conclusion
In conclusion, the present study improves our understanding of the interplay between aging and major chronic diseases, paving the way for comprehensive, longitudinal models to replace static representations of healthy aging and common diseases. Further research is required to quantify a “healthy state” and investigate the physiological processes underlying the disease-related findings highlighted in the study.