Study uncovers biological drivers of plasma proteins, offering new insights for disease biomarkers and drug discovery
A new study maps the biological influences on thousands of plasma proteins, revealing potential disease biomarkers and drug targets that could support more precise, personalized treatments.
Study: Mapping biological influences on the human plasma proteome beyond the genome. Image Credit: Kateryna Kon / Shutterstock
In a recent study published in the journal Nature Metabolism, researchers used an integrated genomic and deep phenomic approach to build a data-driven map of the modifiable and non-modifiable factors that influence the levels of 4,775 plasma proteins. The study included more than 8,000 participants from the Fenland study, with a subset of analyses (specifically, those assessing proteins as biomarkers of disease) conducted in a cohort from the European Prospective Investigation into Cancer (EPIC) Norfolk Study.
Study findings revealed that while the variance in a majority (n = 3,242) of the human plasma proteome is best explained by non-modifiable factors (age, sex, and genetics), a large portion (n = 1,737) is best explained by modifiable, biologically meaningful factors. Notably, each protein target was explained by between four and 56 characteristics. Some proteins showed strong associations with specific non-modifiable factors, with genetic influence explaining up to 74.27% of their variance, while others were strongly influenced by modifiable factors such as chronic low-grade inflammation (measured by C-reactive protein, which explained up to 68.34% of variance). Proteins associated with only one or a few risk factors are ideal candidates for disease screening, while those associated with many are potential biomarkers of holistic health. Additionally, the study’s use of Mendelian randomization revealed several causal relationships between plasma protein levels and diseases, such as the association of reduced kidney function with cardiovascular disease through the COL6A3 protein. Furthermore, almost 600 proteins were identified as drug targets.
Background
‘Proteins’ is an umbrella term for a group of large, complex biomolecules critical to most life functions. They may serve as structural support, biochemical catalysts, hormones, enzymes, building blocks for more complex macromolecules, and even initiators of cellular death. Although proteins represent the most expansive class of biomolecules for drug discovery, systematic, broad-capture proteomic profiling at a population scale remains limited.
Advances in biomedical engineering have recently enabled the identification and characterization of thousands of blood-borne proteins. Unfortunately, because the field is relatively new and proteins make up a low proportion of the blood (estimated ~10%), the origins and purposes of most of the human plasma proteome remain unknown. Previous investigations of the human plasma proteome have primarily been restricted to a single protein or, at most, a class of similar proteins.
Given the increasing frequency of protein-associated clinical trials (disease screening and drug discovery), a baseline understanding of the modifiable and non-modifiable factors influencing the human proteome, and of the biological outcomes of these influences, is imperative. The current study addresses this gap by systematically integrating genomic data with phenomic data to map the influences on plasma protein levels, providing a comprehensive framework for future research.
About the study
The present study uses an aptamer-based assay to identify and measure human plasma proteins. It then evaluates the relative contributions of modifiable risk factors (dietary, lifestyle), non-modifiable characteristics (age, sex, genetics), and technical factors such as sample handling and measurement procedures to variation in these proteins (expression, post-translational modifications).
Study data were obtained from the long-term Fenland study of more than 12,000 United Kingdom (UK) adults born between 1950 and 1975. Data collection included blood samples (for metabolic assessments), participant-provided information on food habits, general health, and lifestyle, objective baseline measurements of clinical well-being (cardiorespiratory fitness, body mass index [BMI], physical activity, and body composition), and anthropometrics. Additionally, fat mass (abdominal visceral, subcutaneous) was estimated using a dual-energy X-ray absorptiometry (DEXA) scan, and liver health (hepatic steatosis) was assessed via abdominal ultrasound.
Experimental procedures included genotyping (using the Affymetrix UK Biobank Axiom array), proteomic profiling (using the SomaScan v4 aptamer platform), computation of weighted genetic risk scores (GRS), and Uniform Manifold Approximation and Projection (UMAP) to visualize any underlying structure in how variation across the observed proteome was explained.
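To make the idea of a weighted genetic risk score concrete, the following is a minimal sketch of the general technique: each variant's effect-allele dosage (0 to 2) is multiplied by an effect-size weight, and the products are summed. The variant names, weights, and the function name weighted_grs are hypothetical and chosen only for illustration; the article does not describe the specific variants or weights the authors used.

```python
from typing import Dict


def weighted_grs(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the sum of effect-size weight x effect-allele dosage.

    `dosages` maps variant ID -> number of effect alleles carried (0..2).
    `weights` maps variant ID -> per-allele effect size.
    Both are illustrative inputs, not data from the study.
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"missing dosages for: {sorted(missing)}")
    return sum(weights[snp] * dosages[snp] for snp in weights)


# Hypothetical variant IDs and weights, for illustration only.
weights = {"snp_a": 0.5, "snp_b": 0.25, "snp_c": -0.125}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = weighted_grs(person, weights)
print(score)  # 0.5*2 + 0.25*1 + (-0.125)*0 = 1.25
```

A higher score here simply means the person carries more alleles with larger positive weights; how such a score is scaled or interpreted depends on the trait and the source of the weights.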
Genetic/heritable factors were computed using single nucleotide polymorphism (SNP)-based genetic relationship matrices. To account for the influence of technical factors on plasma protein levels, these were systematically regressed out of the analysis, providing more accurate biological interpretations of the proteome variation. Proteins with drug-discovery potential were annotated using the Human Protein Atlas (HPA) tissue expression dataset. Finally, causal relationships between proteins and their major biological contributor were estimated using Mendelian randomization (MR) analysis, and disease-risk associations using survival analysis.
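For readers unfamiliar with Mendelian randomization, one of its simplest estimators is the Wald ratio: the genetic variant's effect on the outcome divided by its effect on the exposure. The sketch below illustrates that estimator only. The article does not state which MR estimator the authors used, and the function name and input values are hypothetical.

```python
from typing import Tuple


def wald_ratio(beta_exposure: float, beta_outcome: float,
               se_outcome: float) -> Tuple[float, float]:
    """Wald ratio estimate of the exposure -> outcome effect.

    beta_exposure: variant's effect on the exposure (e.g., a risk factor)
    beta_outcome:  same variant's effect on the outcome (e.g., a protein level)
    se_outcome:    standard error of beta_outcome

    The standard error uses the common first-order approximation, which
    ignores uncertainty in beta_exposure.
    """
    if beta_exposure == 0:
        raise ValueError("the variant must affect the exposure")
    estimate = beta_outcome / beta_exposure
    se = abs(se_outcome / beta_exposure)
    return estimate, se


# Hypothetical summary statistics, for illustration only.
estimate, se = wald_ratio(beta_exposure=0.5, beta_outcome=0.25, se_outcome=0.125)
print(estimate, se)  # 0.5 0.25
```

The estimate is only interpretable as causal under the usual instrumental-variable assumptions, for example that the variant affects the outcome solely through the exposure.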
Study findings
Of the 12,435 adults enrolled in the Fenland study, 8,350 met inclusion criteria (no pregnancy, terminal illness, or physical disability) and were included in the analysis. The study used 4,979 aptamers to identify and measure 4,775 plasma proteins. Notably, each protein could be explained by 4 to 56 (median 25) characteristics across modifiable, non-modifiable, and technical spheres. Since technical factors are beyond the scope of this study, they were regressed out before downstream analysis.
UMAP analysis revealed that non-modifiable factors best explained the biologically mediated variation for most of the proteome (n = 3,242 proteins), while modifiable factors best explained 1,737. For instance, genetic factors explained up to 77.3% of variance for certain proteins like neurexin 1. Modifiable factors such as chronic inflammation and smoking explained variance in specific proteins, although they typically accounted for a small proportion of variation (median variance explained between 0.10% and 0.29%). Sex (0.55% to 60.22%) and genetic factors (3.10% to 74.27%) showed the strongest associations. Notably, some proteins were explained by only one or a few factors, highlighting their potential as biomarkers for disease screening. These corresponded to significant protein-disease associations, including type 2 diabetes (T2D), peripheral arterial disease (PAD), chronic obstructive pulmonary disease (COPD), liver disease, and all-cause mortality.
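The idea of regressing out a technical factor and then asking how much variance a biological factor explains can be sketched with a toy example. This is a deliberately simplified, single-covariate illustration using ordinary least squares and made-up numbers; the authors' actual models involved many factors at once, and the function names below are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def residualize(y: Sequence[float], covariate: Sequence[float]) -> List[float]:
    """Remove the linear effect of a covariate from y (regress it out)."""
    slope, intercept = fit_line(covariate, y)
    return [yi - (intercept + slope * ci) for yi, ci in zip(y, covariate)]


def variance_explained(y: Sequence[float], x: Sequence[float]) -> float:
    """R squared of a single-predictor linear model of y on x."""
    slope, intercept = fit_line(x, y)
    my = fmean(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y has no variance")
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_res / ss_tot


# Toy data: protein level = 2 * technical batch + biological factor.
technical = [0, 1, 0, 1, 0, 1]   # e.g., a hypothetical batch indicator
factor = [1, 1, 2, 2, 3, 3]      # e.g., a hypothetical biological trait
protein = [2 * t + f for t, f in zip(technical, factor)]

r2_before = variance_explained(protein, factor)      # about 0.4
adjusted = residualize(protein, technical)
r2_after = variance_explained(adjusted, factor)      # about 1.0
print(round(r2_before, 3), round(r2_after, 3))
```

In this toy case the technical factor is uncorrelated with the biological one, so removing it from the protein level alone is enough; with correlated factors, a joint model would be the more careful choice.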
“In contrast, putative modifiable factors such as chronic low-grade inflammation (CRP explaining up to 68.34% of variation), liver function (alanine transaminase (ALT) explaining up to 56.66% of variation), kidney function (estimated glomerular filtration rate (eGFR) explaining up to 12.79% of variation), and current smoking status (explaining up to 39.98% of variation) explained variation in plasma levels of most proteins but on average explained a relatively small proportion (median variance explained between 0.10% and 0.29%).”
Overall, the ‘modifiable’ proteome was revealed to comprise ~14% of the human plasma proteome. These results suggest that lifestyle choices and health behaviors, such as smoking, diet, and physical activity, can significantly impact plasma protein levels and offer insight into the biological mechanisms that modulate disease risk.
Conclusions
The present study uses a deep proteomics approach to unravel the substantial proteome variation in human plasma and identify its risk associations. The study characterized variation in 4,775 proteins attributable to modifiable (e.g., diet, lifestyle), non-modifiable (e.g., age, sex), and technical (methodology) factors.
Some proteins were identified to have few determinant factors, highlighting their importance as biomarkers of overall health and for specific disease screening. Others were found to have multiple determinants, emphasizing their potential in drug discovery across a range of ailments. Additionally, the causal analysis using Mendelian randomization provided evidence of potential disease-causing pathways, helping to refine the biological interpretation of these proteins and offering opportunities for targeted interventions.
These findings provide unprecedented clarity on the biological drivers underpinning proteome variation and offer clinicians and academics a framework for future human proteomic investigations. By controlling for technical variation and mapping the multifactorial influences on the proteome, the study lays the groundwork for integrating proteomics into clinical practice for disease screening and drug development.
Two new independent reports show these proteomic assessments can be used to establish causality of disease! https://t.co/gF49kejo2O @NatMetabolism https://t.co/y3E16Yr3ng @NatureCVR https://t.co/2SC1lbGFWf pic.twitter.com/6V6uBLm45P
— Eric Topol (@EricTopol) September 26, 2024