Microscopic organisms residing in our bodies, soil, oceans, and atmosphere are pivotal to human health and global ecosystems. Despite advancements in DNA sequencing, identifying these microbes and understanding how they are related remain formidable challenges. Two recent studies from Arizona State University introduce tools that make this process more accurate and more scalable.
Researchers at ASU have developed two tools: one that refines how microbial family trees are constructed, and another that serves as a foundational software library for biological data analysis. Both are expected to support microbiome research, disease surveillance, environmental assessments, and emerging fields like precision medicine. “Our team builds open-source software tools because we believe that when everyone can access and extend scientific tools, the entire community benefits and discovery accelerates,” said Qiyun Zhu of ASU.
Zhu, an assistant professor at ASU’s School of Life Sciences and researcher at the Biodesign Center for Fundamental and Applied Microbiomics, collaborated with both ASU colleagues and international partners. The first study, focusing on marker gene improvement, was published in Nature Communications, while the second study, detailing the open-source software library scikit-bio, appeared in Nature Methods.
Constructing accurate evolutionary trees is crucial for tracking how microbes evolve and what effects they have. Better trees aid disease monitoring by helping researchers follow changes in harmful microbes over time. They also sharpen environmental research by showing how microbes respond to pollution and climate change. Improved microbial identification, in turn, strengthens studies of the gut microbiome and its effects on health.
The process of uncovering microbial relationships begins with selecting appropriate marker genes, which trace evolutionary history. Traditionally, scientists relied on a limited set of marker genes. However, with the rise of metagenomics, researchers now work with millions of genomes, often sourced directly from environmental samples. Metagenomics facilitates the simultaneous sequencing of all DNA in an environment, unveiling hidden microbial communities.
These genomes, though valuable, are often incomplete or of uneven quality, which makes it hard to obtain accurate evolutionary results from a fixed set of marker genes. To address this, Zhu and colleagues developed TMarSel (Tree-based Marker Selection). The tool automates the search through thousands of candidate gene families and selects the combinations that yield the most reliable evolutionary trees. TMarSel evaluates each gene’s prevalence, informativeness, and contribution to a stable depiction of microbial relationships, offering a flexible, data-driven approach even for large and diverse groups of organisms.
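To make the idea of data-driven marker selection concrete, here is a toy sketch in Python. It is not TMarSel's algorithm or interface. The function name `select_markers`, the input layout, and the scoring rule are all assumptions made for illustration, and the sketch considers only prevalence across genomes, not informativeness or tree stability. It greedily picks gene families so that genomes with few markers so far are favored, which is one simple way to cope with incomplete genomes.

```python
from typing import Dict, List, Set


def select_markers(presence: Dict[str, Set[str]],
                   genomes: List[str],
                   k: int) -> List[str]:
    """Greedily choose up to k gene families (hypothetical toy heuristic).

    presence maps a gene-family name to the set of genomes that carry it.
    """
    counts = {g: 0 for g in genomes}  # markers selected so far, per genome
    candidates = set(presence)
    chosen: List[str] = []

    def gain(family: str) -> float:
        # A genome contributes 1 / (1 + markers it already has), so adding
        # a marker to a poorly covered genome is worth more.
        return sum(1.0 / (1 + counts[g])
                   for g in presence[family] if g in counts)

    for _ in range(min(k, len(candidates))):
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates), key=gain)
        chosen.append(best)
        candidates.remove(best)
        for g in presence[best]:
            if g in counts:
                counts[g] += 1
    return chosen


# Made-up example data: four genomes, four candidate gene families.
example_genomes = ["g1", "g2", "g3", "g4"]
example_presence = {
    "famA": {"g1", "g2", "g3"},
    "famB": {"g1", "g2"},
    "famC": {"g3", "g4"},
    "famD": {"g4"},
}
print(select_markers(example_presence, example_genomes, 2))
```

In this made-up example the second pick is `famC` rather than `famB`, because `famC` covers `g4`, the only genome that the first pick left without a marker.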
Zhu also leads the development of scikit-bio, an extensive open-source software library that equips scientists with tools for analyzing very large biological datasets. It is particularly useful for microbiome studies, which examine microbial communities within specific environments such as the human gut. Biological datasets are unusually complex: they are very large, sparse, and made up of many interconnected features. General-purpose data-analysis software is ill-equipped to handle such complexity. The library fills this gap, providing more than 500 functions for tasks such as comparing microbial communities, calculating diversity, transforming compositional data, analyzing genetic sequences, building phylogenetic trees, and preparing data for machine learning.
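For readers unfamiliar with these tasks, the sketch below shows what three of them compute: diversity within one sample (Shannon index), dissimilarity between two communities (Bray-Curtis), and a common transform for compositional data (centered log-ratio). It is written in plain Python on made-up counts and is not scikit-bio's code or API. The function names are chosen here for illustration, the Shannon index uses the natural logarithm as an assumption, and scikit-bio's own interfaces should be taken from its documentation.

```python
import math
from typing import List, Sequence


def shannon(counts: Sequence[float]) -> float:
    """Shannon diversity of one sample (natural log; zero counts ignored)."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: 0 = identical, 1 = no shared taxa."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den


def clr(x: Sequence[float]) -> List[float]:
    """Centered log-ratio transform; every value must be strictly positive."""
    logs = [math.log(value) for value in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Made-up taxon counts for two samples (same taxa, same order).
gut_a = [40, 30, 20, 10]
gut_b = [10, 20, 30, 40]

print(shannon(gut_a))             # within-sample diversity
print(bray_curtis(gut_a, gut_b))  # between-sample dissimilarity
print(clr(gut_a))                 # log-ratio coordinates, which sum to ~0
```

Real microbiome tables contain many zeros, so a transform like the centered log-ratio usually needs a zero-handling step first. That step is omitted here to keep the sketch short.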
Supported by more than 80 contributors, scikit-bio is rigorously tested and documented. It has been cited in tens of thousands of scientific papers across disciplines such as medicine, ecology, climate science, and cancer biology, and it has become an indispensable tool for researchers exploring the microbiome and other data-rich areas of modern biology.
As biological datasets continue to expand, tools like scikit-bio and TMarSel make large-scale research more reliable and reproducible. The two studies underscore ASU’s growing influence at the intersection of biology and computation. Zhu’s work shows how combining evolutionary insight with careful software engineering can produce tools used by scientists worldwide.
With DNA sequencing becoming faster and more affordable, researchers are poised to uncover even more of the microbial universe. Tools like TMarSel and scikit-bio ensure that this influx of data can be converted into meaningful scientific insight.




