Microorganisms play important roles in nutrient cycling, energy production, and ecosystem health. Communities of microorganisms, known as “microbiomes”, are central participants in plant-soil interactions, bioproduct synthesis, environmental sustainability, and human health. The study of microbiomes has long been hindered by the lack of laboratory-cultivated microbial isolates. This problem has been alleviated over the past decade by the rapid advancement and adoption of metagenomics, a sequencing technology that determines the genomic composition of a mixed microbial population, encompassing hundreds to thousands of genomes, as a whole. Reconstructing species diversity and function from metagenomic data is computationally expensive because of the difficulty of assembling short sequencing reads and the complexity of assigning sequencing data to specific taxonomic lineages. As a result, data interpretation lags significantly behind data generation.
The goal of this project is to facilitate the development of deep learning models that enhance metagenomic data analysis. The effort will improve the models developed in our prior studies by examining how training data, model parameterization, and model architecture influence the speed and accuracy of model development. We will also refine the computational pipeline for model training and testing to increase its speed and to gain better control of the data flow.