Every person has a genome - a specific sequence of genes according to which an individual develops. However, any living organism also carries another set of genes, called the metagenome. It is the total DNA content of the many different microorganisms that inhabit the same environment - bacteria, fungi, and viruses. The metagenome is often indicative of various diseases or predispositions to such diseases. Studying microbiota, i.e. the full range of microorganisms inhabiting different parts of the human body, therefore plays a critical role in metagenomic research.
The software tool developed by the scientists, called MetaFast, can conduct a rapid comparative analysis of large numbers of metagenomes.
"In studying the intestinal microflora of patients, we may be able to detect microorganisms associated with a particular disease, such as diabetes, or a predisposition to the disease. This already forms a basis for applying personalized medicine techniques and developing new drugs. Using the results obtained with the software, biologists will be able to draw conclusions on how to further develop their research, because the algorithm enables them to study environments that we currently know nothing about," says Vladimir Ulyantsev, lead developer of the algorithm and researcher at the Computer Technologies Laboratory at ITMO University.
Bioinformatics - MetaFast: fast reference-free graph-based comparison of shotgun metagenomic data
One of the key benefits of the program is that it is able to work successfully with environments in which the genetic contents have not yet been studied. "The newly developed approach allows us to do two things - find all the possible gene sequences, even if they were previously unknown (the program collects them from fragments of genomic reads), and at the same time identify metagenomic patterns that distinguish one patient from another, e.g. people with and without a disease," says Dmitry Alexeev, the leader of the project and head of MIPT's Laboratory of Complex Biological Systems.
This means that the program can be used to conduct an untargeted express analysis of markers indicating certain diseases. Then, by using targeted methods such as PCR (a technique to make multiple copies of a fragment of DNA), the results can be verified and adjusted. According to the researchers, the program could greatly reduce the time needed to develop new drugs.
Microorganisms that do not reproduce in vitro, such as viruses, give very abstract results in tests and it is not possible to collect their DNA. However, the new program is able to detect even these microorganisms. "In the microbiota of the skin alone, 90% of the organisms are unknown," continues Dmitry Alexeev. "Our approach enables us to work with completely unknown material and still obtain results. The program has been tested in a wide variety of environments, including those with a high number of viruses. The program is even able to locate and collect single DNA strands."
MetaFast is not limited to detecting pathogens. For example, the program can also be used to compare distinct peoples in closed populations with people living in cities to help identify bacterial strains that are extremely useful to humans, but have been lost in the process of urbanization. Antibiotics, preservatives, colorants and supermarket food have pushed many useful bacteria out of our microflora, but these bacteria could still be present in closed populations, such as American Indians or people in Russian villages.
MetaFast has proven to be highly effective in studying rare and undiscovered metagenomes. As a part of the study, the scientists analysed the metagenomes of several of the world's largest lakes. Without any prior information about the microbiota samples from the lakes, the program found genetic similarities between samples that were close in terms of their chemical composition.
The researchers also used the new algorithm to study the inhabitants of the New York underground, demonstrating the effectiveness of the algorithm when analysing such complex systems. Most of the DNA collected using MetaFast belonged to already known bacteria. This confirms previous theories stating that the subway is safe for humans, and the microbes that live there suppress any flora that could be dangerous to people.
A vast amount of experimental data has already been gathered worldwide on various metagenomes. As the cost of extracting DNA is decreasing and the sensitivity of equipment is increasing, the volume of data is continuing to grow exponentially. Despite this, most of the studies have not been fully completed. The reason lies in the limitations of the current technology. On the one hand, scientists are able to partially collect a metagenome, but piecing together the "puzzle" takes an enormous amount of time. On the other hand, they can compare individual fragments of the genome with existing DNA references, but those references cover only a very limited number of bacteria, and virtually no viruses.
The new algorithm not only combines the advantages of both of these approaches, but also enables data to be processed at high speed. The program saves RAM because it partially collects and partially compares genomes, but does not go into an in-depth collection analysis.
Motivation: High-throughput metagenomic sequencing has revolutionized our view on the structure and metabolic potential of microbial communities. However, analysis of metagenomic composition is often complicated by the high complexity of the community and the lack of related reference genomic sequences. As a starting point for comparative metagenomic analysis, researchers require efficient means for assessing pairwise similarity of the metagenomes (beta-diversity). A number of approaches are used to address this task; however, most of them have inherent disadvantages that limit their scope of applicability. For instance, the reference-based methods perform poorly on metagenomes from previously unstudied niches, while composition-based methods appear to be too abstract for straightforward interpretation and do not allow differentially abundant features to be identified.
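To make "pairwise similarity of the metagenomes" concrete, here is a minimal Python sketch that builds a dissimilarity matrix from per-sample feature abundances. The choice of the Bray-Curtis measure, the function names, and the toy abundance values are all assumptions made for illustration; the text above does not say which measure MetaFast applies.

```python
from typing import Dict, List


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for disjoint ones."""
    features = set(a) | set(b)
    diff = sum(abs(a.get(f, 0.0) - b.get(f, 0.0)) for f in features)
    total = sum(a.values()) + sum(b.values())
    return diff / total if total else 0.0


def dissimilarity_matrix(samples: Dict[str, Dict[str, float]]) -> List[List[float]]:
    """Square matrix of pairwise dissimilarities, rows and columns in sorted name order."""
    names = sorted(samples)
    return [[bray_curtis(samples[x], samples[y]) for y in names] for x in names]


# Hypothetical abundance profiles (feature -> count); not real data.
samples = {
    "lake_A": {"f1": 10, "f2": 5, "f3": 0},
    "lake_B": {"f1": 8, "f2": 7, "f3": 0},
    "subway": {"f1": 0, "f2": 1, "f3": 12},
}

matrix = dissimilarity_matrix(samples)
for name, row in zip(sorted(samples), matrix):
    print(name, [round(v, 3) for v in row])
```

In this toy input the two lake profiles end up much closer to each other than either is to the third sample, which is the kind of grouping a beta-diversity analysis is meant to surface.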
Results: We developed MetaFast, an approach that allows a shotgun metagenome from an arbitrary environment to be represented as a modified de Bruijn graph consisting of simplified components. For multiple metagenomes, the resulting representation is used to obtain a pairwise similarity matrix. The dimensional structure of the metagenomic components preserved in our algorithm reflects the inherent subspecies-level diversity of microbiota. The method is computationally efficient and especially promising for an analysis of metagenomes from novel environmental niches.
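The idea of representing reads as a de Bruijn graph split into components can be illustrated with a short Python sketch. It is a simplified teaching example and not MetaFast's actual implementation: the function names, the tiny k-mer length, and the toy reads are assumptions, and the sketch ignores reverse complements, sequencing errors, and the graph simplification steps a real tool would need.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every substring of length k across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_graph(kmers: Iterable[str]) -> Dict[str, Set[str]]:
    """Link two k-mers when the suffix of one equals the prefix of the other (k-1 overlap)."""
    kmer_set = set(kmers)
    by_prefix = defaultdict(set)
    for km in kmer_set:
        by_prefix[km[:-1]].add(km)
    adj: Dict[str, Set[str]] = {km: set() for km in kmer_set}
    for km in kmer_set:
        for nxt in by_prefix.get(km[1:], ()):
            if nxt != km:
                adj[km].add(nxt)
                adj[nxt].add(km)  # stored undirected, enough for finding components
    return adj


def connected_components(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Depth-first search; components are returned in a deterministic order."""
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(comp)
    return components


def component_profile(sample_counts: Counter, components: List[Set[str]]) -> List[int]:
    """Per-component abundance: how many of the sample's k-mers fall into each component."""
    return [sum(sample_counts.get(km, 0) for km in comp) for comp in components]


# Hypothetical toy reads; real shotgun data has millions of much longer reads.
samples = {
    "sample_1": ["ACGTACGG", "CGTACGGA"],
    "sample_2": ["TTTGGGCC", "ACGTACGG"],
}
K = 4  # assumed k-mer length, far smaller than anything used in practice

per_sample = {name: count_kmers(reads, K) for name, reads in samples.items()}
pooled: Counter = Counter()
for counts in per_sample.values():
    pooled.update(counts)

components = connected_components(build_graph(pooled))
profiles = {name: component_profile(c, components) for name, c in per_sample.items()}
print(len(components), profiles)
```

Each sample is reduced to a vector with one abundance value per graph component, and no reference genome is consulted at any point. Vectors of this kind are what a pairwise similarity matrix could then be computed from.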
SOURCES - Eurekalert, Bioinformatics