Census-based rapid and accurate metagenome taxonomic profiling

Amirhossein Shamsaddini, Yang Pan, W. Evan Johnson, Konstantinos Krampis, Mariya Shcheglovitova, Vahan Simonyan, Amy Zanne, Raja Mazumder*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Understanding the taxonomic composition of a sample, whether from a patient, food or the environment, is important to several types of studies, including pathogen diagnostics, epidemiological studies, biodiversity analysis and food quality regulation. With the decreasing costs of sequencing, metagenomic data is quickly becoming the preferred type of data for such analysis.

Results: Rapidly defining the taxonomic composition (both taxonomic profile and relative frequency) in a metagenomic sequence dataset is challenging because mapping millions of sequence reads from a metagenomic study to a non-redundant nucleotide database such as the NCBI non-redundant nucleotide database (nt) is computationally intensive. We have developed a robust subsampling-based algorithm, implemented in a tool called CensuScope, meant to take a 'sneak peek' into the population distribution and estimate taxonomic composition as if a census was taken of the metagenomic landscape. CensuScope is a rapid and accurate metagenome taxonomic profiling tool that randomly extracts a small number of reads (based on user input) and maps them to NCBI's nt database. This process is repeated multiple times to ascertain the taxonomic composition that is found in the majority of the iterations, thereby providing a robust estimate of the population and measures of the accuracy of the results.

Conclusion: CensuScope can be run on a laptop or on a high-performance computer. Based on our analysis, we are able to provide some recommendations on the number of sequence reads to analyze and the number of iterations to use. For example, to quantify taxonomic groups present in the sample at a level of 1% or higher, a subsampling size of 250 random reads with 50 iterations yields a statistical power of >99%. Windows and UNIX versions of CensuScope are available for download at https://hive.biochemistry.gwu.edu/dna.cgi?cmd=censuscope. CensuScope is also available through the High-performance Integrated Virtual Environment (HIVE) and can be used in conjunction with other HIVE analysis and visualization tools.
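The subsample-and-repeat idea described in the abstract can be illustrated with a minimal Python sketch. This is not the CensuScope implementation. Several pieces are assumptions made for illustration: the function names are hypothetical, each read already carries a taxon label (standing in for the mapping step against nt), a taxon is kept only if it appears in more than half of the iterations, and detection is modelled with a simple binomial calculation that may differ from the power analysis used by the authors.

```python
import math
import random
from collections import Counter


def detection_probability(abundance, reads_per_iteration):
    """P(at least one read of the taxon appears in one random subsample).

    Assumes reads are drawn independently (binomial approximation).
    """
    return 1.0 - (1.0 - abundance) ** reads_per_iteration


def majority_probability(p_detect, iterations):
    """P(taxon is detected in more than half of the iterations)."""
    threshold = iterations // 2 + 1
    return sum(
        math.comb(iterations, k)
        * p_detect**k
        * (1.0 - p_detect) ** (iterations - k)
        for k in range(threshold, iterations + 1)
    )


def census(read_labels, reads_per_iteration, iterations, seed=0):
    """Repeatedly subsample reads and keep taxa seen in most iterations.

    read_labels is a hypothetical stand-in for mapping results: one taxon
    label per read. Returns pooled relative frequencies of the kept taxa.
    """
    rng = random.Random(seed)
    seen_in = Counter()  # number of iterations in which each taxon appeared
    totals = Counter()   # read counts pooled across all iterations
    for _ in range(iterations):
        sample = rng.sample(read_labels, reads_per_iteration)
        counts = Counter(sample)
        totals.update(counts)
        seen_in.update(counts.keys())
    n_reads = reads_per_iteration * iterations
    return {
        taxon: totals[taxon] / n_reads
        for taxon, hits in seen_in.items()
        if hits > iterations / 2
    }


# Settings quoted in the abstract: 1% abundance, 250 reads, 50 iterations.
p_once = detection_probability(0.01, 250)
p_majority = majority_probability(p_once, 50)

# Toy population (invented for this example): 70% A, 29% B, 1% C.
reads = ["taxon_A"] * 7000 + ["taxon_B"] * 2900 + ["taxon_C"] * 100
profile = census(reads, reads_per_iteration=250, iterations=50)
print(f"single-iteration detection: {p_once:.3f}")
print(f"majority-of-iterations detection: {p_majority:.6f}")
print(profile)
```

Under this simplified model, a single 250-read subsample picks up a taxon at 1% abundance with probability 1 - 0.99^250, roughly 0.92, and repeating the subsample 50 times makes detection in a majority of iterations almost certain. This is consistent in spirit with the >99% figure quoted above, although the exact definition of statistical power used in the paper is not given in the abstract.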

Original language: English
Article number: 918
Pages (from-to): 1-13
Number of pages: 13
Journal: BMC Genomics
Volume: 15
Publication status: Published - 21 Oct 2014
Externally published: Yes

Bibliographical note

Copyright the Author(s) 2014. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Keywords

  • Census-based
  • Diagnostics
  • Metagenome
  • Next-gen sequence analysis
  • Taxonomic profiling
