The random nature of genome architecture: Predicting open reading frame distributions

Michael W. McCoy*, Andrew P. Allen, James F. Gillooly

*Corresponding author for this work

    Research output: Contribution to journal, Article, peer-review



    Background: A better understanding of the size and abundance of open reading frames (ORFs) in whole genomes may shed light on the factors that control genome complexity. Here we examine the statistical distributions of open reading frames (i.e., the distribution of start and stop codons) in the fully sequenced genomes of 297 prokaryotes and 14 eukaryotes. Methodology/Principal Findings: By fitting mixture models to data from whole genome sequences, we show that the size-frequency distributions for ORFs are strikingly similar across prokaryotic and eukaryotic genomes. Moreover, we show that (i) a large fraction (60-80%) of ORF size-frequency distributions can be predicted a priori with a stochastic assembly model based on GC content, and that (ii) size-frequency distributions of the remaining "non-random" ORFs are well fitted by lognormal or gamma distributions and are similar to the size distributions of annotated proteins. Conclusions/Significance: Our findings suggest that stochastic processes have played a primary role in the evolution of genome complexity, and that common processes govern the conservation and loss of functional genomic units in both prokaryotes and eukaryotes.
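
    To make the idea of a GC-based stochastic model concrete, the following is a minimal sketch, not the authors' actual model. It assumes bases are drawn independently, that G and C are equally frequent, and that A and T are equally frequent. It also ignores start codons and measures length as the number of codons before the first stop. All function names are hypothetical.

```python
def stop_codon_probability(gc_content):
    """Probability that a random codon is a stop codon (TAA, TAG or TGA).

    Assumption: independent bases with P(G) = P(C) = gc_content / 2
    and P(A) = P(T) = (1 - gc_content) / 2.
    """
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    p_gc = gc_content / 2.0          # probability of G (same for C)
    p_at = (1.0 - gc_content) / 2.0  # probability of A (same for T)
    # TAA contributes p_at**3; TAG and TGA each contribute p_at**2 * p_gc.
    return p_at ** 3 + 2.0 * p_at ** 2 * p_gc


def random_orf_length_pmf(gc_content, n_codons):
    """P(exactly n_codons non-stop codons occur before the first stop).

    Under the assumptions above this is a geometric distribution.
    """
    p_stop = stop_codon_probability(gc_content)
    return (1.0 - p_stop) ** n_codons * p_stop


def mean_random_orf_length(gc_content):
    """Expected number of non-stop codons before the first stop codon."""
    p_stop = stop_codon_probability(gc_content)
    if p_stop == 0.0:
        return float("inf")  # a pure G/C sequence never forms a stop codon
    return (1.0 - p_stop) / p_stop


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(gc, stop_codon_probability(gc), mean_random_orf_length(gc))
```

    Under these assumptions, a GC content of 0.5 gives a stop-codon probability of 3/64. Higher GC content makes stop codons rarer, so randomly arising reading frames tend to be longer in GC-rich genomes.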

    Original language: English
    Article number: e6456
    Pages (from-to): 1-8
    Number of pages: 8
    Journal: PLoS ONE
    Issue number: 7
    Publication status: Published - 30 Jul 2009

    Bibliographical note

    Copyright the Author(s) 2009. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

