Context-driven discovery of gene cassettes in mobile integrons using a computational grammar

Guy Tsafnat*, Enrico Coiera, Sally R. Partridge, Jaron Schaeffer, Jon R. Iredell

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

16 Citations (Scopus)


Background: Gene discovery algorithms typically examine sequence data for low-level patterns. A novel method to computationally discover higher-order DNA structures is presented, using a context-sensitive grammar. The algorithm was applied to the discovery of gene cassettes associated with integrons. The discovery and annotation of antibiotic resistance genes in such cassettes is essential for effective monitoring of antibiotic resistance patterns and formulation of public health antibiotic prescription policies.

Results: We discovered two new putative gene cassettes using the method, from 276 integron features and 978 GenBank sequences. The system achieved κ = 0.972 annotation agreement with an expert gold standard of 300 sequences. In rediscovery experiments, we deleted 789,196 cassette instances over 2030 experiments and correctly relabelled 85.6% (α ≥ 95%, E ≤ 1%, mean sensitivity = 0.86, specificity = 1, F-score = 0.93), with no false positives.

Conclusion: Error analysis demonstrated that in 72,338 of the missed deletions, two adjacent deleted cassettes were labelled as a single cassette; counting these as correct raises performance to 94.8% (mean sensitivity = 0.92, specificity = 1, F-score = 0.96). Using grammars, we were able to represent heuristic background knowledge about large and complex structures in DNA. Importantly, we were also able to use the context embedded in the model to discover new putative antibiotic resistance gene cassettes. The method is complementary to existing automatic annotation systems, which operate at the sequence level.
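The reported figures can be roughly cross-checked with a short calculation. The sketch below is not the authors' evaluation code. It assumes that the F-score is the usual harmonic mean of sensitivity and precision, and that "no false positives" implies a precision of 1.0. Neither definition is stated in the abstract, and all variable names are illustrative.

```python
def f_score(sensitivity: float, precision: float) -> float:
    """Harmonic mean of sensitivity (recall) and precision (assumed F1 definition)."""
    return 2 * sensitivity * precision / (sensitivity + precision)


# Figures quoted in the abstract.
deleted = 789_196        # cassette instances deleted across all experiments
relabelled_rate = 0.856  # fraction correctly relabelled
merged = 72_338          # missed deletions where two adjacent cassettes were merged

# Assumption: no false positives were reported, so precision is taken as 1.0.
precision = 1.0

f_before = f_score(0.86, precision)  # using the reported mean sensitivity
f_after = f_score(0.92, precision)   # using the sensitivity after error analysis

# Count the merged cases as correct and recompute the overall relabelling rate.
adjusted_rate = (relabelled_rate * deleted + merged) / deleted

print(f"F-score at sensitivity 0.86: {f_before:.3f}")
print(f"F-score at sensitivity 0.92: {f_after:.3f}")
print(f"Adjusted relabelling rate:   {adjusted_rate:.1%}")
```

Under these assumptions the adjusted rate comes out at about 94.8%, and both F-scores land within about 0.01 of the reported values. Small differences are expected because the abstract reports rounded means over many experiments.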

Original language: English
Article number: 1471
Pages (from-to): 281
Number of pages: 1
Journal: BMC Bioinformatics
Publication status: Published - 8 Sept 2009
Externally published: Yes

