A computational learning paradigm to targeted discovery of biocatalysts from metagenomic data: a case study of lipase identification

Mehdi F. Shahraki, Fereshteh F. Atanaki, Shohreh Ariaeenejad, Mohammad R. Ghaffari, Mohammad H. Norouzi-Beirami, Morteza Maleki, Ghasem H. Salekdeh*, Kaveh Kavousi*

*Corresponding author for this work

    Research output: Contribution to journal › Article › peer-review

    13 Citations (Scopus)


    The growing adoption of enzymes as biocatalysts in various industries has increased the demand for access to their great natural diversity. Meanwhile, the advent and advancement of metagenomics and high-throughput sequencing technologies have offered an unprecedented opportunity to explore this extensive resource. Lipases, enzymes responsible for the biological turnover of lipids, are among the most commercialized biocatalysts, with numerous applications in different domains, and are therefore of high industrial value. The relatively costly and time-consuming wet-lab experimental pipelines commonly used for novel enzyme discovery highlight the necessity of agile in silico approaches that can keep pace with the exponential growth of available sequencing data. In the present study, an in-depth analysis of a tannery wastewater metagenome, including taxonomic and enzymatic profiling, was performed. Using sequence homology-based screening methods and supervised machine learning-based regression models that predict lipases' pH and temperature optima, the metagenomic data set was screened for lipolytic enzymes, which led to the isolation of a novel alkaline and highly thermophilic lipase. Moreover, the MeTarEnz (metagenomic targeted enzyme miner) software was developed as part of this study and made freely accessible at https://cbb.ut.ac.ir/MeTarEnz. MeTarEnz offers several functions to automate the process of targeted enzyme mining from high-throughput sequencing data. This study highlights the competence of computational approaches in exploring the vast biodiversity within environmental niches, while providing a set of practical in silico tools as well as a generalized methodology to facilitate the sequence-based mining of biocatalysts.

    Original language: English
    Pages (from-to): 1115-1128
    Number of pages: 14
    Journal: Biotechnology and Bioengineering
    Issue number: 4
    Early online date: 3 Feb 2022
    Publication status: Published - Apr 2022


    • lipase
    • machine learning
    • metagenomics
    • sequence-based
    • targeted biocatalyst discovery

