TY - JOUR
T1 - A computational learning paradigm to targeted discovery of biocatalysts from metagenomic data
T2 - a case study of lipase identification
AU - Shahraki, Mehdi F.
AU - Atanaki, Fereshteh F.
AU - Ariaeenejad, Shohreh
AU - Ghaffari, Mohammad R.
AU - Norouzi-Beirami, Mohammad H.
AU - Maleki, Morteza
AU - Salekdeh, Ghasem H.
AU - Kavousi, Kaveh
PY - 2022/4
Y1 - 2022/4
AB - The growing adoption of enzymes as biocatalysts in various industries has accentuated the demand for access to their great natural diversity. Meanwhile, the advent and advancement of metagenomics and high-throughput sequencing technologies have offered an unprecedented opportunity to explore this extensive resource. Lipases, enzymes responsible for the biological turnover of lipids, are among the most commercialized biocatalysts, with numerous applications in different domains, and are therefore of high industrial value. The relatively costly and time-consuming wet-lab experimental pipelines commonly used for novel enzyme discovery highlight the necessity of agile in silico approaches to keep pace with the exponential growth of available sequencing data. In the present study, an in-depth analysis of a tannery wastewater metagenome, including taxonomic and enzymatic profiling, was performed. Using sequence homology-based screening methods and supervised machine learning-based regression models aimed at predicting lipases' pH and temperature optima, the metagenomic data set was screened for lipolytic enzymes, which led to the isolation of a novel alkaline and highly thermophilic lipase. Moreover, the MeTarEnz (metagenomic targeted enzyme miner) software was developed and made freely accessible (at https://cbb.ut.ac.ir/MeTarEnz) as part of this study. MeTarEnz offers several functions to automate the process of targeted enzyme mining from high-throughput sequencing data. This study highlights the competence of computational approaches in exploring the vast biodiversity within environmental niches, while providing a set of practical in silico tools as well as a generalized methodology to facilitate the sequence-based mining of biocatalysts.
KW - lipase
KW - machine learning
KW - metagenomics
KW - sequence-based
KW - targeted biocatalyst discovery
UR - http://www.scopus.com/inward/record.url?scp=85124144582&partnerID=8YFLogxK
U2 - 10.1002/bit.28037
DO - 10.1002/bit.28037
M3 - Article
C2 - 35067915
AN - SCOPUS:85124144582
SN - 0006-3592
VL - 119
SP - 1115
EP - 1128
JO - Biotechnology and Bioengineering
JF - Biotechnology and Bioengineering
IS - 4
ER -