Protannotator: A semiautomated pipeline for chromosome-wise functional annotation of the "missing" human proteome

Mohammad T. Islam, Gagan Garg, William S. Hancock, Brian A. Risk, Mark S. Baker, Shoba Ranganathan

Research output: Contribution to journal › Article › Research › peer-review

Abstract

The chromosome-centric human proteome project (C-HPP) aims to define the complete set of proteins encoded in each human chromosome. The neXtProt database (September 2013) lists 20 128 proteins for the human proteome, of which 3831 human proteins (∼19%) are considered "missing" according to the standard metrics table (released September 27, 2013). In support of the C-HPP initiative, we have extended the annotation strategy developed for human chromosome 7 "missing" proteins into a semiautomated pipeline to functionally annotate the "missing" human proteome. This pipeline integrates a suite of bioinformatics analysis and annotation software tools to identify homologues and map putative functional signatures, gene ontology, and biochemical pathways. From sequential BLAST searches, we have primarily identified homologues from reviewed nonhuman mammalian proteins with protein evidence for 1271 (33.2%) "missing" proteins, followed by 703 (18.4%) homologues from reviewed nonhuman mammalian proteins and subsequently 564 (14.7%) homologues from reviewed human proteins. Functional annotations for 1945 (50.8%) "missing" proteins were also determined. To accelerate the identification of "missing" proteins from proteomics studies, we generated proteotypic peptides in silico. Matching these proteotypic peptides to ENCODE proteogenomic data resulted in proteomic evidence for 107 (2.8%) of the 3831 "missing" proteins, while evidence from a recent membrane proteomic study supported the existence of another 15 "missing" proteins. The chromosome-wise functional annotation of all "missing" proteins is freely available to the scientific community through our web server (http://biolinfo.org/protannotator).
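The percentages quoted in the abstract can be checked against the stated counts. The short Python sketch below is illustrative only and is not part of the Protannotator pipeline. It assumes every percentage is taken relative to the 3831 "missing" proteins (and the ∼19% relative to the 20 128 neXtProt entries). The helper name `pct` and the dictionary labels are made up for this example.

```python
# Counts quoted in the abstract (neXtProt, September 2013).
TOTAL_PROTEINS = 20128
MISSING = 3831


def pct(count, total=MISSING):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# (count, percentage stated in the abstract); labels are informal shorthand.
reported = {
    "BLAST, first group of homologues": (1271, 33.2),
    "BLAST, second group of homologues": (703, 18.4),
    "BLAST, reviewed human homologues": (564, 14.7),
    "functional annotations": (1945, 50.8),
    "ENCODE proteogenomic evidence": (107, 2.8),
}

for label, (count, stated) in reported.items():
    print(f"{label}: {count}/{MISSING} = {pct(count)}% (abstract: {stated}%)")

# Share of the human proteome considered "missing" (abstract: about 19%).
print(f"missing share: {pct(MISSING, TOTAL_PROTEINS)}%")
```

Under these assumptions each recomputed value matches the figure given in the abstract to one decimal place.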

Language: English
Pages: 76-83
Number of pages: 8
Journal: Journal of Proteome Research
Volume: 13
Issue number: 1
DOI: 10.1021/pr400794x
Publication status: Published - 3 Jan 2014

Fingerprint

Proteome
Chromosomes
Pipelines
Proteins
Human Chromosomes
Proteomics
Peptides
Gene Ontology
Chromosomes, Human, Pair 7
Bioinformatics
Computational Biology
Computer Simulation
Ontology
Servers
Software
Genes

Cite this

@article{5fa0915eb973460b9678babdfe9ff558,
title = "Protannotator: A semiautomated pipeline for chromosome-wise functional annotation of the {"}missing{"} human proteome",
abstract = "The chromosome-centric human proteome project (C-HPP) aims to define the complete set of proteins encoded in each human chromosome. The neXtProt database (September 2013) lists 20 128 proteins for the human proteome, of which 3831 human proteins (∼19{\%}) are considered {"}missing{"} according to the standard metrics table (released September 27, 2013). In support of the C-HPP initiative, we have extended the annotation strategy developed for human chromosome 7 {"}missing{"} proteins into a semiautomated pipeline to functionally annotate the {"}missing{"} human proteome. This pipeline integrates a suite of bioinformatics analysis and annotation software tools to identify homologues and map putative functional signatures, gene ontology, and biochemical pathways. From sequential BLAST searches, we have primarily identified homologues from reviewed nonhuman mammalian proteins with protein evidence for 1271 (33.2{\%}) {"}missing{"} proteins, followed by 703 (18.4{\%}) homologues from reviewed nonhuman mammalian proteins and subsequently 564 (14.7{\%}) homologues from reviewed human proteins. Functional annotations for 1945 (50.8{\%}) {"}missing{"} proteins were also determined. To accelerate the identification of {"}missing{"} proteins from proteomics studies, we generated proteotypic peptides in silico. Matching these proteotypic peptides to ENCODE proteogenomic data resulted in proteomic evidence for 107 (2.8{\%}) of the 3831 {"}missing{"} proteins, while evidence from a recent membrane proteomic study supported the existence of another 15 {"}missing{"} proteins. The chromosome-wise functional annotation of all {"}missing{"} proteins is freely available to the scientific community through our web server (http://biolinfo.org/protannotator).",
author = "Islam, {Mohammad T.} and Gagan Garg and Hancock, {William S.} and Risk, {Brian A.} and Baker, {Mark S.} and Shoba Ranganathan",
year = "2014",
month = "1",
day = "3",
doi = "10.1021/pr400794x",
language = "English",
volume = "13",
pages = "76--83",
journal = "Journal of Proteome Research",
issn = "1535-3893",
publisher = "American Chemical Society",
number = "1",

}

Protannotator: A semiautomated pipeline for chromosome-wise functional annotation of the "missing" human proteome. / Islam, Mohammad T.; Garg, Gagan; Hancock, William S.; Risk, Brian A.; Baker, Mark S.; Ranganathan, Shoba.

In: Journal of Proteome Research, Vol. 13, No. 1, 03.01.2014, p. 76-83.

TY - JOUR

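T1 - Protannotator: A semiautomated pipeline for chromosome-wise functional annotation of the "missing" human proteome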

T2 - Journal of Proteome Research

AU - Islam, Mohammad T.

AU - Garg, Gagan

AU - Hancock, William S.

AU - Risk, Brian A.

AU - Baker, Mark S.

AU - Ranganathan, Shoba

PY - 2014/1/3

Y1 - 2014/1/3

N2 - The chromosome-centric human proteome project (C-HPP) aims to define the complete set of proteins encoded in each human chromosome. The neXtProt database (September 2013) lists 20 128 proteins for the human proteome, of which 3831 human proteins (∼19%) are considered "missing" according to the standard metrics table (released September 27, 2013). In support of the C-HPP initiative, we have extended the annotation strategy developed for human chromosome 7 "missing" proteins into a semiautomated pipeline to functionally annotate the "missing" human proteome. This pipeline integrates a suite of bioinformatics analysis and annotation software tools to identify homologues and map putative functional signatures, gene ontology, and biochemical pathways. From sequential BLAST searches, we have primarily identified homologues from reviewed nonhuman mammalian proteins with protein evidence for 1271 (33.2%) "missing" proteins, followed by 703 (18.4%) homologues from reviewed nonhuman mammalian proteins and subsequently 564 (14.7%) homologues from reviewed human proteins. Functional annotations for 1945 (50.8%) "missing" proteins were also determined. To accelerate the identification of "missing" proteins from proteomics studies, we generated proteotypic peptides in silico. Matching these proteotypic peptides to ENCODE proteogenomic data resulted in proteomic evidence for 107 (2.8%) of the 3831 "missing" proteins, while evidence from a recent membrane proteomic study supported the existence of another 15 "missing" proteins. The chromosome-wise functional annotation of all "missing" proteins is freely available to the scientific community through our web server (http://biolinfo.org/protannotator).

UR - http://www.scopus.com/inward/record.url?scp=84891782987&partnerID=8YFLogxK

U2 - 10.1021/pr400794x

DO - 10.1021/pr400794x

M3 - Article

VL - 13

SP - 76

EP - 83

JO - Journal of Proteome Research

JF - Journal of Proteome Research

SN - 1535-3893

IS - 1

ER -