TY - JOUR
T1 - Proteogenomic discovery of a small, novel protein in yeast reveals a strategy for the detection of unannotated short open reading frames
AU - Yagoub, Daniel
AU - Tay, Aidan P.
AU - Chen, Zhiliang
AU - Hamey, Joshua J.
AU - Cai, Curtis
AU - Chia, Samantha Z.
AU - Hart-Smith, Gene
AU - Wilkins, Marc R.
PY - 2015
Y1 - 2015/12/04/
AB - In recent years, proteomic data have contributed to genome annotation efforts, most notably in humans and mice, and spawned a field termed "proteogenomics". Yeast, in contrast with higher eukaryotes, has a small genome, which has lent itself to simpler ORF prediction. Despite this, continual advances in mass spectrometry suggest that proteomics should be able to improve genome annotation even in this well-characterized species. Here we applied a proteogenomics workflow to yeast to identify novel protein-coding genes. Specific databases were generated, from intergenic regions of the genome, which were then queried with MS/MS data. This suggested the existence of several putative novel ORFs of <100 codons, one of which we chose to validate. Synthetic peptides, RNA-Seq analysis, and evidence of evolutionary conservation allowed for the unequivocal definition of a new protein of 78 amino acids encoded on chromosome X, which we dub YJR107C-A. It encodes a new type of domain, which ab initio modeling suggests as predominantly α-helical. We show that this gene is nonessential for growth; however, deletion increases sensitivity to osmotic stress. Finally, from the above discovery process, we discuss a generalizable strategy for the identification of short ORFs and small proteins, many of which are likely to be undiscovered.
KW - yeast
KW - proteogenomics
KW - small proteins
KW - small ORFs
UR - http://www.scopus.com/inward/record.url?scp=84949036107&partnerID=8YFLogxK
U2 - 10.1021/acs.jproteome.5b00734
DO - 10.1021/acs.jproteome.5b00734
M3 - Article
C2 - 26554900
AN - SCOPUS:84949036107
VL - 14
SP - 5038
EP - 5047
JO - Journal of Proteome Research
JF - Journal of Proteome Research
SN - 1535-3893
IS - 12
ER -