TY - JOUR
T1 - Proteomic validation of transcript isoforms, including those assembled from RNA-Seq data
AU - Tay, Aidan P.
AU - Pang, Chi Nam Ignatius
AU - Twine, Natalie A.
AU - Hart-Smith, Gene
AU - Harkness, Linda
AU - Kassem, Moustapha
AU - Wilkins, Marc R.
PY - 2015/9/4
Y1 - 2015/9/4
N2 - Human proteome analysis now requires an understanding of protein isoforms. We recently published the PG Nexus pipeline, which facilitates high confidence validation of exons and splice junctions by integrating genomics and proteomics data. Here we comprehensively explore how RNA-seq transcriptomics data, and proteomic analysis of the same sample, can identify protein isoforms. RNA-seq data from human mesenchymal (hMSC) stem cells were analyzed with our new TranscriptCoder tool to generate a database of protein isoform sequences. MS/MS data from matching hMSC samples were then matched against the TranscriptCoder-derived database, along with Ensembl and the neXtProt database. Querying the TranscriptCoder-derived or Ensembl database could unambiguously identify ∼450 protein isoforms, with isoform-specific proteotypic peptides, including candidate hMSC-specific isoforms for the genes DPYSL2 and FXR1. Where isoform-specific peptides did not exist, groups of nonisoform-specific proteotypic peptides could specifically identify many isoforms. In both the above cases, isoforms will be detectable with targeted MS/MS assays. Unfortunately, our analysis also revealed that some isoforms will be difficult to identify unambiguously as they do not have peptides that are sufficiently distinguishing. We covisualize mRNA isoforms and peptides in a genome browser to illustrate the above situations. Mass spectrometry data is available via ProteomeXchange (PXD001449).
AB - Human proteome analysis now requires an understanding of protein isoforms. We recently published the PG Nexus pipeline, which facilitates high confidence validation of exons and splice junctions by integrating genomics and proteomics data. Here we comprehensively explore how RNA-seq transcriptomics data, and proteomic analysis of the same sample, can identify protein isoforms. RNA-seq data from human mesenchymal (hMSC) stem cells were analyzed with our new TranscriptCoder tool to generate a database of protein isoform sequences. MS/MS data from matching hMSC samples were then matched against the TranscriptCoder-derived database, along with Ensembl and the neXtProt database. Querying the TranscriptCoder-derived or Ensembl database could unambiguously identify ∼450 protein isoforms, with isoform-specific proteotypic peptides, including candidate hMSC-specific isoforms for the genes DPYSL2 and FXR1. Where isoform-specific peptides did not exist, groups of nonisoform-specific proteotypic peptides could specifically identify many isoforms. In both the above cases, isoforms will be detectable with targeted MS/MS assays. Unfortunately, our analysis also revealed that some isoforms will be difficult to identify unambiguously as they do not have peptides that are sufficiently distinguishing. We covisualize mRNA isoforms and peptides in a genome browser to illustrate the above situations. Mass spectrometry data is available via ProteomeXchange (PXD001449).
KW - alternative splicing
KW - isoform-specific peptides
KW - mesenchymal stem cell
KW - proteotypic peptides
KW - RNA-seq
KW - splice-junction peptides
UR - http://www.scopus.com/inward/record.url?scp=84941105408&partnerID=8YFLogxK
U2 - 10.1021/pr5011394
DO - 10.1021/pr5011394
M3 - Article
C2 - 25961807
AN - SCOPUS:84941105408
VL - 14
SP - 3541
EP - 3554
JO - Journal of Proteome Research
JF - Journal of Proteome Research
SN - 1535-3893
IS - 9
ER -