TY - JOUR
T1 - Analytical guidelines for co-fractionation mass spectrometry obtained through global profiling of gold standard Saccharomyces cerevisiae protein complexes
AU - Pang, Chi Nam Ignatius
AU - Ballouz, Sara
AU - Weissberger, Daniel
AU - Thibaut, Loïc M.
AU - Hamey, Joshua J.
AU - Gillis, Jesse
AU - Wilkins, Marc R.
AU - Hart-Smith, Gene
PY - 2020/11/1
Y1 - 2020/11/1
N2 - Co-fractionation MS (CF-MS) is a technique with potential to characterize endogenous and unmanipulated protein complexes on an unprecedented scale. However this potential has been offset by a lack of guidelines for best-practice CF-MS data collection and analysis. To obtain such guidelines, this study thoroughly evaluates novel and published Saccharomyces cerevisiae CF-MS data sets using very high proteome coverage libraries of yeast gold standard complexes. A new method for identifying gold standard complexes in CF-MS data, Reference Complex Profiling, and the Extending 'Guilt-by-Association' by Degree (EGAD) R package are used for these evaluations, which are verified with concurrent analyses of published human data. By evaluating data collection designs, which involve fractionation of cell lysates, it is found that near-maximum recall of complexes can be achieved with fewer samples than published studies. Distributing sample collection across orthogonal fractionation methods, rather than a single high resolution data set, leads to particularly efficient recall. By evaluating 17 different similarity scoring metrics, which are central to CF-MS data analysis, it is found that two metrics rarely used in past CF-MS studies - Spearman and Kendall correlations - and the recently introduced Co-apex metric frequently maximize recall, whereas a popular metric-Euclidean distance-delivers poor recall. The common practice of integrating external genomic data into CF-MS data analysis is also evaluated, revealing that this practice may improve the precision and recall of known complexes but is generally unsuitable for predicting novel complexes in model organisms. If studying nonmodel organisms using orthologous genomic data, it is found that particular subsets of fractionation profiles (e.g. the lowest abundance quartile) should be excluded to minimize false discovery. These assessments are summarized in a series of universally applicable guidelines for precise, sensitive and efficient CF-MS studies of known complexes, and effective predictions of novel complexes for orthogonal experimental validation.
AB - Co-fractionation MS (CF-MS) is a technique with potential to characterize endogenous and unmanipulated protein complexes on an unprecedented scale. However this potential has been offset by a lack of guidelines for best-practice CF-MS data collection and analysis. To obtain such guidelines, this study thoroughly evaluates novel and published Saccharomyces cerevisiae CF-MS data sets using very high proteome coverage libraries of yeast gold standard complexes. A new method for identifying gold standard complexes in CF-MS data, Reference Complex Profiling, and the Extending 'Guilt-by-Association' by Degree (EGAD) R package are used for these evaluations, which are verified with concurrent analyses of published human data. By evaluating data collection designs, which involve fractionation of cell lysates, it is found that near-maximum recall of complexes can be achieved with fewer samples than published studies. Distributing sample collection across orthogonal fractionation methods, rather than a single high resolution data set, leads to particularly efficient recall. By evaluating 17 different similarity scoring metrics, which are central to CF-MS data analysis, it is found that two metrics rarely used in past CF-MS studies - Spearman and Kendall correlations - and the recently introduced Co-apex metric frequently maximize recall, whereas a popular metric-Euclidean distance-delivers poor recall. The common practice of integrating external genomic data into CF-MS data analysis is also evaluated, revealing that this practice may improve the precision and recall of known complexes but is generally unsuitable for predicting novel complexes in model organisms. If studying nonmodel organisms using orthologous genomic data, it is found that particular subsets of fractionation profiles (e.g. the lowest abundance quartile) should be excluded to minimize false discovery. These assessments are summarized in a series of universally applicable guidelines for precise, sensitive and efficient CF-MS studies of known complexes, and effective predictions of novel complexes for orthogonal experimental validation.
KW - Bioinformatics
KW - Chromatography
KW - co-fractionation mass spectrometry
KW - Protein complex analysis
KW - protein complexes
KW - protein correlation profiling
KW - protein-protein interactions
KW - Saccharomyces cerevisiae
KW - Yeast
UR - http://www.scopus.com/inward/record.url?scp=85094983615&partnerID=8YFLogxK
U2 - 10.1074/mcp.RA120.002154
DO - 10.1074/mcp.RA120.002154
M3 - Article
C2 - 32817346
AN - SCOPUS:85094983615
SN - 1535-9476
VL - 19
SP - 1876
EP - 1895
JO - Molecular & cellular proteomics : MCP
JF - Molecular & cellular proteomics : MCP
IS - 11
ER -