TY - JOUR
T1 - Statistics in proteomics
T2 - a meta-analysis of 100 proteomics papers published in 2019
AU - Handler, David C. L.
AU - Haynes, Paul A.
N1 - Correction to article: David C. L. Handler and Paul A. Haynes, (2019) “Statistics in Proteomics: A Meta-analysis of 100 Proteomics Papers Published in 2019” Journal of the American Society for Mass Spectrometry 2021 32 (7), 1846-1846. DOI: 10.1021/jasms.0c00253
PY - 2020/7/1
Y1 - 2020/7/1
N2 - We randomly selected 100 journal articles published in five proteomics journals in 2019 and manually examined each of them against a set of 13 criteria concerning the statistical analyses used, all of which were based on items mentioned in the journals' instructions to authors. These included questions such as whether a pilot study was conducted and whether false discovery rate calculation was employed at either the quantitation or identification stage. These data were then transformed to binary inputs, analyzed via machine learning algorithms, and classified accordingly, with the aim of determining whether clusters of data existed for specific journals or whether certain statistical measures correlated with each other. We applied a variety of classification methods, including principal component analysis decomposition, agglomerative clustering, and multinomial and Bernoulli naïve Bayes classification, and found that none of these could readily determine journal identity given extracted statistical features. Logistic regression was useful in determining high correlative potential between statistical features such as false discovery rate criteria and multiple testing correction methods, but was similarly ineffective at determining correlations between statistical features and specific journals. This meta-analysis highlights that there is a very wide variety of approaches being used in statistical analysis of proteomics data, many of which do not conform to published journal guidelines, and that, contrary to implicit assumptions in the field, there are no clear correlations between statistical methods and specific journals.
UR - http://www.scopus.com/inward/record.url?scp=85087466144&partnerID=8YFLogxK
UR - https://doi.org/10.1021/jasms.0c00253
U2 - 10.1021/jasms.9b00142
DO - 10.1021/jasms.9b00142
M3 - Article
C2 - 32324388
AN - SCOPUS:85087466144
SN - 1044-0305
VL - 31
SP - 1337
EP - 1343
JO - Journal of the American Society for Mass Spectrometry
JF - Journal of the American Society for Mass Spectrometry
IS - 7
ER -