Semi-supervised learning while controlling the FDR with an application to tandem mass spectrometry analysis

Jack Freestone, Lukas Käll, William Stafford Noble, Uri Keich

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > peer-review

Abstract

Canonical procedures to control the false discovery rate (FDR) among the list of putative discoveries rely on our ability to compute informative p-values. The competition-based approach offers a fairly novel and increasingly popular alternative when computing such p-values is impractical. The popularity of this approach stems from its wide applicability: instead of computing p-values, which requires knowing the entire null distribution for each null hypothesis, a competition-based approach only requires a single draw from each such null distribution. This drawn example is known as a “decoy” in the mass spectrometry community (which was the first to adopt the competition approach) or as a “knockoff” in the statistics community. The decoy competes with the original observation (the “target”), and only the higher scoring of the two is retained. The number of decoy wins is subsequently used to estimate and control the FDR among the target wins.
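The competition step described above can be illustrated with a minimal sketch. It is not taken from the paper: the function name `tdc_discoveries`, the random tie-breaking, and the "+1" correction in the FDR estimate are assumptions based on the commonly used form of target-decoy competition, and the scores in the usage note are made up for illustration.

```python
import numpy as np

def tdc_discoveries(target_scores, decoy_scores, alpha, rng=None):
    """Return indices of target wins reported at estimated FDR <= alpha.

    Hypothetical helper: a sketch of target-decoy competition,
    not the RESET protocol from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.asarray(target_scores, dtype=float)
    d = np.asarray(decoy_scores, dtype=float)

    # Competition: keep the higher score of each pair (ties broken at random).
    tie_break = rng.random(t.shape) < 0.5
    target_wins = (t > d) | ((t == d) & tie_break)
    winning = np.where(target_wins, t, d)

    # Rank hypotheses by winning score, best first.
    order = np.argsort(-winning, kind="stable")
    wins_sorted = target_wins[order]
    n_targets = np.cumsum(wins_sorted)
    n_decoys = np.cumsum(~wins_sorted)

    # Decoy wins estimate the number of false target wins in the top k.
    # The "+1" is the assumed conservative correction.
    fdr_hat = (1 + n_decoys) / np.maximum(n_targets, 1)

    ok = np.nonzero(fdr_hat <= alpha)[0]
    if ok.size == 0:
        return np.array([], dtype=int)
    k = ok[-1] + 1  # largest cutoff whose estimated FDR is within alpha
    top = order[:k]
    return top[target_wins[top]]
```

For example, with target scores `[10, 9, 8, 7, 1]`, decoy scores `[1, 2, 3, 4, 5]` and `alpha=0.25`, the first four targets win their competitions and are reported, because the estimate at that cutoff is (1 + 0) / 4 = 0.25. With `alpha=0.2` nothing is reported, since no cutoff reaches an estimate that low.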

In this paper we offer a novel method to extend the competition-based approach to control the FDR while taking advantage of side information, i.e., additional features that can help us distinguish between correct and incorrect discoveries. Our motivation comes from the problem of peptide detection in tandem mass spectrometry proteomics data. Specifically, we recently showed that a popular mass spectrometry analysis software tool, Percolator, can apparently fail to control the FDR. We address this problem here by developing a general protocol called “RESET” that can take advantage of the additional features, such as the ones Percolator uses, while still theoretically and empirically controlling the FDR.
Original language: English
Title of host publication: Research in Computational Molecular Biology
Subtitle of host publication: 28th Annual International Conference, RECOMB 2024, Cambridge, MA, USA, April 29–May 2, 2024, Proceedings
Editors: Jian Ma
Place of Publication: Cham
Publisher: Springer, Springer Nature
Pages: 448-453
Number of pages: 6
ISBN (Electronic): 9781071639894
ISBN (Print): 9781071639887
DOIs
Publication status: Published - 2024
Externally published: Yes
Event: International Conference on Research in Computational Molecular Biology (28th : 2024) - Cambridge, United States
Duration: 29 Apr 2024 to 3 May 2024

Publication series

Name: Lecture Notes in Computer Science
Volume: 14758
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Conference on Research in Computational Molecular Biology (28th : 2024)
Country/Territory: United States
City: Cambridge
Period: 29/04/24 to 3/05/24

Keywords

  • proteomics
  • false discovery rate control
  • tandem mass spectrometry
