Abstract
Artificial intelligence applications in biomedicine face major challenges from data privacy requirements. To address this issue for clinically annotated tissue proteomic data, we developed a federated deep learning approach (ProCanFDL), training local models on simulated sites containing data from a pan-cancer cohort (n = 1,260) and 29 cohorts held behind private firewalls (n = 6,265), representing 19,930 replicate data-independent acquisition mass spectrometry runs. Local parameter updates were aggregated to build the global model, achieving a 43% performance gain on the hold-out test set (n = 625) in 14 cancer subtyping tasks compared with local models and matching centralized model performance. The approach’s generalizability was demonstrated by retraining the global model with data from two external, data-independent acquisition mass spectrometry cohorts (n = 55) and eight acquired by tandem mass tag proteomics (n = 832). ProCanFDL presents a solution for internationally collaborative machine learning initiatives using proteomic data, for example, for discovering predictive biomarkers or treatment targets while maintaining data privacy. Significance: A federated deep learning approach applied to human proteomic data, acquired using two distinct proteomic technologies from 40 tumor cohorts across eight countries, enabled accurate cancer histopathologic subtyping while preserving data privacy. This approach will enable the privacy-compliant development of large-scale proteomic artificial intelligence models, including foundation models, across institutions globally.
| Original language | English |
|---|---|
| Pages (from-to) | 1803-1818 |
| Number of pages | 16 |
| Journal | Cancer Discovery |
| Volume | 15 |
| Issue number | 9 |
| DOIs | |
| Publication status | Published - 4 Sept 2025 |
Bibliographical note
Copyright the Author(s) 2025. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.Fingerprint
Dive into the research topics of 'Federated deep learning enables cancer subtyping by proteomics'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver