
Federated deep learning enables cancer subtyping by proteomics

Zhaoxiang Cai, Emma L. Boys, Zainab Noor, Adel T. Aref, Dylan Xavier, Natasha Lucas, Steven G. Williams, Jennifer M. S. Koh, Rebecca C. Poulos, Yangxiu Wu, Michael Dausmann, Karen L. Mackenzie, Adriana Aguilar-Mahecha, Carolina Armengol, Maria M. Barranco, Mark Basik, Elise D. Bowman, Roderick Clifton-Bligh, Elizabeth A. Connolly, Wendy A. Cooper, Bhavik Dalal, Anna Defazio, Martin Filipits, Peter J. Flynn, J. Dinny Graham, Jacob George, Anthony J. Gill, Michael Gnant, Rosemary Habib, Curtis C. Harris, Kate Harvey, Lisa G. Horvath, Christopher Jackson, Maija R. J. Kohonen-Corish, Elgene Lim, Jia (Jenny) Liu, Georgina V. Long, Reginald V. Lord, Graham J. Mann, Geoffrey W. McCaughan, Lucy Morgan, Leigh Murphy, Sumanth Nagabushan, Adnan Nagrial, Jordi Navinés, Benedict J. Panizza, Jaswinder S. Samra, Richard A. Scolyer, John Souglakos, Alexander Swarbrick, David Thomas, Rosemary L. Balleine, Peter G. Hains*, Phillip J. Robinson*, Qing Zhong*, Roger R. Reddel*

*Corresponding author for this work

Research output: Contribution to journal / Article / Peer-review

Abstract

Artificial intelligence applications in biomedicine face major challenges from data privacy requirements. To address this issue for clinically annotated tissue proteomic data, we developed a federated deep learning approach (ProCanFDL), training local models on simulated sites containing data from a pan-cancer cohort (n = 1,260) and 29 cohorts held behind private firewalls (n = 6,265), representing 19,930 replicate data-independent acquisition mass spectrometry runs. Local parameter updates were aggregated to build the global model, achieving a 43% performance gain on the hold-out test set (n = 625) in 14 cancer subtyping tasks compared with local models and matching centralized model performance. The approach's generalizability was demonstrated by retraining the global model with data from two external data-independent acquisition mass spectrometry cohorts (n = 55) and eight cohorts acquired by tandem mass tag proteomics (n = 832). ProCanFDL presents a solution for internationally collaborative machine learning initiatives using proteomic data, for example, for discovering predictive biomarkers or treatment targets while maintaining data privacy.

Significance: A federated deep learning approach applied to human proteomic data, acquired using two distinct proteomic technologies from 40 tumor cohorts across eight countries, enabled accurate cancer histopathologic subtyping while preserving data privacy. This approach will enable the privacy-compliant development of large-scale proteomic artificial intelligence models, including foundation models, across institutions globally.
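The abstract states that local parameter updates from each site were aggregated to build the global model, without specifying the aggregation rule. A common choice in federated learning is federated averaging (FedAvg), where each site's parameters are averaged element-wise, weighted by that site's number of training samples. The sketch below illustrates that idea under stated assumptions: the sample-size weighting, the dictionary-of-lists parameter representation, and the function name `federated_average` are all hypothetical and are not taken from ProCanFDL itself. Only parameters leave each site; the underlying patient data never do.

```python
from typing import Dict, List, Tuple

# Hypothetical parameter representation: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Params, int]]) -> Params:
    """Aggregate local model parameters into a global model (FedAvg-style sketch).

    Each entry in `updates` is (local_params, n_samples) from one site.
    Parameters are averaged element-wise, weighted by each site's sample
    count (an assumption; the paper's exact weighting is not given here).
    """
    if not updates:
        raise ValueError("need at least one local update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    first_params, _ = updates[0]
    global_params: Params = {}
    for name, values in first_params.items():
        acc = [0.0] * len(values)
        for params, n in updates:
            weight = n / total  # larger cohorts contribute proportionally more
            for i, v in enumerate(params[name]):
                acc[i] += weight * v
        global_params[name] = acc
    return global_params


if __name__ == "__main__":
    # Two hypothetical sites: site A trained on 100 samples, site B on 300.
    site_a = ({"w": [1.0, 2.0]}, 100)
    site_b = ({"w": [3.0, 4.0]}, 300)
    print(federated_average([site_a, site_b]))  # {'w': [2.5, 3.5]}
```

In a full training loop, the server would broadcast the averaged parameters back to every site, each site would train locally for a few epochs, and the cycle would repeat for a number of rounds.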

Original language: English
Pages (from-to): 1803-1818
Number of pages: 16
Journal: Cancer Discovery
Volume: 15
Issue number: 9
Publication status: Published - 4 Sept 2025

Bibliographical note

Copyright the Author(s) 2025. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
