Weakly supervised deep learning image analysis can differentiate melanoma from naevi on haematoxylin and eosin-stained histopathology slides

Nigel G. Maher, Homay Danaei Mehr, Cong Cong, Nurudeen A. Adegoke, Ismael A. Vergara, Sidong Liu, Richard A. Scolyer

Research output: Contribution to journal, Article, peer-review

Abstract

Background: The broad histomorphological spectrum of melanocytic pathologies means that large data sets are required to develop accurate and generalisable deep learning (DL)-based diagnostic pathology classifiers. Weakly supervised DL, which relies on slide-level labels, allows larger training data sets to be used than fully supervised approaches that require patch annotation.

Objectives: To evaluate weakly supervised DL image classifiers for discriminating melanomas from naevi on haematoxylin and eosin (H&E)-stained pathology slides.

Methods: One representative H&E slide from each of 260 naevi and 260 melanomas from mucocutaneous sites at one tertiary institution was digitised. Clinicopathological features, including thickness and histological subtype, were recorded for each case. Whole-slide or whole-tissue section labels were applied. The ground truth was established by consensus diagnosis from two pathologists. Three multiple-instance learning models (Trans-MIL, CLAM and DTFD-MIL) were evaluated at 10×, 20× and 40× magnifications using stratified fivefold Monte Carlo cross-validation, with 80/10/10 splits for training/validation/test groups, to classify each slide as melanoma or naevus. Heatmaps were generated to aid interpretation of model performance.
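The splitting scheme described in the Methods can be illustrated with a minimal sketch. It assumes each of the five repeats independently reshuffles the cases within each class before cutting 80/10/10; the function name `stratified_mc_splits`, the seed and the rounding rule are hypothetical and are not taken from the study's code.

```python
import random

def stratified_mc_splits(labels, n_repeats=5, fractions=(0.8, 0.1, 0.1), seed=0):
    """Yield (train, val, test) index lists, resampled independently per repeat.

    Assumption: stratification means every split keeps the class ratio of the
    full cohort, so each class is shuffled and cut separately.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for _ in range(n_repeats):
        train, val, test = [], [], []
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            n_train = round(fractions[0] * len(shuffled))
            n_val = round(fractions[1] * len(shuffled))
            train += shuffled[:n_train]
            val += shuffled[n_train:n_train + n_val]
            test += shuffled[n_train + n_val:]  # remainder goes to the test group
        yield train, val, test

# 260 naevi (label 0) and 260 melanomas (label 1), matching the cohort size.
labels = [0] * 260 + [1] * 260
splits = list(stratified_mc_splits(labels))
```

Under these assumptions each repeat holds 416 training, 52 validation and 52 test slides, with 26 naevi and 26 melanomas in every test group. Unlike k-fold cross-validation, test groups from different repeats may overlap.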

Results: Patients with naevi were younger than those with melanoma (median age: 51 years vs. 71.5 years) and had a more balanced sex distribution (males: 48.8% vs. 64.2%). The most frequent histological subtypes of naevi and melanomas were dysplastic compound (n = 99, 38.1%) and superficial spreading (n = 124, 47.7%), respectively. At 20× magnification, the mean area under the receiver operating characteristic curve (AUC; ±1 SD) across test groups was 0.9952 ± 0.006 for Trans-MIL, 0.9925 ± 0.0052 for CLAM and 0.9708 ± 0.0328 for DTFD-MIL. Performance of the models varied according to the magnification used. Heatmaps from the two best performing models, Trans-MIL and CLAM, generally indicated attention on appropriate tissue regions for interpretation.
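For readers unfamiliar with how a "mean AUC ± 1 SD across test groups" figure is obtained, the following is a minimal sketch using only the Python standard library. The helper names `roc_auc` and `summarise` are hypothetical, the use of the sample standard deviation is an assumption (the abstract does not say which estimator was used), and the numbers are toy values, not study results.

```python
from statistics import mean, stdev

def roc_auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def summarise(aucs):
    """Return (mean, sample SD) of per-test-group AUC values."""
    return mean(aucs), stdev(aucs)

# Toy example: 1 = melanoma, 0 = naevus; scores are model probabilities.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Toy per-group AUCs standing in for the five test groups.
toy_mean, toy_sd = summarise([0.90, 0.95, 1.00])
```

In practice one AUC would be computed per test group and the resulting values passed to `summarise`.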

Conclusions: Weakly supervised DL applied to pathology slides of common mucocutaneous melanocytic tumours discriminates melanomas from naevi with high accuracy. External validation and further assessment of this method on less frequently occurring histological subtypes and borderline cases are required.

Original language: English
Number of pages: 9
Journal: Journal of the European Academy of Dermatology and Venereology
Early online date: 31 Aug 2024
Publication status: E-pub ahead of print - 31 Aug 2024