Multiview multimodal feature fusion for breast cancer classification using deep learning

Sadam Hussain*, Mansoor Ali, Usman Naseem, Daly Betzabeth Avendano Avalos, Servando Cardona-Huerta, Jose Gerardo Tamez-Pena

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

The increasing incidence and mortality of breast cancer pose significant global challenges for women. Deep learning (DL) has shown superior diagnostic performance in breast cancer classification compared to human experts. However, most DL methods rely on unimodal features, which may limit the performance of diagnostic models. Recent studies combine multimodal data with multiple views of mammograms, typically two: Cranio-Caudal (CC) and Medio-Lateral Oblique (MLO). Combining multimodal data has been shown to improve classification effectiveness over single-modal systems. In this study, we compiled a multimodal dataset comprising imaging and textual data (a combination of clinical and radiological features). We propose a DL-based multiview multimodal feature fusion (MMFF) strategy for breast cancer classification that utilizes images (four mammogram views) and tabular data (extracted from radiological reports) from our newly developed in-house dataset. Various augmentation techniques were applied to both the imaging and textual data to expand the training dataset. Imaging features were extracted using a Squeeze-and-Excitation (SE) network-based ResNet50 model, while textual features were extracted using an artificial neural network (ANN). The extracted features from both modalities were then fused using a late feature fusion strategy, and the fused features were fed into an ANN for the final classification of breast cancer. We compared the performance of the proposed MMFF model against single-modal models built on images only or on textual data only, evaluating accuracy, precision, sensitivity, F1 score, and area under the receiver operating characteristic curve (AUC). For benign vs. malignant classification, our MMFF model achieved an AUC of 0.965, compared with image-only (ResNet50 = 0.545), text-only (ANN = 0.688, SVM = 0.842), and other multimodal approaches (ResNet50+ANN = 0.748, EfficientNetb7+ANN = 0.874).
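
To make the fusion pipeline concrete, below is a minimal PyTorch sketch of the late-fusion architecture the abstract describes. It is an illustration, not the authors' implementation: it assumes a single ResNet50 backbone shared across the four mammogram views, one SE block applied to the final feature maps (the paper's SE-ResNet50 presumably embeds SE modules inside the residual blocks), plain concatenation as the late-fusion operator, and hypothetical dimensions (20 tabular features, 64-unit text ANN).

```python
# Hedged sketch of multiview multimodal late fusion (not the paper's exact model).
import torch
import torch.nn as nn
from torchvision.models import resnet50


class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel recalibration."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze: global average pool
        return x * w.view(b, c, 1, 1)     # excite: channel-wise rescaling


class MultiviewMultimodalFusion(nn.Module):
    def __init__(self, num_tabular: int = 20, num_classes: int = 2):
        super().__init__()
        backbone = resnet50(weights=None)
        # Keep the convolutional trunk; drop avgpool and the FC head.
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.se = SEBlock(2048)           # one SE block on the final maps (simplification)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Small ANN for tabular features extracted from radiological reports.
        self.tab_net = nn.Sequential(
            nn.Linear(num_tabular, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        # Late fusion: concatenated per-view image features + tabular features.
        self.classifier = nn.Sequential(
            nn.Linear(4 * 2048 + 64, 256), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, views, tabular):
        # views: (batch, 4, 3, H, W) holding CC and MLO of both breasts.
        feats = [
            self.pool(self.se(self.cnn(views[:, v]))).flatten(1)  # (batch, 2048)
            for v in range(views.shape[1])
        ]
        img_feat = torch.cat(feats, dim=1)               # (batch, 4 * 2048)
        tab_feat = self.tab_net(tabular)                 # (batch, 64)
        fused = torch.cat([img_feat, tab_feat], dim=1)   # late feature fusion
        return self.classifier(fused)


# Smoke test with random inputs and the hypothetical dimensions above.
model = MultiviewMultimodalFusion()
logits = model(torch.randn(2, 4, 3, 224, 224), torch.randn(2, 20))
print(logits.shape)  # torch.Size([2, 2])
```

Concatenation is only one way to realize late fusion; the same skeleton accommodates weighted averaging or attention over the per-modality features, which the paper does not detail here.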

Original language: English
Pages (from-to): 9265-9275
Number of pages: 11
Journal: IEEE Access
Volume: 13
DOIs
Publication status: Published - 2025

Bibliographical note

Copyright the Author(s) 2024. Version archived for private and non-commercial use with the permission of the author(s) and according to publisher conditions. For further rights please contact the publisher.
