TY - JOUR
T1 - Multiview multimodal feature fusion for breast cancer classification using deep learning
AU - Hussain, Sadam
AU - Ali, Mansoor
AU - Naseem, Usman
AU - Avalos, Daly Betzabeth Avendano
AU - Cardona-Huerta, Servando
AU - Tamez-Pena, Jose Gerardo
N1 - Copyright the Author(s) 2024. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
PY - 2025
Y1 - 2025
N2 - The increasing incidence and mortality of breast cancer pose significant global challenges for women. Deep learning (DL) has shown superior diagnostic performance in breast cancer classification compared to human experts. However, most DL methods have relied on unimodal features, which may limit the performance of diagnostic models. Recent studies have focused on multimodal data combined with multiple mammographic views, typically two: craniocaudal (CC) and mediolateral oblique (MLO). Combining multimodal data has been shown to improve classification effectiveness over single-modal systems. In this study, we compiled a multimodal dataset comprising imaging and textual data (a combination of clinical and radiological features). We propose a DL-based multiview multimodal feature fusion (MMFF) strategy for breast cancer classification that utilizes images (four mammographic views) and tabular data (extracted from radiological reports) from our newly developed in-house dataset. Various augmentation techniques were applied to both the imaging and textual data to expand the training dataset. Imaging features were extracted using a Squeeze-and-Excitation (SE) network-based ResNet50 model, while textual features were extracted using an artificial neural network (ANN). The extracted features from both modalities were then fused using a late feature fusion strategy, and the fused features were fed into an ANN for the final classification of breast cancer. We compared the performance of our proposed MMFF model with single-modal models (image only) and models built on textual data. Performance was evaluated using accuracy, precision, sensitivity, F1 score, and area under the receiver operating characteristic curve (AUC). Our MMFF model achieved an AUC of 0.965 for benign vs. malignant classification, compared with image-only (ResNet50 = 0.545), text-only (ANN = 0.688, SVM = 0.842), and other multimodal approaches (ResNet50+ANN = 0.748, EfficientNetB7+ANN = 0.874).
UR - http://www.scopus.com/inward/record.url?scp=85214097938&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3524203
DO - 10.1109/ACCESS.2024.3524203
M3 - Article
AN - SCOPUS:85214097938
SN - 2169-3536
VL - 13
SP - 9265
EP - 9275
JO - IEEE Access
JF - IEEE Access
ER -