Abstract
Medical Visual Question Answering (MedVQA) is crucial for improving the efficiency of clinical diagnosis because it provides accurate and timely answers to clinicians' questions about medical images. Existing MedVQA models suffer from modality preference bias: their predictions are dominated by one modality while the other is overlooked (in MedVQA, the question usually dominates the answer and the image is overlooked), so the models fail to learn multimodal knowledge. To overcome this bias, we propose a Medical CounterFactual VQA (MedCFVQA) model, which is trained with the bias and uses causal graphs to eliminate it during inference. Existing MedVQA datasets also exhibit substantial prior dependencies between questions and answers, which lets a model achieve acceptable performance even when it suffers significantly from modality preference bias. To address this issue, we constructed new datasets from existing MedVQA datasets by Changing the Prior dependencies (CP) between questions and answers in the training and test sets. Extensive experiments demonstrate that MedCFVQA significantly outperforms its non-causal counterpart on SLAKE and RadVQA as well as on SLAKE-CP and RadVQA-CP.
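The abstract does not specify how MedCFVQA removes the question-only effect at inference. The sketch below assumes the counterfactual VQA formulation of Niu et al. (CF-VQA, CVPR 2021): a question branch Z_q, an image branch Z_v and a fused branch Z_k are combined with SUM fusion, h = log sigmoid(Z_q + Z_v + Z_k). The answer is then chosen by the total indirect effect TIE = Z(q, v, k) - Z(q, v*, k*), where the counterfactual image and fused branches are replaced by a constant c. All function names, the fixed value of c (learned in CF-VQA) and the toy logits are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def fuse_sum(z_q: List[float], z_v: List[float], z_k: List[float]) -> List[float]:
    # Assumed SUM fusion per answer: h = log sigmoid(z_q + z_v + z_k).
    return [log_sigmoid(q + v + k) for q, v, k in zip(z_q, z_v, z_k)]


def total_indirect_effect(z_q: List[float], z_v: List[float],
                          z_k: List[float], c: float) -> List[float]:
    # Factual world: Z(q, v, k), the question and the image are both seen.
    factual = fuse_sum(z_q, z_v, z_k)
    # Counterfactual world: Z(q, v*, k*), the image and fused branches are
    # replaced by the constant c, so only the question prior remains.
    n = len(z_q)
    counterfactual = fuse_sum(z_q, [c] * n, [c] * n)
    # TIE = TE - NDE = Z(q, v, k) - Z(q, v*, k*).
    return [f - cf for f, cf in zip(factual, counterfactual)]


def predict(z_q: List[float], z_v: List[float],
            z_k: List[float], c: float = 0.0) -> int:
    # Debiased prediction: the answer index with the largest TIE.
    tie = total_indirect_effect(z_q, z_v, z_k, c)
    return max(range(len(tie)), key=tie.__getitem__)


# Toy logits: the question branch strongly prefers answer 0 (a language prior),
# while the image and fused branches point to answer 1.
z_q, z_v, z_k = [5.0, 0.0], [0.0, 2.0], [0.0, 2.0]
factual = fuse_sum(z_q, z_v, z_k)
print(max(range(2), key=factual.__getitem__))  # biased factual prediction: 0
print(predict(z_q, z_v, z_k))                  # counterfactual (TIE) prediction: 1
```

Subtracting the counterfactual score cancels the part of each answer's score that the question alone would produce, so the answer favoured by the question prior no longer wins by default.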
Original language | English |
---|---|
Title of host publication | VLM4Bio '24 |
Subtitle of host publication | proceedings of the First International Workshop on Vision-Language Models for Biomedical Applications |
Place of Publication | New York |
Publisher | Association for Computing Machinery |
Pages | 13-17 |
Number of pages | 5 |
ISBN (Electronic) | 9798400712074 |
DOIs | |
Publication status | Published - 2024 |
Event | First International Workshop on Vision-Language Models for Biomedical Applications (1st : 2024) - Melbourne, Australia; Duration: 28 Oct 2024 → 1 Nov 2024; Conference number: 1st |
Conference
Conference | First International Workshop on Vision-Language Models for Biomedical Applications (1st : 2024) |
---|---|
Abbreviated title | VLM4Bio 2024 |
Country/Territory | Australia |
City | Melbourne |
Period | 28/10/24 → 1/11/24 |
Bibliographical note
Copyright the Author(s) 2024. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
Keywords
- Visual Question Answering
- Causal Reasoning
- Bias and Fairness