A causal approach to mitigate modality preference bias in medical visual question answering

Shuchang Ye, Usman Naseem, Mingyuan Meng, Dagan Feng, Jinman Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review


Abstract

Medical Visual Question Answering (MedVQA) is crucial for enhancing the efficiency of clinical diagnosis by providing accurate and timely responses to clinicians' inquiries regarding medical images. Existing MedVQA models suffer from modality preference bias, where predictions are dominated by one modality while the other is overlooked (in MedVQA, the question usually dominates the answer while the image is overlooked), so the models fail to learn multimodal knowledge. To overcome this bias, we propose a Medical CounterFactual VQA (MedCFVQA) model, which is trained with the bias present and leverages causal graphs to eliminate the modality preference bias during inference. Existing MedVQA datasets also exhibit substantial prior dependencies between questions and answers, so a model can achieve acceptable performance even when it suffers severely from modality preference bias. To address this issue, we reconstructed new datasets from existing MedVQA datasets by Changing the Prior dependencies (CP) between questions and their answers across the training and test sets. Extensive experiments demonstrate that MedCFVQA significantly outperforms its non-causal counterpart on the original SLAKE and RadVQA datasets as well as on the reconstructed SLAKE-CP and RadVQA-CP datasets.
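The abstract does not spell out MedCFVQA's inference rule. As an illustration only, the following sketch assumes a counterfactual inference scheme in the style of CF-VQA (Niu et al., CVPR 2021), simplified to two branches: a question-only branch with logits `z_q` and a multimodal (image plus question) branch with logits `z_m`. The model is trained on the fused score, so the bias is learned; at inference the prediction is taken from the total indirect effect, i.e. the factual fused score minus a counterfactual score in which the multimodal branch is replaced by a constant `c` (as if the image had not been seen), which subtracts the purely question-driven effect. The SUM-style fusion `log sigmoid(z_q + z_m)`, the fixed `c`, the logits, and all function names are hypothetical assumptions, not the authors' implementation.

```python
import math
from typing import List


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def fuse(z_q: List[float], z_m: List[float]) -> List[float]:
    """Assumed SUM-style fusion per answer candidate: log sigmoid(z_q + z_m)."""
    return [log_sigmoid(q + m) for q, m in zip(z_q, z_m)]


def debiased_scores(z_q: List[float], z_m: List[float], c: float) -> List[float]:
    """Total indirect effect: factual fused score minus counterfactual score.

    In the counterfactual world the multimodal branch is replaced by the
    constant c (no image evidence), so only the question-driven effect
    remains; subtracting it removes the question-only shortcut.
    """
    factual = fuse(z_q, z_m)
    counterfactual = fuse(z_q, [c] * len(z_q))
    return [f - cf for f, cf in zip(factual, counterfactual)]


def argmax(scores: List[float]) -> int:
    """Index of the highest-scoring answer candidate."""
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    answers = ["yes", "no"]
    z_q = [4.0, 0.0]  # hypothetical question-only logits: strong prior for "yes"
    z_m = [0.0, 2.0]  # hypothetical multimodal logits: image evidence for "no"
    print("biased:", answers[argmax(fuse(z_q, z_m))])
    print("debiased:", answers[argmax(debiased_scores(z_q, z_m, 0.0))])
```

With these made-up logits, the factual fused score follows the question prior and picks "yes", while the debiased score picks "no" because the "yes" margin came entirely from the question-only branch. In CF-VQA itself the constant `c` is learned and three branches (question, vision, fused) are combined; the two-branch, fixed-`c` version here is a deliberate simplification.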
Original language: English
Title of host publication: VLM4Bio '24
Subtitle of host publication: proceedings of the First International Workshop on Vision-Language Models for Biomedical Applications
Place of Publication: New York
Publisher: Association for Computing Machinery
Pages: 13-17
Number of pages: 5
ISBN (Electronic): 9798400712074
Publication status: Published - 2024
Event: First International Workshop on Vision-Language Models for Biomedical Applications (1st : 2024) - Melbourne, Australia
Duration: 28 Oct 2024 to 1 Nov 2024
Conference number: 1st

Conference

Conference: First International Workshop on Vision-Language Models for Biomedical Applications (1st : 2024)
Abbreviated title: VLM4Bio 2024
Country/Territory: Australia
City: Melbourne
Period: 28/10/24 to 1/11/24

Bibliographical note

Copyright the Author(s) 2024. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Keywords

  • Visual Question Answering
  • Causal Reasoning
  • Bias and Fairness
