Abstract
Background:
Medical image analysis, particularly in the context of Visual Question Answering (VQA) and image captioning, is crucial for accurate diagnosis and educational purposes.
Objective:
Our study introduces BioMedBLIP models, fine-tuned for VQA tasks using specialized medical datasets such as ROCO and MIMIC-CXR, and evaluates their performance against the state-of-the-art (SOTA) original BLIP model.
Methods:
We present nine versions of BioMedBLIP across three downstream tasks on various datasets, trained for varying numbers of epochs. We propose BioMedBLIP models for VQA generation, VQA classification, and image captioning. We pretrained BLIP on medical datasets, producing an adapted BLIP model tailored for medical applications. The findings indicate strong overall performance of our models.
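For illustration, a minimal sketch of this recipe, continuing training of a general-domain BLIP VQA checkpoint on medical image-question-answer triples, is shown below. It uses the Hugging Face `transformers` BLIP classes; the data loader, epoch count, and hyperparameters are assumptions for exposition, not the authors' exact pipeline.

```python
# Hedged sketch (not the authors' exact pipeline): adapt a general-domain
# BLIP VQA checkpoint to the medical domain by further training it on
# image-question-answer triples (e.g., SLAKE or VQA-RAD after local download).
import torch
from transformers import BlipProcessor, BlipForQuestionAnswering

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base").to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
num_epochs = 3  # illustrative; the paper trains for varying numbers of epochs

# `medical_vqa_loader` is a hypothetical DataLoader yielding batches of
# PIL images alongside parallel lists of question and answer strings.
for epoch in range(num_epochs):
    for images, questions, answers in medical_vqa_loader:
        # Encode the image-question pairs and the target answer tokens.
        inputs = processor(images=images, text=questions,
                           return_tensors="pt", padding=True).to(device)
        labels = processor(text=answers, return_tensors="pt",
                           padding=True).input_ids.to(device)
        # BLIP's answer decoder returns a language-modeling loss on the labels.
        loss = model(**inputs, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

At inference time, `model.generate(**inputs)` decodes a free-text answer, matching the VQA generation setting; the VQA classification setting instead scores a fixed set of candidate answers.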
Results:
In VQA generation tasks, BioMedBLIP models outperformed the SOTA on the SLAKE, VQA-RAD, and ImageCLEF datasets. In VQA classification, our models consistently surpassed the SOTA on SLAKE and showed competitive performance on the VQA-RAD and PathVQA datasets. Similarly, in image captioning tasks, our model outperformed the SOTA, underscoring the importance of pretraining with medical datasets. Overall, across 20 dataset and task combinations, BioMedBLIP established a new state of the art in 15 of 20 tasks (75%), and its responses were rated higher in all 20 tasks (P<.005) compared with SOTA models.
Conclusions:
Our BioMedBLIP models show promising performance and suggest that incorporating medical knowledge through pretraining on domain-specific medical datasets helps models achieve higher performance. Our models thus demonstrate their potential to advance medical image analysis, with impact on diagnosis, medical education, and research. However, data quality, task-specific variability, computational resources, and ethical considerations should be carefully addressed. In conclusion, our models represent a contribution toward the synergy of AI and medicine. We have made BioMedBLIP freely available, which will help further advance research in multimodal medical tasks.
| Original language | English |
|---|---|
| Publication status | Submitted - 22 Jan 2024 |
| Publication series | JMIR Preprints |