Content-based medical image retrieval (CBMIR) is an active research area for disease diagnosis and treatment but it can be problematic given the small visual variations between anatomical structures. We propose a retrieval method based on a bag-of-visual-words (BoVW) to identify discriminative characteristics between different medical images with Pruned Dictionary based on Latent Semantic Topic description. We refer to this as the PD-LST retrieval. Our method has two main components. First, we calculate a topic-word significance value for each visual word given a certain latent topic to evaluate how the word is connected to this latent topic. The latent topics are learnt, based on the relationship between the images and words, and are employed to bridge the gap between low-level visual features and high-level semantics. These latent topics describe the images and words semantically and can thus facilitate more meaningful comparisons between the words. Second, we compute an overall-word significance value to evaluate the significance of a visual word within the entire dictionary. We designed an iterative ranking method to measure overall-word significance by considering the relationship between all latent topics and words. The words with higher values are considered meaningful with more significant discriminative power in differentiating medical images. We evaluated our method on two public medical imaging datasets and it showed improved retrieval accuracy and efficiency.
- Dictionary pruning
- Medical image retrieval