A novel multiple kernel fuzzy topic modeling technique for biomedical data

Junaid Rashid, Jungeun Kim*, Amir Hussain, Usman Naseem, Sapna Juneja

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Background: Text mining in the biomedical field has received much attention and is regarded as an important research area because much biomedical data is in text format. Topic modeling is a popular text mining method for discovering hidden semantic structures, so-called topics. However, discovering topics from biomedical data is challenging because the data are sparse, redundant, and unstructured.

Methods: In this paper, we propose a novel multiple kernel fuzzy topic modeling (MKFTM) technique that uses fusion probabilistic inverse document frequency and the multiple kernel fuzzy c-means clustering algorithm for biomedical text mining. In detail, the proposed fusion probabilistic inverse document frequency method estimates the weights of global terms, while MKFTM generates frequencies of local and global terms with bag-of-words. In addition, principal component analysis is applied to eliminate higher-order negative effects on term weights.
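The weighting and fuzzy-assignment steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the fusion formula, so the standard probabilistic IDF, log((N - df) / df), stands in for it, and the multiple kernel is assumed to be a fixed weighted sum of two Gaussian kernels. The function names, bandwidths, kernel weights, and toy documents are all hypothetical, and the PCA step is omitted.

```python
import math
from collections import Counter


def prob_idf(docs):
    """Standard probabilistic IDF, log((N - df) / df), floored at zero.

    Assumption: a stand-in for the paper's fusion probabilistic IDF,
    whose exact formula is not given in the abstract.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {
        t: (max(0.0, math.log((n - d) / d)) if d < n else 0.0)
        for t, d in df.items()
    }


def weighted_vectors(docs, idf):
    """Bag-of-words term counts scaled by the IDF weights."""
    vocab = sorted(idf)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors


def combined_kernel(x, y, gammas=(0.1, 1.0), weights=(0.5, 0.5)):
    """Weighted sum of Gaussian kernels (hypothetical bandwidths and weights)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return sum(w * math.exp(-g * sq) for g, w in zip(gammas, weights))


def memberships(x, centers, m=2.0, eps=1e-12):
    """Fuzzy c-means memberships of x using the kernel-induced distance.

    The kernel weights sum to 1, so K(x, x) = 1 and the squared distance
    in feature space is 2 * (1 - K(x, v)).
    """
    d2 = [max(2.0 * (1.0 - combined_kernel(x, v)), eps) for v in centers]
    inv = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


# Toy corpus (hypothetical): three "molecular" and three "clinical" documents.
docs = [
    ["gene", "protein", "cell"],
    ["gene", "protein"],
    ["cell", "expression"],
    ["patient", "therapy", "dose"],
    ["patient", "therapy"],
    ["dose", "trial"],
]
idf = prob_idf(docs)
vocab, vectors = weighted_vectors(docs, idf)
centers = [vectors[0], vectors[3]]  # fixed centers, for illustration only
u = memberships(vectors[1], centers)
print([round(v, 3) for v in u])
```

In a full fuzzy c-means loop the centers and, in multiple kernel variants, the kernel weights would be updated iteratively; here they are fixed so that only the membership computation is shown.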

Results: Extensive experiments were conducted on six biomedical datasets. MKFTM achieved the highest classification accuracies: 99.04%, 99.62%, 99.69%, and 99.61% on the Muchmore Springer dataset and 94.10%, 89.45%, 92.91%, and 90.35% on the Ohsumed dataset. The CH index value of MKFTM is also higher than that of state-of-the-art topic models, which indicates better clustering performance.
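To make the clustering criterion concrete, the sketch below computes a CH index on hard cluster labels. It assumes that "CH" refers to the Calinski-Harabasz index (the abstract does not expand the abbreviation), defined as the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom. The function name and the toy points are hypothetical.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k)).

    B is the between-cluster dispersion and W the within-cluster
    dispersion; a higher value means tighter, better separated clusters.
    """
    n = len(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more points than clusters")

    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    between = sum(
        len(rows) * sqdist(mean(rows), overall) for rows in clusters.values()
    )
    within = sum(
        sqdist(p, mean(rows)) for rows in clusters.values() for p in rows
    )
    if within == 0.0:
        return float("inf")  # every point coincides with its cluster mean
    return (between / (k - 1)) / (within / (n - k))


# Two well separated groups; the second labelling mixes them on purpose.
pts = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
good = calinski_harabasz(pts, [0, 0, 1, 1])
bad = calinski_harabasz(pts, [0, 1, 0, 1])
print(good > bad)
```

For fuzzy memberships such as those MKFTM produces, one common choice (assumed here, not stated in the abstract) is to assign each document to its highest-membership cluster before computing the index.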

Conclusion: The results confirm that the proposed MKFTM approach handles the sparsity and redundancy problems in biomedical text documents efficiently. MKFTM discovers semantically relevant topics with high accuracy for biomedical documents and gives better results for classification and clustering. MKFTM is a new approach to topic modeling that has the flexibility to work with a variety of clustering methods.

Original language: English
Article number: 275
Pages (from-to): 1-19
Number of pages: 19
Journal: BMC Bioinformatics
Volume: 23
DOIs
Publication status: Published - 2022
Externally published: Yes

Bibliographical note

Copyright the Author(s) 2022. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.

Keywords

  • Topic modeling
  • Medical data
  • Multiple kernel fuzzy topic modeling
  • MKFTM
  • Classification
  • Clustering
