WETM: a word embedding-based topic model with modified collapsed Gibbs sampling for short text

Junaid Rashid*, Jungeun Kim, Amir Hussain, Usman Naseem

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Short texts are a common source of knowledge, and extracting the valuable information they contain is beneficial for several purposes. Traditional topic models are incapable of analyzing the internal structural information of topics. They are mostly based on the co-occurrence of words at the document level and are often unable to extract semantically relevant topics from short text datasets because the documents are so short. Although some traditional topic models are sensitive to word order, they do not perform well on short texts because of the strong sparsity of the data. In this paper, we propose a novel word embedding-based topic model (WETM) for short text documents that discovers the structural information of topics and words and eliminates the sparsity problem. Moreover, a modified collapsed Gibbs sampling algorithm is proposed to strengthen the semantic coherence of topics in short texts. WETM extracts semantically coherent topics from short texts and finds relationships between words. Extensive experimental results on two real-world datasets show that WETM achieves better topic quality, topic coherence, classification, and clustering results. WETM also requires less execution time than traditional topic models.

Original language: English
Pages (from-to): 158-164
Number of pages: 7
Journal: Pattern Recognition Letters
Volume: 172
Early online date: 8 Jun 2023
Publication status: Published - Aug 2023
Externally published: Yes

Keywords

  • Topic modeling
  • Short text
  • Classification
  • Topic coherence
