Incorporating embedding to topic modeling for more effective short text analysis

Junaid Rashid, Jungeun Kim*, Usman Naseem

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference proceeding contribution, peer-review

Abstract

With the growing abundance of short text content on websites, analyzing and comprehending these texts has become a crucial task. Topic modeling is a widely used technique for analyzing short text documents and uncovering their underlying topics. However, traditional topic models struggle to extract accurate topics from short texts because of their limited content and sparsity. To address these issues, we propose an Embedding-based topic modeling (EmTM) approach that incorporates word embeddings and hierarchical clustering to identify significant topics. Experiments on two datasets of web short texts, Snippet and News, demonstrate the effectiveness of EmTM: it outperforms baseline topic models in both classification accuracy and topic coherence.
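The abstract names two ingredients of EmTM: word embeddings and hierarchical clustering. The record does not say which embeddings, distance, or linkage the authors use, so the following is only a generic, minimal sketch of the idea, not the EmTM algorithm itself. It assumes hypothetical toy 3-dimensional word vectors (a real system would load pretrained embeddings), cosine distance, and average linkage, and it groups words into topics by bottom-up (agglomerative) hierarchical clustering until a chosen number of clusters remains.

```python
import math
from itertools import combinations

# Hypothetical toy word vectors for illustration only; a real system would
# use pretrained embeddings (e.g. word2vec or GloVe) with many more dimensions.
EMBEDDINGS = {
    "football": [0.9, 0.1, 0.0],
    "goal": [0.85, 0.15, 0.05],
    "match": [0.8, 0.2, 0.1],
    "stock": [0.1, 0.9, 0.1],
    "market": [0.05, 0.85, 0.2],
    "shares": [0.1, 0.8, 0.15],
    "virus": [0.0, 0.1, 0.9],
    "vaccine": [0.05, 0.2, 0.85],
}


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def average_linkage(c1, c2, vectors):
    """Mean pairwise cosine distance between the words of two clusters."""
    total = sum(cosine_distance(vectors[a], vectors[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def cluster_topics(vectors, n_topics):
    """Agglomerative hierarchical clustering of word vectors into topics.

    Starts with one cluster per word and repeatedly merges the closest pair
    of clusters (average linkage) until n_topics clusters remain.
    """
    clusters = [[w] for w in sorted(vectors)]
    while len(clusters) > n_topics:
        # combinations yields index pairs with i < j, so deleting j is safe.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    for topic in sorted(cluster_topics(EMBEDDINGS, n_topics=3)):
        print(topic)
```

With these toy vectors the three printed topics are ['football', 'goal', 'match'], ['market', 'shares', 'stock'] and ['vaccine', 'virus']. Clustering in embedding space lets semantically related words group together even when they rarely co-occur in the same short text, which is the sparsity problem the abstract describes.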

Original language: English
Title of host publication: The ACM Web Conference 2023
Subtitle of host publication: Companion of The World Wide Web Conference WWW 2023
Place of publication: New York
Publisher: Association for Computing Machinery
Pages: 73-76
Number of pages: 4
ISBN (Electronic): 9781450394192
Publication status: Published, 2023
Externally published: Yes
Event: 2023 World Wide Web Conference, WWW 2023, Austin, United States
Duration: 30 Apr 2023 to 4 May 2023

Conference

Conference: 2023 World Wide Web Conference, WWW 2023
Country/Territory: United States
City: Austin
Period: 30/04/23 to 4/05/23

Keywords

  • Topic Modeling
  • Clustering
  • Short Text
  • Classification
  • Coherence

