Learning heterogeneous coupling relationships between non-iid terms

Mu Li*, Jinjiu Li, Yuming Ou, Ya Zhang, Dan Luo, Maninder Bahtia, Longbing Cao

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding (Conference proceeding contribution)

Abstract

With the rapid proliferation of social media and online communities, a vast amount of text data has been generated. As extracting insight from this data has grown in importance, a variety of text mining and processing algorithms, such as classification, clustering, and similarity comparison, have been developed in recent years. Most previous research uses a vector-space model for text representation and analysis. However, the vector-space model does not utilise information about term-to-term relationships. Moreover, classic classification methods also ignore the relationships between documents. In other words, traditional text mining techniques assume that terms, and likewise documents, are independent and identically distributed (iid). In this paper, we introduce a novel term representation that incorporates the coupled relations between terms. This coupled representation provides much richer information, which enables us to build a coupled similarity metric for measuring document similarity; a K-nearest centroid classifier based on this coupled document similarity is then applied to the classification task. Experiments show that the proposed approach outperforms the classic vector-space-based classifier, and suggest potential advantages for other text mining tasks.
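The abstract does not give the coupling formulas, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' method. We assume intra-coupling is the cosine of two terms' document-frequency profiles, inter-coupling is the mean of min(intra(i,k), intra(j,k)) over every third term k, and the two are mixed with a weight `alpha`; all function names and the toy corpus are hypothetical. Documents are mapped into the coupled space as d' = dS, and a plain nearest-centroid rule stands in for the paper's K-nearest centroid classifier.

```python
import math
from collections import Counter


def vectorize(doc, vocab):
    """Raw term-frequency vector of a tokenized document over a fixed vocabulary."""
    counts = Counter(doc)
    return [float(counts[w]) for w in vocab]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def coupled_term_similarity(rows, alpha=0.5):
    """Hypothetical coupled term similarity: alpha * intra + (1 - alpha) * inter.

    intra[i][j]: cosine of the document-frequency profiles of terms i and j
                 (direct co-occurrence).
    inter[i][j]: mean over every third term k of min(intra[i][k], intra[j][k]),
                 i.e. how strongly i and j are linked through shared neighbours.
    """
    n = len(rows[0])
    cols = [[r[t] for r in rows] for t in range(n)]  # one profile per term
    intra = [[cosine(cols[i], cols[j]) for j in range(n)] for i in range(n)]
    sim = [[1.0] * n for _ in range(n)]  # a term is fully similar to itself
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            others = [k for k in range(n) if k != i and k != j]
            inter = (sum(min(intra[i][k], intra[j][k]) for k in others) / len(others)
                     if others else 0.0)
            sim[i][j] = alpha * intra[i][j] + (1 - alpha) * inter
    return sim


def couple(row, sim):
    """Spread a document's weights over related terms: d' = d S."""
    n = len(row)
    return [sum(row[i] * sim[i][j] for i in range(n)) for j in range(n)]


def fit_centroids(vectors, labels):
    """Mean coupled vector per class (plain nearest-centroid training)."""
    centroids = {}
    for lab in sorted(set(labels)):
        members = [v for v, y in zip(vectors, labels) if y == lab]
        centroids[lab] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def predict(centroids, vector):
    """Assign the class whose centroid is most cosine-similar to the vector."""
    return max(centroids, key=lambda lab: cosine(centroids[lab], vector))


# Toy corpus (made up for illustration).
train = [
    ["goal", "match", "team"],
    ["team", "coach", "match"],
    ["stock", "market", "price"],
    ["price", "trade", "market"],
]
labels = ["sport", "sport", "finance", "finance"]

vocab = sorted({w for d in train for w in d})
rows = [vectorize(d, vocab) for d in train]
S = coupled_term_similarity(rows)
centroids = fit_centroids([couple(r, S) for r in rows], labels)

query = ["coach", "goal"]  # these two terms never co-occur in training
prediction = predict(centroids, couple(vectorize(query, vocab), S))
print(prediction)  # sport
```

In this toy corpus "coach" and "goal" never appear in the same document, so their plain cosine similarity is zero, yet the sketch gives them a positive coupled similarity because both co-occur with "match" and "team". That is the kind of non-iid term relationship a bag-of-words vector discards. The pairwise loop is cubic in vocabulary size, so this is suitable only for illustration.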

Original language: English
Title of host publication: Agents and Data Mining Interaction
Subtitle of host publication: 9th International Workshop, ADMI 2013, Saint Paul, MN, USA, May 6-7, 2013, revised selected papers
Editors: Longbing Cao, Yifeng Zeng, Andreas L. Symeonidis, Vladimir Gorodetsky, Jörg P. Müller, Philip S. Yu
Place of Publication: Berlin
Publisher: Springer, Springer Nature
Pages: 79-91
Number of pages: 13
ISBN (Electronic): 9783642551925
ISBN (Print): 9783642551918
Publication status: Published - 2014
Externally published: Yes
Event: 9th International Workshop on Agents and Data Mining Interaction, ADMI 2013 - Saint Paul, MN, United States
Duration: 6 May 2013 to 7 May 2013

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 8316
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th International Workshop on Agents and Data Mining Interaction, ADMI 2013
Country/Territory: United States
City: Saint Paul, MN
Period: 6/05/13 to 7/05/13

Keywords

  • Non-iid
  • Coupled similarity
  • Vector representation
