A two-stage text mining model for information filtering

Yuefeng Li, Xujuan Zhou, Peter Bruza, Yue Xu, Raymond Y. K. Lau

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution > peer-review

30 Citations (Scopus)

Abstract

Mismatch and overload are the two fundamental issues regarding the effectiveness of information filtering. Both term-based and pattern (phrase) based approaches have been employed to address these issues. However, both suffer from some limitations with regard to effectiveness. This paper proposes a novel solution that includes two stages: an initial topic filtering stage followed by a stage involving pattern taxonomy mining. The objective of the first stage is to address mismatch by quickly filtering out probably irrelevant documents. The threshold used in the first stage is motivated theoretically. The objective of the second stage is to address overload by applying pattern mining techniques to rationalize the data relevance of the reduced document set after the first stage. Substantial experiments on RCV1 show that the proposed solution achieves encouraging performance.
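
The two-stage flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not give the term weighting, the theoretically motivated threshold, or the pattern taxonomy model, so the sketch substitutes relative document frequency for the stage-one weights, a caller-supplied `threshold`, and simple co-occurring term sets for the stage-two patterns. All function names and parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def term_weights(positive_docs):
    """Stage-one profile: share of relevant training docs containing each term."""
    df = Counter()
    for doc in positive_docs:
        df.update(set(tokenize(doc)))
    return {term: count / len(positive_docs) for term, count in df.items()}


def mine_patterns(positive_docs, min_support=2, max_size=2):
    """Stage-two profile: term sets found in at least min_support training docs."""
    counts = Counter()
    for doc in positive_docs:
        terms = sorted(set(tokenize(doc)))
        for size in range(1, max_size + 1):
            counts.update(combinations(terms, size))
    return {frozenset(p): c for p, c in counts.items() if c >= min_support}


def topic_score(doc, weights):
    """Sum of profile weights over the distinct terms of a document."""
    return sum(weights.get(term, 0.0) for term in set(tokenize(doc)))


def pattern_score(doc, patterns):
    """Reward each mined pattern fully contained in the document."""
    terms = set(tokenize(doc))
    return sum(support * len(p) for p, support in patterns.items() if p <= terms)


def two_stage_filter(docs, positive_docs, threshold):
    """Stage one drops low-scoring docs; stage two ranks the survivors."""
    weights = term_weights(positive_docs)
    patterns = mine_patterns(positive_docs)
    survivors = [d for d in docs if topic_score(d, weights) >= threshold]
    return sorted(survivors, key=lambda d: pattern_score(d, patterns), reverse=True)
```

Calling `two_stage_filter(incoming_docs, relevant_training_docs, threshold=0.5)` would first discard documents whose term score falls below the cut-off, then order the remainder by pattern evidence. The cheap first pass keeps the more expensive pattern matching confined to a reduced document set, which mirrors the division of labour the abstract describes.
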
Original language: English
Title of host publication: CIKM'08
Subtitle of host publication: Proceedings of the 17th ACM conference on Information and knowledge management
Place of Publication: New York, NY
Publisher: Association for Computing Machinery
Pages: 1023-1032
Number of pages: 10
ISBN (Print): 9781595939913
DOIs
Publication status: Published - 2008
Event: ACM Conference on Information and Knowledge Management (17th : 2008) - Napa Valley, CA
Duration: 26 Oct 2008 to 30 Oct 2008

Conference

Conference: ACM Conference on Information and Knowledge Management (17th : 2008)
City: Napa Valley, CA
Period: 26/10/08 to 30/10/08

Keywords

  • decision rules
  • information filtering
  • text mining
  • thresholds
  • weighting schema
