Two-stage model for information filtering

Xujuan Zhou, Yuefeng Li, Peter Bruza, Yue Xu, Raymond Y. K. Lau

Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding contribution; Research; peer-review

Abstract

This paper presents a novel two-stage model that integrates theories and techniques from information retrieval/filtering (IR/IF) with those from machine learning and data mining to provide more precise document filtering and retrieval. The first stage is topic filtering, which is intended to minimize information mismatch by filtering out the most likely irrelevant information based on term-based profiles. As a result, only a relatively small number of potentially highly relevant documents remain for document ranking. The second stage uses a pattern mining approach to address the problem of information overload: the most likely relevant documents are assigned higher ranks by exploiting patterns in the pattern taxonomy. The second stage is precision-oriented. Because relatively few documents are involved at this stage, the computational cost is markedly reduced while the results are significantly improved. The new two-stage information filtering model has been evaluated by extensive experiments based on well-known IR benchmarking processes, using the latest version of the Reuters dataset, Reuters Corpus Volume 1 (RCV1). The performance of the new model was compared with both term-based and data-mining-based IF models. The results show that combining the strengths of information filtering and data mining methods achieves more effective and efficient information access.
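The abstract describes the pipeline only at a high level, so the following Python code is a sketch under stated assumptions rather than the authors' implementation. It shows the overall shape of the idea: a cheap term-based filter shrinks the candidate set, and a pattern-based scorer then ranks only the documents that survive. The names `topic_filter`, `pattern_rank` and `two_stage_filter`, the weighted term-frequency score, the threshold, and the toy data are all hypothetical. In particular, stage 2 here treats each pattern as a plain termset and sums the weights of the termsets a document fully contains, which is a simplification and not the paper's pattern taxonomy weighting.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Illustrative types (assumptions, not taken from the paper):
# a term-based profile maps each term to a weight, and a pattern set
# maps each mined termset to a weight.
TermProfile = Dict[str, float]
PatternSet = Dict[FrozenSet[str], float]
Corpus = Dict[str, List[str]]  # document id -> list of tokens


def topic_filter(docs: Corpus, profile: TermProfile, threshold: float) -> Corpus:
    """Stage 1 (hypothetical): keep documents whose term-based score reaches the threshold."""
    kept: Corpus = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        # Weighted term-frequency score against the profile; unknown terms score 0.
        score = sum(profile.get(term, 0.0) * freq for term, freq in counts.items())
        if score >= threshold:
            kept[doc_id] = tokens
    return kept


def pattern_rank(docs: Corpus, patterns: PatternSet) -> List[Tuple[str, float]]:
    """Stage 2 (hypothetical): rank the surviving documents by the patterns they contain."""
    ranked: List[Tuple[str, float]] = []
    for doc_id, tokens in docs.items():
        vocabulary = set(tokens)
        # A pattern matches when every one of its terms occurs in the document.
        score = sum(w for pattern, w in patterns.items() if pattern <= vocabulary)
        ranked.append((doc_id, score))
    # Highest score first; ties broken by document id so the order is deterministic.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


def two_stage_filter(docs: Corpus, profile: TermProfile, threshold: float,
                     patterns: PatternSet) -> List[Tuple[str, float]]:
    """Run the cheap stage-1 filter first, then rank only what remains in stage 2."""
    return pattern_rank(topic_filter(docs, profile, threshold), patterns)


# Toy data, invented for illustration only.
docs: Corpus = {
    "d1": ["oil", "price", "opec", "crude"],
    "d2": ["football", "score", "league"],
    "d3": ["oil", "export", "price"],
    "d4": ["crude", "opec", "quota", "oil"],
}
profile: TermProfile = {"oil": 1.0, "crude": 0.8, "opec": 0.9, "price": 0.5}
patterns: PatternSet = {
    frozenset({"oil", "opec"}): 2.0,
    frozenset({"crude"}): 1.0,
    frozenset({"oil", "price"}): 0.5,
}

ranking = two_stage_filter(docs, profile, threshold=1.0, patterns=patterns)
print(ranking)  # [('d1', 3.5), ('d4', 3.0), ('d3', 0.5)]
```

In this toy run, `d2` shares no terms with the profile and is discarded in stage 1, so stage 2 never scores it. Because the second stage iterates only over documents that passed the first, its cost scales with the filtered set rather than the whole collection, which mirrors the efficiency argument in the abstract.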
Language: English
Title of host publication: WI-IAT 2008
Subtitle of host publication: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology: proceedings
Place of publication: Piscataway, NJ
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 685-689
Number of pages: 5
ISBN (Print): 9780769534961
DOI: 10.1109/WIIAT.2008.390
Publication status: Published - 2008
Externally published: Yes
Event: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Sydney, NSW
Duration: 9 Dec 2008 to 12 Dec 2008

Conference

Conference: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology
City: Sydney, NSW
Period: 9/12/08 to 12/12/08


Cite this

Zhou, X., Li, Y., Bruza, P., Xu, Y., & Lau, R. Y. K. (2008). Two-stage model for information filtering. In WI-IAT 2008: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology : proceedings (pp. 685-689). Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/WIIAT.2008.390
@inproceedings{80d5b8db46f7412c889a8b56d1c09166,
title = "Two-stage model for information filtering",
author = "Xujuan Zhou and Yuefeng Li and Peter Bruza and Yue Xu and Lau, {Raymond Y. K.}",
year = "2008",
doi = "10.1109/WIIAT.2008.390",
language = "English",
isbn = "9780769534961",
pages = "685--689",
booktitle = "WI-IAT 2008",
publisher = "Institute of Electrical and Electronics Engineers (IEEE)",
address = "Piscataway, NJ",

}
