A latent semantic indexing and WordNet based information retrieval model for digital forensics

Lan Du*, Huidong Jin, Olivier De Vel, Nianjun Liu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding (conference proceeding contribution)

9 Citations (Scopus)

Abstract

It is well known that domain-specific or domain-independent knowledge can be incorporated into Information Retrieval (IR) to improve retrieval performance. In this paper, we propose a novel IR model for digital forensics that uses Latent Semantic Indexing (LSI), with WordNet as the underlying reference ontology, to retrieve suspicious emails according to the semantic meaning of an investigator's query. Our model incorporates corpus-independent knowledge from WordNet and corpus-dependent knowledge from LSI into query expansion and reduction; LSI is also used to simulate human meaning-based judgements of relatedness between an investigator's queries and emails. We compare the performance of the resulting LSI And WordNet based Information Retrieval System (LAWIRS) with three other systems we implemented: the LSI system, the Lucene system, and the Lucene system with query expansion. Experimental results on several email datasets demonstrate that, for short Boolean queries, LAWIRS can successfully capture their meaning and yield substantial improvements in overall retrieval performance.
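To make the LSI relatedness idea concrete, here is a minimal sketch of generic LSI retrieval, not the authors' LAWIRS implementation. It builds a raw term-count matrix (the paper's actual term weighting is not stated here and is an assumption), keeps the top k left singular vectors, projects both documents and the query into that latent space, and ranks by cosine similarity. The hand-written `SYNONYMS` map is a hypothetical stand-in for WordNet lookups; `lsi_rank`, `expand_query`, `DOCS` and `SYNONYMS` are illustrative names, and the sketch assumes NumPy is available.

```python
import numpy as np

# Hypothetical toy "emails" (one string per document, whitespace-tokenised).
DOCS = [
    "money transfer bank account",
    "cash transfer bank",
    "football match score goal",
    "goal score team win",
]

# Hypothetical synonym map standing in for WordNet synsets (assumption).
SYNONYMS = {"cash": ["money"]}


def expand_query(terms, synonyms):
    """Append synonyms of each query term, skipping duplicates."""
    expanded = list(terms)
    for t in terms:
        for s in synonyms.get(t, []):
            if s not in expanded:
                expanded.append(s)
    return expanded


def lsi_rank(docs, query_terms, k=2):
    """Rank documents by cosine similarity to the query in a k-dim LSI space."""
    vocab = sorted({t for d in docs for t in d.split()})
    index = {t: i for i, t in enumerate(vocab)}
    # Raw term-document count matrix A (terms x documents).
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in d.split():
            A[index[t], j] += 1.0
    # Truncated SVD: keep the k leading left singular vectors U_k.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    # Project documents and query with the same map x -> U_k^T x.
    doc_vecs = U_k.T @ A  # shape (k, n_docs)
    q = np.zeros(len(vocab))
    for t in query_terms:
        if t in index:  # terms outside the corpus vocabulary are ignored
            q[index[t]] += 1.0
    q_vec = U_k.T @ q
    q_norm = np.linalg.norm(q_vec)
    d_norms = np.maximum(np.linalg.norm(doc_vecs, axis=0), 1e-12)
    if q_norm == 0.0:
        scores = np.zeros(len(docs))
    else:
        scores = (q_vec @ doc_vecs) / (q_norm * d_norms)
    ranking = sorted(range(len(docs)), key=lambda j: -scores[j])
    return ranking, scores


if __name__ == "__main__":
    ranking, scores = lsi_rank(DOCS, expand_query(["cash"], SYNONYMS))
    print(ranking, np.round(scores, 3))
```

In this toy corpus, the query "cash" appears literally only in document 1, yet document 0 (which says "money") receives an equally high score because both share "transfer" and "bank" in the latent space, while the sports documents score near zero. A real system would add term weighting, stop-word handling and a tuned k.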

Original language: English
Title of host publication: IEEE International Conference on Intelligence and Security Informatics, 2008, IEEE ISI 2008
Pages: 70-75
Number of pages: 6
Publication status: Published - 2008
Event: IEEE International Conference on Intelligence and Security Informatics, 2008, IEEE ISI 2008 - Taipei, Taiwan, Province of China
Duration: 17 Jun 2008 to 20 Jun 2008


