Keyword search over web documents based on Earth Mover’s Distance

Jiangang Ma, Quan Z. Sheng, Lina Yao, Yong Xu, Ali Shemshadi

Research output: Contribution to journalArticlepeer-review

2 Citations (Scopus)

Abstract

Keyword search is widely used in many practical applications. Unfortunately, most keyword-based search engines compute the similarity distance between two Web documents by only matching the keywords at the same positions in both the query and the document vectors, without considering the impact of the keywords at neighbouring positions. Such approach usually results in incompleteness of search results. In this paper, we exploit the Earth Mover’s Distance (EMD) as a distance function, which is more flexible against other distance functions such as Euclidean distance. To overcome the limitation of EMD-based computation complexity, we use the filtering techniques to minimize the total number of actual EMD computations. We further develop a novel lower bound as a new EMD filter for partial matching technique that is suitable for searching Web documents. The experimental results demonstrate the efficiency of EMD-based search with filtering techniques.

Original languageEnglish
Pages (from-to)256-265
Number of pages10
JournalLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume8786
DOIs
Publication statusPublished - 2014
Externally publishedYes
Event15th International Conference on Web Information Systems Engineering: WISE 2014 - Thessaloniki, Greece
Duration: 12 Oct 201414 Oct 2014

Keywords

  • EMD
  • Partial matching
  • Web-based similarity search

Fingerprint

Dive into the research topics of 'Keyword search over web documents based on Earth Mover’s Distance'. Together they form a unique fingerprint.

Cite this