Abstract
Keyword search is widely used in many practical applications. Unfortunately, most keyword-based search engines compute the similarity distance between two Web documents by only matching the keywords at the same positions in both the query and the document vectors, without considering the impact of the keywords at neighbouring positions. Such approach usually results in incompleteness of search results. In this paper, we exploit the Earth Mover’s Distance (EMD) as a distance function, which is more flexible against other distance functions such as Euclidean distance. To overcome the limitation of EMD-based computation complexity, we use the filtering techniques to minimize the total number of actual EMD computations. We further develop a novel lower bound as a new EMD filter for partial matching technique that is suitable for searching Web documents. The experimental results demonstrate the efficiency of EMD-based search with filtering techniques.
Original language | English |
---|---|
Pages (from-to) | 256-265 |
Number of pages | 10 |
Journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 8786 |
DOIs | |
Publication status | Published - 2014 |
Externally published | Yes |
Event | 15th International Conference on Web Information Systems Engineering: WISE 2014 - Thessaloniki, Greece Duration: 12 Oct 2014 → 14 Oct 2014 |
Keywords
- EMD
- Partial matching
- Web-based similarity search