Fast phonetic similarity search over large repositories

Hegler Tissot, Gabriel Peschl, Marcos Didonet Del Fabro

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

5 Citations (Scopus)

Abstract

Analysis of unstructured data may be inefficient in the presence of spelling errors. Existing approaches use string similarity methods to search for valid words within a text, with a supporting dictionary. However, they are not rich enough to encode phonetic information to assist the search. In this paper, we present a novel approach for efficiently performing phonetic similarity search over large data sources, which uses a data structure called PhoneticMap to encode language-specific phonetic information. We validate our approach through an experiment over a data set using a Portuguese variant of a well-known repository, to automatically correct words with spelling errors.

Original language: English
Title of host publication: Database and Expert Systems Applications
Subtitle of host publication: 25th International Conference, DEXA 2014, Munich, Germany, September 1-4, 2014, Proceedings, Part II
Editors: Hendrik Decker, Lenka Lhotská, Sebastian Link, Marcus Spies, Roland R. Wagner
Place of Publication: Cham, Switzerland
Publisher: Springer, Springer Nature
Pages: 74-81
Number of pages: 8
ISBN (Print): 9783319100845
Publication status: Published - 2014
Externally published: Yes
Event: 25th International Conference on Database and Expert Systems Applications, DEXA 2014 - Munich, Germany
Duration: 1 Sept 2014 to 4 Sept 2014

Publication series

Name: Lecture Notes in Computer Science
Volume: 8645
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 25th International Conference on Database and Expert Systems Applications, DEXA 2014
Country/Territory: Germany
City: Munich
Period: 1/09/14 to 4/09/14

Keywords

  • Fast Search
  • Phonetic Similarity
  • String Similarity
