Fast phonetic similarity search over large repositories

Hegler Tissot, Gabriel Peschl, Marcos Didonet Del Fabro

Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding contribution

4 Citations (Scopus)

Abstract

Analysis of unstructured data may be inefficient in the presence of spelling errors. Existing approaches use string similarity methods, supported by a dictionary, to search for valid words within a text. However, they are not rich enough to encode phonetic information to assist the search. In this paper, we present a novel approach for efficiently performing phonetic similarity search over large data sources, which uses a data structure called PhoneticMap to encode language-specific phonetic information. We validate our approach through an experiment over a data set using a Portuguese variant of a well-known repository, to automatically correct words with spelling errors.

Original language: English
Title of host publication: Database and Expert Systems Applications
Subtitle of host publication: 25th International Conference, DEXA 2014, Munich, Germany, September 1-4, 2014, Proceedings, Part II
Editors: Hendrik Decker, Lenka Lhotská, Sebastian Link, Marcus Spies, Roland R. Wagner
Place of Publication: Cham, Switzerland
Publisher: Springer, Springer Nature
Pages: 74-81
Number of pages: 8
ISBN (Print): 9783319100845
DOIs
Publication status: Published - 2014
Externally published: Yes
Event: 25th International Conference on Database and Expert Systems Applications, DEXA 2014 - Munich, Germany
Duration: 1 Sep 2014 to 4 Sep 2014

Publication series

Name: Lecture Notes in Computer Science
Volume: 8645
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 25th International Conference on Database and Expert Systems Applications, DEXA 2014
Country: Germany
City: Munich
Period: 1/09/14 to 4/09/14

Keywords

  • Fast Search
  • Phonetic Similarity
  • String Similarity
