
BEDSpell: spelling error correction using BERT-based masked language model and edit distance

Fatemeh Tohidian*, Amin Kashiri, Fariba Lotfi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-review

Abstract

The spelling correction problem, the task of automatically correcting misspellings in a text, is critical in natural language processing (NLP). Although it can be treated as a standalone task, in most cases it is an integral preprocessing component of other NLP tasks, since a dataset with typos can lead to erroneous results. Many previous automatic spelling correctors rely on a dictionary: they look up each word independently in a predefined word list and recommend the most similar entry without considering the context. Even though such a model's output may be a correctly spelled word, it can be semantically incorrect. Therefore, some correctors use language models to take the context into account when correcting typos. However, a language model alone is insufficient, because the corrected word should also be similar to the misspelled word. In our approach, we select a candidate for each typo based on the masked language model output, character-level similarities, and edit distance. Combining these three signals allows us to recommend candidates that are both similar to the typo and appropriate to the context. We use recall (correction rate) as our evaluation metric, and the results show a considerable improvement over previous studies.
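To make the candidate-selection idea in the abstract concrete, here is a minimal sketch that reranks masked-language-model candidates using character-level similarity and edit distance. It is an illustration under stated assumptions, not the authors' implementation. The masked-LM step is stubbed as a precomputed list of (word, probability) pairs, such as a BERT fill-mask head might return for the masked typo position. `difflib.SequenceMatcher.ratio` stands in for the paper's character-level similarity. The linear weighting, the `max_edits` cutoff, and all function names (`levenshtein`, `char_similarity`, `pick_correction`, `recall`) are hypothetical.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when chars match)
            ))
        prev = curr
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1].

    ASSUMPTION: difflib's ratio is a stand-in; the paper's exact
    character-level measure is not specified here.
    """
    return SequenceMatcher(None, a, b).ratio()


def pick_correction(typo, mlm_candidates, max_edits=2, weights=(0.5, 0.3, 0.2)):
    """Rerank masked-LM candidates for a masked typo position.

    mlm_candidates: list of (word, probability) pairs, e.g. the output of a
    BERT fill-mask head for the masked position (HYPOTHETICAL input format).
    weights and max_edits are HYPOTHETICAL, untuned values.
    Returns the typo unchanged if no candidate is within max_edits.
    """
    w_lm, w_char, w_edit = weights
    best_word, best_score = typo, float("-inf")
    for word, prob in mlm_candidates:
        dist = levenshtein(typo, word)
        if dist > max_edits:
            continue  # too far from the misspelling to be a plausible fix
        # Normalize edit distance to a [0, 1] similarity (1 = identical).
        edit_score = 1 - dist / max(len(typo), len(word), 1)
        score = (w_lm * prob
                 + w_char * char_similarity(typo, word)
                 + w_edit * edit_score)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def recall(predictions, gold):
    """Correction rate: fraction of typos whose prediction equals the gold word."""
    hits = sum(p == g for p, g in zip(predictions, gold))
    return hits / len(gold)


if __name__ == "__main__":
    # Context: "a letter [MASK] my friend", typo "fom"; probabilities are made up.
    candidates = [("from", 0.7), ("for", 0.2), ("to", 0.1)]
    print(pick_correction("fom", candidates))
```

In this toy example, "from" and "for" are both at edit distance 1 from "fom", so a dictionary lookup alone cannot separate them; the (made-up) masked-LM probability for the context is what breaks the tie. Conversely, the edit-distance filter discards fluent but dissimilar candidates, which is the reason the abstract gives for not relying on the language model alone.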

Original language: English
Title of host publication: Service-Oriented Computing – ICSOC 2022 Workshops
Subtitle of host publication: ASOCA, AI-PA, FMCIoT, WESOACS 2022, Sevilla, Spain, November 29 – December 2, 2022 proceedings
Editors: Javier Troya, Raffaela Mirandola, Elena Navarro, Andrea Delgado, Sergio Segura, Guadalupe Ortiz, Cesare Pautasso, Christian Zirpins, Pablo Fernández, Antonio Ruiz-Cortés
Place of Publication: Cham
Publisher: Springer, Springer Nature
Pages: 3–14
Number of pages: 12
ISBN (Electronic): 9783031265075
ISBN (Print): 9783031265068
DOIs
Publication status: Published (2023)
Event: Workshops on ASOCA, AI-PA, FMCIoT, WESOACS 2022, held in Conjunction with the 20th International Conference on Service-Oriented Computing, ICSOC 2022 - Seville, Spain
Duration: 29 Nov 2022 – 2 Dec 2022

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 13821
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Workshops on ASOCA, AI-PA, FMCIoT, WESOACS 2022, held in Conjunction with the 20th International Conference on Service-Oriented Computing, ICSOC 2022
Country/Territory: Spain
City: Seville
Period: 29/11/22 – 2/12/22

Keywords

  • Spelling correction
  • Natural language processing
  • Preprocessing
  • Dictionary
  • Masked language model
  • Edit distance
