TY - GEN
T1 - BEDSpell
T2 - Workshops on ASOCA, AI-PA, FMCIoT, WESOACS 2022, held in Conjunction with the 20th International Conference on Service-Oriented Computing, ICSOC 2022
AU - Tohidian, Fatemeh
AU - Kashiri, Amin
AU - Lotfi, Fariba
PY - 2023
Y1 - 2023
N2 - The spelling correction problem, the task of automatically correcting misspellings in a text, is critical in natural language processing (NLP). Although it can be considered a standalone task, in most cases, it is an integral component of various NLP tasks as a preprocessing step since a dataset with typos can lead to erroneous results. Many previous automatic spelling correctors use a dictionary, independently search the word in a predefined list of words, and recommend the most similar one without considering the context. Even though these models’ output may be a correctly spelled word, it could be semantically incorrect. Therefore, some correctors consider the context when correcting typos based on language models. However, only employing the language model is insufficient, and the corrected word should be similar to the misspelled word. In our approach, we select a candidate for the typo based on masked language model output, character-level similarities, and edit distance. Exploiting the combination of the masked language model, character-level similarities, and edit distance assists us in recommending similar context-related candidates. We have used recall (correction rate) as our evaluation metric, and the results demonstrate a considerable improvement compared with previous studies.
KW - Spelling correction
KW - Natural language processing
KW - Preprocessing
KW - Dictionary
KW - Masked language model
KW - Edit distance
UR - https://www.scopus.com/pages/publications/85151059292
U2 - 10.1007/978-3-031-26507-5_1
DO - 10.1007/978-3-031-26507-5_1
M3 - Conference proceeding contribution
AN - SCOPUS:85151059292
SN - 9783031265068
T3 - Lecture Notes in Computer Science
SP - 3
EP - 14
BT - Service-Oriented Computing – ICSOC 2022 Workshops
A2 - Troya, Javier
A2 - Mirandola, Raffaela
A2 - Navarro, Elena
A2 - Delgado, Andrea
A2 - Segura, Sergio
A2 - Ortiz, Guadalupe
A2 - Pautasso, Cesare
A2 - Zirpins, Christian
A2 - Fernández, Pablo
A2 - Ruiz-Cortés, Antonio
PB - Springer, Springer Nature
CY - Cham
Y2 - 29 November 2022 through 2 December 2022
ER -