Abstract
With approximately 400 million speakers worldwide, Arabic ranks as the fifth most-spoken language globally, which motivates further advances in natural language processing for the language. This paper describes our approaches to the subtasks of the Nuanced Arabic Dialect Identification (NADI) task at EMNLP 2023. For the first subtask, closed country-level dialect identification, we employ an ensemble of two Arabic language models. Similarly, for the second subtask, closed dialect-to-Modern Standard Arabic (MSA) machine translation, our approach combines sequence-to-sequence models trained on an Arabic-specific dataset. Our team ranks 10th on subtask 1 and 3rd on subtask 2.
Original language | English |
---|---|
Title of host publication | Proceedings of ArabicNLP 2023 |
Place of Publication | Stroudsburg |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 614-619 |
Number of pages | 6 |
ISBN (Electronic) | 9781959429272 |
Publication status | Published - 2023 |
Externally published | Yes |
Event | 1st Arabic Natural Language Processing Conference, ArabicNLP 2023, Hybrid, Singapore, Singapore; Duration: 7 Dec 2023 → 7 Dec 2023 |
Conference
Conference | 1st Arabic Natural Language Processing Conference, ArabicNLP 2023 |
---|---|
Country/Territory | Singapore |
City | Hybrid, Singapore |
Period | 7/12/23 → 7/12/23 |