DialectNLU at NADI 2023 shared task: transformer based multitask approach jointly integrating dialect and machine translation tasks in Arabic

Hariram Veeramani, Surendrabikram Thapa, Usman Naseem

Research output: Chapter in Book/Report/Conference proceeding, Conference proceeding contribution, peer-review

5 Citations (Scopus)

Abstract

With approximately 400 million speakers worldwide, Arabic ranks as the fifth most-spoken language globally, necessitating advancements in natural language processing. This paper describes the approaches employed for the subtasks outlined in the Nuanced Arabic Dialect Identification (NADI) task at EMNLP 2023. We employ an ensemble of two Arabic language models for the first subtask involving closed country-level dialect identification classification. Similarly, for the second subtask, focused on closed dialect to Modern Standard Arabic (MSA) machine translation, our approach combines sequence-to-sequence models trained on an Arabic-specific dataset. Our team ranks 10th and 3rd on subtask 1 and subtask 2, respectively.

Original language: English
Title of host publication: Proceedings of ArabicNLP 2023
Place of Publication: Stroudsburg
Publisher: Association for Computational Linguistics (ACL)
Pages: 614-619
Number of pages: 6
ISBN (Electronic): 9781959429272
Publication status: Published - 2023
Externally published: Yes
Event: 1st Arabic Natural Language Processing Conference, ArabicNLP 2023 - Hybrid, Singapore, Singapore
Duration: 7 Dec 2023 - 7 Dec 2023

Conference

Conference: 1st Arabic Natural Language Processing Conference, ArabicNLP 2023
Country/Territory: Singapore
City: Hybrid, Singapore
Period: 7/12/23 - 7/12/23
