TY - JOUR
T1 - USAD
T2 - An Intelligent System for Slang and Abusive Text Detection in Perso-Arabic-Scripted Urdu
AU - Haq, Nauman Ul
AU - Ullah, Mohib
AU - Khan, Rafiullah
AU - Ahmad, Arshad
AU - Almogren, Ahmad
AU - Hayat, Bashir
AU - Shafi, Bushra
N1 - Copyright the Author(s) 2020. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
PY - 2020
Y1 - 2020
AB - The use of slang, abusive, and offensive language has become common practice on social media. Although social media companies have censorship policies for slang, abusive, vulgar, and offensive language, this condemnable practice persists because resources and research on the automatic detection of abusive language in languages other than English are limited. This study proposes USAD (Urdu Slang and Abusive words Detection), a lexicon-based intelligent framework to detect abusive and slang words in Perso-Arabic-scripted Urdu Tweets. Furthermore, because no standard dataset is available, our second significant contribution is the design and annotation of a dataset of abusive, offensive, and slang words in Perso-Arabic-scripted Urdu to support future research. The results show that the proposed USAD model correctly identifies 72.6% of Tweets as abusive or nonabusive. We also identify several key factors that can help researchers improve their abusive language detection models.
UR - http://www.scopus.com/inward/record.url?scp=85097850424&partnerID=8YFLogxK
U2 - 10.1155/2020/6684995
DO - 10.1155/2020/6684995
M3 - Article
SN - 1076-2787
VL - 2020
SP - 1
EP - 7
JO - Complexity
JF - Complexity
M1 - 6684995
ER -