TY - JOUR
T1 - Tanz-indicator
T2 - a novel framework for detection of Perso-Arabic-Scripted Urdu sarcastic opinions
AU - Gul, Shabana
AU - Khan, Rafi Ullah
AU - Ullah, Mohib
AU - Aftab, Roman
AU - Waheed, Abdul
AU - Wu, Tsu-Yang
N1 - Copyright the Author(s) 2022. Version archived for private and non-commercial use with the permission of the author/s and according to publisher conditions. For further rights please contact the publisher.
PY - 2022
Y1 - 2022
N2 - Automatic sarcasm detection in textual data is a crucial task in sentiment analysis. The problem is complex because sarcastic comments usually convey the opposite of their literal meaning and depend heavily on context. Sarcasm detection in comments written in Perso-Arabic-scripted Urdu is even more challenging because online linguistic resources for the language are limited. In this research, we propose Tanz-Indicator, a lexicon-based framework for detecting sarcasm in user comments posted in Perso-Arabic-scripted Urdu. For experimentation, we use a lexicon of over 3000 sarcastic tweets and 100 sarcastic features. We also train two machine learning models on the same data to compare the performance of the lexicon-based and machine learning-based approaches. The results show that the lexicon-based model correctly identified 48.5% sarcastic and 23.5% nonsarcastic tweets, with a recall of 69.6% and a precision of 87.9%. The Naïve Bayes and SVM-based models achieved recall rates of 20.1% and 24.4% and overall accuracies of 65.2% and 60.1%, respectively.
AB - Automatic sarcasm detection in textual data is a crucial task in sentiment analysis. The problem is complex because sarcastic comments usually convey the opposite of their literal meaning and depend heavily on context. Sarcasm detection in comments written in Perso-Arabic-scripted Urdu is even more challenging because online linguistic resources for the language are limited. In this research, we propose Tanz-Indicator, a lexicon-based framework for detecting sarcasm in user comments posted in Perso-Arabic-scripted Urdu. For experimentation, we use a lexicon of over 3000 sarcastic tweets and 100 sarcastic features. We also train two machine learning models on the same data to compare the performance of the lexicon-based and machine learning-based approaches. The results show that the lexicon-based model correctly identified 48.5% sarcastic and 23.5% nonsarcastic tweets, with a recall of 69.6% and a precision of 87.9%. The Naïve Bayes and SVM-based models achieved recall rates of 20.1% and 24.4% and overall accuracies of 65.2% and 60.1%, respectively.
UR - http://www.scopus.com/inward/record.url?scp=85135606205&partnerID=8YFLogxK
U2 - 10.1155/2022/9151890
DO - 10.1155/2022/9151890
M3 - Article
SN - 1530-8669
VL - 2022
SP - 1
EP - 9
JO - Wireless Communications and Mobile Computing
JF - Wireless Communications and Mobile Computing
M1 - 9151890
ER -