ShEMO: a large-scale validated database for Persian speech emotion detection

Research output: Contribution to journal › Article › Research › peer-review

Abstract

This paper introduces a large-scale, validated database for Persian called the Sharif Emotional Speech Database (ShEMO). The database includes 3000 semi-natural utterances, equivalent to 3 h and 25 min of speech data extracted from online radio plays. ShEMO covers speech samples from 87 native Persian speakers for five basic emotions (anger, fear, happiness, sadness and surprise) as well as a neutral state. Twelve annotators label the underlying emotional state of each utterance, and majority voting decides the final labels. According to the kappa measure, the inter-annotator agreement is 64%, which is interpreted as “substantial agreement”. We also present benchmark results based on common classification methods for the speech emotion detection task. In the experiments, a support vector machine achieves the best results for both the gender-independent model (58.2%) and the gender-dependent models (female = 59.4%, male = 57.6%). ShEMO will be available free of charge for academic purposes to provide a baseline for further research on Persian emotional speech.
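The labelling step described in the abstract (several annotators per utterance, majority voting for the final label, a kappa statistic for agreement) can be sketched roughly as follows. This is an illustration only, not the authors' code: the abstract does not say which kappa variant was used, so Fleiss' kappa is assumed here, and the function names `majority_label` and `fleiss_kappa` as well as the tie-handling rule are hypothetical.

```python
from collections import Counter


def majority_label(votes):
    """Return the most frequent label in a non-empty list of votes.

    Assumption: a tie for first place yields None (the abstract does not
    describe how ties were resolved).
    """
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of per-utterance vote lists.

    Every utterance must have the same number of votes (at least two).
    Undefined (division by zero) if all votes fall into one category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # Count how many annotators chose each category for each utterance.
    table = [[Counter(votes)[c] for c in categories] for votes in ratings]
    # Observed agreement, averaged over utterances.
    p_items = [
        (sum(k * k for k in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]
    p_bar = sum(p_items) / n_items
    # Agreement expected by chance, from overall category proportions.
    p_cat = [
        sum(row[j] for row in table) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_cat)
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy votes, invented for illustration; not taken from ShEMO.
    votes = [["anger", "anger", "fear"], ["fear", "fear", "fear"]]
    print([majority_label(v) for v in votes])
    print(fleiss_kappa(votes, ["anger", "fear"]))
```

On the commonly used Landis and Koch scale, kappa values between 0.61 and 0.80 are read as "substantial agreement", which matches the interpretation given in the abstract.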
Language: English
Pages: 1-16
Number of pages: 16
Journal: Language Resources and Evaluation
Volume: 53
Issue number: 1
DOI: 10.1007/s10579-018-9427-x
Publication status: Published - 15 Mar 2019
Externally published: Yes

Keywords

  • Benchmark
  • Emotion detection
  • Emotional speech
  • Persian
  • Speech database

Cite this

@article{920d60d7705e46b1948c724d2bf8c9be,
title = "ShEMO: a large-scale validated database for Persian speech emotion detection",
abstract = "This paper introduces a large-scale, validated database for Persian called Sharif Emotional Speech Database (ShEMO). The database includes 3000 semi-natural utterances, equivalent to 3 h and 25 min of speech data extracted from online radio plays. The ShEMO covers speech samples of 87 native-Persian speakers for five basic emotions including anger, fear, happiness, sadness and surprise, as well as neutral state. Twelve annotators label the underlying emotional state of utterances and majority voting is used to decide on the final labels. According to the kappa measure, the inter-annotator agreement is 64{\%} which is interpreted as “substantial agreement”. We also present benchmark results based on common classification methods in speech emotion detection task. According to the experiments, support vector machine achieves the best results for both gender-independent (58.2{\%}) and gender-dependent models (female = 59.4{\%}, male = 57.6{\%}). The ShEMO will be available for academic purposes free of charge to provide a baseline for further research on Persian emotional speech.",
keywords = "Benchmark, Emotion detection, Emotional speech, Persian, Speech database",
author = "{Mohamad Nezami}, Omid and {Jamshid Lou}, Paria and Karami, Mansoureh",
year = "2019",
month = "3",
day = "15",
doi = "10.1007/s10579-018-9427-x",
language = "English",
volume = "53",
pages = "1--16",
journal = "Language Resources and Evaluation",
issn = "1574-020X",
publisher = "Springer, Springer Nature",
number = "1",

}

ShEMO: a large-scale validated database for Persian speech emotion detection. / Mohamad Nezami, Omid; Jamshid Lou, Paria; Karami, Mansoureh.

In: Language Resources and Evaluation, Vol. 53, No. 1, 15.03.2019, p. 1-16.

TY  - JOUR
T1  - ShEMO: a large-scale validated database for Persian speech emotion detection
T2  - Language Resources and Evaluation
AU  - Mohamad Nezami, Omid
AU  - Jamshid Lou, Paria
AU  - Karami, Mansoureh
PY  - 2019/3/15
Y1  - 2019/3/15
AB  - This paper introduces a large-scale, validated database for Persian called Sharif Emotional Speech Database (ShEMO). The database includes 3000 semi-natural utterances, equivalent to 3 h and 25 min of speech data extracted from online radio plays. The ShEMO covers speech samples of 87 native-Persian speakers for five basic emotions including anger, fear, happiness, sadness and surprise, as well as neutral state. Twelve annotators label the underlying emotional state of utterances and majority voting is used to decide on the final labels. According to the kappa measure, the inter-annotator agreement is 64% which is interpreted as “substantial agreement”. We also present benchmark results based on common classification methods in speech emotion detection task. According to the experiments, support vector machine achieves the best results for both gender-independent (58.2%) and gender-dependent models (female = 59.4%, male = 57.6%). The ShEMO will be available for academic purposes free of charge to provide a baseline for further research on Persian emotional speech.
KW  - Benchmark
KW  - Emotion detection
KW  - Emotional speech
KW  - Persian
KW  - Speech database
UR  - http://www.scopus.com/inward/record.url?scp=85054884198&partnerID=8YFLogxK
U2  - 10.1007/s10579-018-9427-x
DO  - 10.1007/s10579-018-9427-x
M3  - Article
VL  - 53
SP  - 1
EP  - 16
JO  - Language Resources and Evaluation
JF  - Language Resources and Evaluation
SN  - 1574-020X
IS  - 1
ER  -