A Review on Emotional Speech Databases

Youddha Beer Singh*
Department of Computer Science and Information Technology, KIET Group of Institutions, Delhi-NCR Ghaziabad, India.
Periodicity: September - November 2022
DOI: https://doi.org/10.26634/jcom.10.3.19103

Abstract

Human emotion recognition from speech has become a challenging and demanding research subject owing to its numerous practical applications. Speech databases, speech features, and classifiers are the key factors in recognizing emotions from speech, and the availability of a suitable emotional speech database is the first step in Speech Emotion Recognition (SER). This paper presents a comprehensive literature review of emotional speech databases and summarizes their availability across emotions and languages. A total of 26 papers describing emotional speech databases have been reviewed. The review identifies the most frequently used databases and the languages in which the majority of the databases are available.
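To illustrate how the three factors named above (database, features, classifier) fit together, the following is a minimal sketch of an SER pipeline. It assumes the librosa and scikit-learn Python libraries; the file names, emotion labels, and toy corpus are hypothetical placeholders, not data from any of the reviewed databases.

# Minimal SER pipeline sketch: utterance -> acoustic features -> classifier.
# Assumes librosa and scikit-learn are installed; file paths and labels are
# hypothetical placeholders standing in for a real emotional speech corpus.
import numpy as np
import librosa
from sklearn.svm import SVC

def extract_features(path, sr=16000, n_mfcc=13):
    """Load one utterance and summarize it as a fixed-length MFCC vector."""
    signal, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)  # average over time frames

# Placeholder (path, label) pairs; a real corpus (e.g., one of the databases
# reviewed here) would provide many utterances per emotion and per speaker.
corpus = [
    ("anger_001.wav", "anger"),
    ("happiness_001.wav", "happiness"),
    ("sadness_001.wav", "sadness"),
]

X = np.array([extract_features(path) for path, _ in corpus])
y = np.array([label for _, label in corpus])

# Fit a simple classifier on the feature vectors and label a new utterance.
clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([extract_features("unknown_utterance.wav")]))

In practice, evaluation would hold out speakers for speaker-independent testing, and richer features (pitch, energy, spectral statistics) are commonly combined with MFCCs; the sketch only shows the overall structure of a database-features-classifier pipeline.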

Keywords

Speech Emotion Recognition, Speech Database, Systematic Literature Review.

How to Cite this Article?

Singh, Y. B. (2022). A Review on Emotional Speech Databases. i-manager’s Journal on Computer Science, 10(3), 27-33. https://doi.org/10.26634/jcom.10.3.19103

References

[1]. Adigwe, A., Tits, N., Haddad, K. E., Ostadabbas, S., & Dutoit, T. (2018). The emotional voices database: Towards controlling the emotion dimension in voice generation systems. arXiv preprint arXiv:1806.09514. https://doi.org/10.48550/arXiv.1806.09514
[2]. Bansal, S., & Dev, A. (2013, November). Emotional Hindi speech database. In 2013 International Conference Oriental COCOSDA Held Jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE) (pp. 1-4). IEEE. https://doi.org/10.1109/ICSDA.2013.6709867
[3]. Bao, W., Li, Y., Gu, M., Yang, M., Li, H., Chao, L., & Tao, J. (2014, October). Building a Chinese natural emotional audio-visual database. In 2014 12th International Conference on Signal Processing (ICSP) (pp. 583-587). IEEE. https://doi.org/10.1109/ICOSP.2014.7015071
[4]. Batliner, A., Buckow, J., Niemann, H., Nöth, E., & Warnke, V. (2000). The prosody module. In Verbmobil: Foundations of Speech-To-Speech Translation (pp. 106-121). Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-04230-4_8
[5]. Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W. F., & Weiss, B. (2005, September). A database of German emotional speech. In Interspeech, 5, 1517-1520.
[6]. Busso, C., Bulut, M., Lee, C. C., Kazemzadeh, A., Mower, E., Kim, S., ... & Narayanan, S. S. (2008). IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42(4), 335-359. https://doi.org/10.1007/s10579-008-9076-6
[7]. Cao, H., Cooper, D. G., Keutmann, M. K., Gur, R. C., Nenkova, A., & Verma, R. (2014). CREMA-D: Crowd-sourced emotional multimodal actors dataset. IEEE Transactions on Affective Computing, 5(4), 377-390. https://doi.org/10.1109/TAFFC.2014.2336244
[8]. Chen, J., Wang, C., Wang, K., Yin, C., Zhao, C., Xu, T., ... & Yang, T. (2021). HEU Emotion: A large-scale database for multimodal emotion recognition in the wild. Neural Computing and Applications, 33(14), 8669-8685. https://doi.org/10.1007/s00521-020-05616-w
[9]. Costantini, G., Iaderola, I., Paoloni, A., & Todisco, M. (2014). EMOVO corpus: An Italian emotional speech database. In International Conference on Language Resources and Evaluation (LREC 2014) (pp. 3501-3504).
[10]. El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 44(3), 572-587. https://doi.org/10.1016/j.patcog.2010.09.020
[11]. Grimm, M., Kroschel, K., & Narayanan, S. (2008, June). The Vera am Mittag German audio-visual emotional speech database. In 2008 IEEE International Conference on Multimedia and Expo (pp. 865-868). IEEE. https://doi.org/10.1109/ICME.2008.4607572
[12]. Haq, S., & Jackson, P. J. (2011). Multimodal emotion recognition. In Machine Audition: Principles, Algorithms and Systems (pp. 398-423). IGI Global. https://doi.org/10.4018/978-1-61520-919-4.ch017
[13]. Khanh, T. L. B., Kim, S. H., Lee, G., Yang, H. J., & Baek, E. T. (2021). Korean video dataset for emotion recognition in the wild. Multimedia Tools and Applications, 80(6), 9479-9492. https://doi.org/10.1007/s11042-020-10106-1
[14]. Koolagudi, S. G., Maity, S., Kumar, V. A., Chakrabarti, S., & Rao, K. S. (2009, August). IITKGP-SESC: speech database for emotion analysis. In International Conference on Contemporary Computing (pp. 485-492). Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03547-0_46
[15]. Li, Y., Tao, J., Chao, L., Bao, W., & Liu, Y. (2017). CHEAVD: a Chinese natural emotional audio–visual database. Journal of Ambient Intelligence and Humanized Computing, 8(6), 913-924. https://doi.org/10.1007/s12652-016-0406-z
[16]. Livingstone, S. R., & Russo, F. A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLoS ONE, 13(5), e0196391. https://doi.org/10.1371/journal.pone.0196391
[17]. Lubis, N., Gomez, R., Sakti, S., Nakamura, K., Yoshino, K., Nakamura, S., & Nakadai, K. (2016, May). Construction of Japanese audio-visual emotion database and its application in emotion recognition. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16) (pp. 2180-2184).
[18]. Martin, O., Kotsia, I., Macq, B., & Pitas, I. (2006, April). The eNTERFACE'05 audio-visual emotion database. In 22nd International Conference on Data Engineering Workshops (ICDEW'06) (pp. 8-8). IEEE. https://doi.org/10.1109/ICDEW.2006.145
[19]. Meftah, A. H., Qamhan, M. A., Seddiq, Y., Alotaibi, Y. A., & Selouani, S. A. (2021). King Saud University emotions corpus: Construction, analysis, evaluation, and comparison. IEEE Access, 9, 54201-54219. https://doi.org/10.1109/ACCESS.2021.3070751
[20]. Parada-Cabaleiro, E., Costantini, G., Batliner, A., Baird, A., & Schuller, B. (2018). Categorical vs dimensional perception of Italian emotional speech. Interspeech, 3638-3642. https://doi.org/10.5281/zenodo.1326428
[21]. Parada-Cabaleiro, E., Costantini, G., Batliner, A., Schmitt, M., & Schuller, B. W. (2020). DEMoS: An Italian emotional speech corpus. Language Resources and Evaluation, 54(2), 341-383. https://doi.org/10.1007/s10579-019-09450-y
[22]. Pichora-Fuller, M. K., & Dupuis, K. (2020). Toronto emotional speech set (TESS). Scholars Portal Dataverse, 1, 2020. https://doi.org/10.5683/SP2/E8H2MF
[23]. Poria, S., Hazarika, D., Majumder, N., Naik, G., Cambria, E., & Mihalcea, R. (2018). Meld: A multimodal multi-party dataset for emotion recognition in conversations. arXiv preprint arXiv:1810.02508. https://doi.org/10.48550/arXiv.1810.02508
[24]. Rambabu, B., Botsa, K. K., Paidi, G., & Gangashetty, S. V. (2020, May). IIIT-H TEMD semi-natural emotional speech database from professional actors and non-actors. In Proceedings of the 12th Language Resources and Evaluation Conference (pp. 1538-1545).
[25]. Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features, and methods. Speech Communication, 48(9), 1162-1181. https://doi.org/10.1016/j.specom.2006.04.003
[26]. Williams, C. E., & Stevens, K. N. (1972). Emotions and speech: Some acoustical correlates. The Journal of the Acoustical Society of America, 52(4B), 1238-1250. https://doi.org/10.1121/1.1913238