References
[1]. Adigwe, A., Tits, N., El Haddad, K., Ostadabbas, S., &
Dutoit, T. (2018). The emotional voices database: Towards
controlling the emotion dimension in voice generation
systems. arXiv preprint arXiv:1806.09514. https://doi.org/10.48550/arXiv.1806.09514
[2]. Bansal, S., & Dev, A. (2013, November). Emotional
Hindi speech database. In 2013 International
Conference Oriental COCOSDA Held Jointly with 2013
Conference on Asian Spoken Language Research and
Evaluation (O-COCOSDA/CASLRE) (pp. 1-4). IEEE. https://doi.org/10.1109/ICSDA.2013.6709867
[3]. Bao, W., Li, Y., Gu, M., Yang, M., Li, H., Chao, L., & Tao,
J. (2014, October). Building a Chinese natural emotional
audio-visual database. In 2014 12th International
Conference on Signal Processing (ICSP) (pp. 583-587).
IEEE. https://doi.org/10.1109/ICOSP.2014.7015071
[4]. Batliner, A., Buckow, J., Niemann, H., Nöth, E., &
Warnke, V. (2000). The prosody module. In Verbmobil:
Foundations of Speech-To-Speech Translation (pp. 106-121). Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-04230-4_8
[5]. Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W.
F., & Weiss, B. (2005, September). A database of German
emotional speech. In Interspeech 2005 (pp. 1517-1520).
[6]. Busso, C., Bulut, M., Lee, C. C., Kazemzadeh, A.,
Mower, E., Kim, S., ... & Narayanan, S. S. (2008). IEMOCAP:
Interactive emotional dyadic motion capture database.
Language Resources and Evaluation, 42(4), 335-359.
https://doi.org/10.1007/s10579-008-9076-6
[7]. Cao, H., Cooper, D. G., Keutmann, M. K., Gur, R. C.,
Nenkova, A., & Verma, R. (2014). CREMA-D: Crowd-sourced
emotional multimodal actors dataset. IEEE
Transactions on Affective Computing, 5(4), 377-390.
https://doi.org/10.1109/TAFFC.2014.2336244
[8]. Chen, J., Wang, C., Wang, K., Yin, C., Zhao, C., Xu, T.,
... & Yang, T. (2021). HEU Emotion: a large-scale database
for multimodal emotion recognition in the wild. Neural
Computing and Applications, 33(14), 8669-8685.
https://doi.org/10.1007/s00521-020-05616-w
[9]. Costantini, G., Iaderola, I., Paoloni, A., & Todisco, M.
(2014). EMOVO corpus: an Italian emotional speech
database. In International Conference on Language
Resources and Evaluation (LREC 2014) (pp. 3501-3504).
[10]. El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey
on speech emotion recognition: Features, classification
schemes, and databases. Pattern Recognition, 44(3),
572-587. https://doi.org/10.1016/j.patcog.2010.09.020
[11]. Grimm, M., Kroschel, K., & Narayanan, S. (2008,
June). The Vera am Mittag German audio-visual
emotional speech database. In 2008 IEEE International
Conference on Multimedia and Expo (pp. 865-868). IEEE.
https://doi.org/10.1109/ICME.2008.4607572
[12]. Haq, S., & Jackson, P. J. (2011). Multimodal emotion
recognition. In Machine Audition: Principles, Algorithms
and Systems (pp. 398-423). IGI Global. https://doi.org/10.4018/978-1-61520-919-4.ch017
[13]. Khanh, T. L. B., Kim, S. H., Lee, G., Yang, H. J., & Baek,
E. T. (2021). Korean video dataset for emotion recognition
in the wild. Multimedia Tools and Applications, 80(6), 9479-9492. https://doi.org/10.1007/s11042-020-10106-1
[14]. Koolagudi, S. G., Maity, S., Kumar, V. A., Chakrabarti,
S., & Rao, K. S. (2009, August). IITKGP-SESC: speech
database for emotion analysis. In International
Conference on Contemporary Computing (pp. 485-492). Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03547-0_46
[15]. Li, Y., Tao, J., Chao, L., Bao, W., & Liu, Y. (2017).
CHEAVD: a Chinese natural emotional audio–visual
database. Journal of Ambient Intelligence and
Humanized Computing, 8(6), 913-924. https://doi.org/10.1007/s12652-016-0406-z
[16]. Livingstone, S. R., & Russo, F. A. (2018). The Ryerson
audio-visual database of emotional speech and song
(RAVDESS): A dynamic, multimodal set of facial and vocal
expressions in North American English. PLoS ONE, 13(5), e0196391. https://doi.org/10.1371/journal.pone.0196391
[17]. Lubis, N., Gomez, R., Sakti, S., Nakamura, K.,
Yoshino, K., Nakamura, S., & Nakadai, K. (2016, May).
Construction of Japanese audio-visual emotion
database and its application in emotion recognition. In
Proceedings of the Tenth International Conference on
Language Resources and Evaluation (LREC'16) (pp. 2180-2184).
[18]. Martin, O., Kotsia, I., Macq, B., & Pitas, I. (2006, April). The eNTERFACE'05 audio-visual emotion database. In 22nd
International Conference on Data Engineering
Workshops (ICDEW'06) (pp. 8-8). IEEE. https://doi.org/10.1109/ICDEW.2006.145
[19]. Meftah, A. H., Qamhan, M. A., Seddiq, Y., Alotaibi, Y.
A., & Selouani, S. A. (2021). King Saud University emotions
corpus: Construction, analysis, evaluation, and
comparison. IEEE Access, 9, 54201-54219. https://doi.org/10.1109/ACCESS.2021.3070751
[20]. Parada-Cabaleiro, E., Costantini, G., Batliner, A.,
Baird, A., & Schuller, B. (2018). Categorical vs
dimensional perception of Italian emotional speech.
In Interspeech 2018 (pp. 3638-3642). https://doi.org/10.5281/zenodo.1326428
[21]. Parada-Cabaleiro, E., Costantini, G., Batliner, A.,
Schmitt, M., & Schuller, B. W. (2020). DEMoS: An Italian
emotional speech corpus. Language Resources and Evaluation, 54(2), 341-383. https://doi.org/10.1007/s10579-019-09450-y
[22]. Pichora-Fuller, M. K., & Dupuis, K. (2020). Toronto
emotional speech set (TESS). Scholars Portal Dataverse, 1,
2020. https://doi.org/10.5683/SP2/E8H2MF
[23]. Poria, S., Hazarika, D., Majumder, N., Naik, G.,
Cambria, E., & Mihalcea, R. (2018). MELD: A multimodal
multi-party dataset for emotion recognition in
conversations. arXiv preprint arXiv:1810.02508. https://doi.org/10.48550/arXiv.1810.02508
[24]. Rambabu, B., Botsa, K. K., Paidi, G., & Gangashetty,
S. V. (2020, May). IIIT-H TEMD semi-natural emotional speech database from professional actors and non-actors. In Proceedings of the 12th Language Resources
and Evaluation Conference (pp. 1538-1545).
[25]. Ververidis, D., & Kotropoulos, C. (2006). Emotional
speech recognition: Resources, features, and methods.
Speech Communication, 48(9), 1162-1181. https://doi.org/10.1016/j.specom.2006.04.003
[26]. Williams, C. E., & Stevens, K. N. (1972). Emotions and
speech: Some acoustical correlates. The Journal of the
Acoustical Society of America, 52(4B), 1238-1250. https://doi.org/10.1121/1.1913238