Literature Survey on Development of a Model for Detecting Emotions using CNN and LSTM

Manish Goswami*, Aditya Parate**, Nisarga Kapde***, Shashwat Singh****, Nitiksha Gupta*****, Meena Surjuse******
*-****** Department of Computer Science and Engineering, S. B. Jain Institute of Technology, Management and Research, Nagpur, India.
Periodicity: July–September 2024

Abstract

This paper explores three major datasets, SAVEE, the Toronto Emotional Speech Set (TESS), and CREMA-D, which together contain a substantial repository of 75,000 samples. These datasets cover a broad spectrum of human emotions, from anger, sadness, fear, and disgust to calm, happiness, neutral states, and surprise, mapped to numerical labels 1 to 8, respectively. The primary objective is to develop a real-time deep learning system for emotion recognition from speech captured through a PC microphone. The system is designed not only to capture live speech but also to analyze pre-recorded audio files, classifying them into specific emotional states. The Long Short-Term Memory (LSTM) network, a specialized form of Recurrent Neural Network (RNN), was chosen for its proven accuracy in speech-centered emotion recognition tasks. The model was trained on the RAVDESS dataset, which comprises 7,356 distinct audio files, of which 5,880 were selected for training to improve the model's effectiveness across diverse speech samples. The resulting model achieved 83% accuracy on the training set, marking a substantial milestone in advancing speech-based emotion recognition systems.
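The pipeline the abstract describes, a sequence of audio feature frames fed through an LSTM and mapped to one of eight emotion labels, can be sketched in plain NumPy. This is an illustrative sketch only, not the authors' implementation: the weight shapes, the 13-coefficient MFCC-like frames, and the hidden size of 16 are all assumptions chosen for brevity, and the weights below are random rather than trained.

```python
import numpy as np

# Emotion label mapping as described in the abstract (1-8).
EMOTIONS = {1: "anger", 2: "sadness", 3: "fear", 4: "disgust",
            5: "calm", 6: "happiness", 7: "neutral", 8: "surprise"}

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step over a single feature frame x."""
    z = W @ x + U @ h + b                    # joint pre-activation for all gates
    i, f, o, g = np.split(z, 4)              # input, forget, output gates; candidate
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c + i * np.tanh(g)               # update cell state
    h = o * np.tanh(c)                       # new hidden state
    return h, c

def classify(frames, W, U, b, W_out, b_out):
    """Run the LSTM over a sequence of frames and return softmax
    scores over the 8 emotion classes (from the final hidden state)."""
    hidden = W.shape[0] // 4
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in frames:
        h, c = lstm_step(x, h, c, W, U, b)
    logits = W_out @ h + b_out
    e = np.exp(logits - logits.max())        # numerically stable softmax
    return e / e.sum()

# Random weights stand in for a trained model (illustration only).
rng = np.random.default_rng(0)
n_feat, hidden, n_classes = 13, 16, 8
W = rng.normal(scale=0.1, size=(4 * hidden, n_feat))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
W_out = rng.normal(scale=0.1, size=(n_classes, hidden))
b_out = np.zeros(n_classes)

# 50 random frames of 13 features stand in for real microphone audio.
frames = rng.normal(size=(50, n_feat))
probs = classify(frames, W, U, b, W_out, b_out)
label = int(np.argmax(probs)) + 1            # back to the 1-8 label scheme
print(probs.shape, EMOTIONS[label])
```

In practice the frames would come from a feature extractor (e.g. MFCCs of the recorded speech) and the weights from training on the labelled corpus; the sketch only shows how the recurrence and the 8-way classification fit together.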

Keywords

Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), RAVDESS, CREMA-D, TESS, SAVEE Dataset.

How to Cite this Article?

Goswami, M., Parate, A., Kapde, N., Singh, S., Gupta, N., and Surjuse, M. (2024). Literature Survey on Development of a Model for Detecting Emotions using CNN and LSTM. i-manager’s Journal on Software Engineering, 19(1), 35-43.
