Literature Survey on Development of a Model for Detecting Emotions using CNN and LSTM

Manish Goswami*, Aditya Parate**, Nisarga Kapde***, Shashwat Singh****, Nitiksha Gupta*****, Meena Surjuse******
*-****** Department of Computer Science and Engineering, S. B. Jain Institute of Technology, Management and Research, Nagpur, India.
Periodicity: July–September 2024

Abstract

This paper explores three major datasets, SAVEE, the Toronto Emotional Speech Set (TESS), and CREMA-D, which together contain a substantial repository of 75,000 samples. These datasets cover a broad spectrum of human emotions, from anger, sadness, fear, and disgust to calm, happiness, neutral states, and surprise, mapped to numerical labels 1 to 8, respectively. The primary objective is to develop a real-time deep learning system for emotion recognition from speech captured through a PC microphone. The system is designed not only to capture live speech but also to analyze pre-recorded audio files, classifying them into specific emotional states. The Long Short-Term Memory (LSTM) network, a specialized form of Recurrent Neural Network (RNN), was chosen for its proven accuracy in speech-centered emotion recognition tasks. The model was trained on the RAVDESS dataset, which comprises 7,356 distinct audio files, of which 5,880 were selected for training to improve the model's effectiveness across diverse speech samples. The resulting model achieved 83% accuracy on the training set, marking a substantial milestone in advancing speech-based emotion recognition systems.
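The pipeline the abstract describes, a sequence of audio feature frames fed through an LSTM and mapped to one of eight emotion labels, can be sketched in plain NumPy. This is an illustrative sketch only, not the authors' implementation: the weight shapes, the 13-coefficient MFCC-like frames, and the hidden size of 16 are all assumptions chosen for brevity, and the weights below are random rather than trained.

```python
import numpy as np

# Emotion label mapping as described in the abstract (1-8).
EMOTIONS = {1: "anger", 2: "sadness", 3: "fear", 4: "disgust",
            5: "calm", 6: "happiness", 7: "neutral", 8: "surprise"}

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step over a single feature frame x."""
    z = W @ x + U @ h + b                    # joint pre-activation for all gates
    i, f, o, g = np.split(z, 4)              # input, forget, output gates; candidate
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c + i * np.tanh(g)               # update cell state
    h = o * np.tanh(c)                       # new hidden state
    return h, c

def classify(frames, W, U, b, W_out, b_out):
    """Run the LSTM over a sequence of frames and return softmax
    scores over the 8 emotion classes (from the final hidden state)."""
    hidden = W.shape[0] // 4
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in frames:
        h, c = lstm_step(x, h, c, W, U, b)
    logits = W_out @ h + b_out
    e = np.exp(logits - logits.max())        # numerically stable softmax
    return e / e.sum()

# Random weights stand in for a trained model (illustration only).
rng = np.random.default_rng(0)
n_feat, hidden, n_classes = 13, 16, 8
W = rng.normal(scale=0.1, size=(4 * hidden, n_feat))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
W_out = rng.normal(scale=0.1, size=(n_classes, hidden))
b_out = np.zeros(n_classes)

# 50 random frames of 13 features stand in for real microphone audio.
frames = rng.normal(size=(50, n_feat))
probs = classify(frames, W, U, b, W_out, b_out)
label = int(np.argmax(probs)) + 1            # back to the 1-8 label scheme
print(probs.shape, EMOTIONS[label])
```

In practice the frames would come from a feature extractor (e.g. MFCCs of the recorded speech) and the weights from training on the labelled corpus; the sketch only shows how the recurrence and the 8-way classification fit together.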

Keywords

Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), RAVDESS, CREMA-D, TESS, SAVEE Dataset.

How to Cite this Article?

Goswami, M., Parate, A., Kapde, N., Singh, S., Gupta, N., and Surjuse, M. (2024). Literature Survey on Development of a Model for Detecting Emotions using CNN and LSTM. i-manager’s Journal on Software Engineering, 19(1), 35-43.
