Speech Feature Extraction and Emotion Recognition using Deep Learning Techniques

Pagidirayi Anil Kumar*, Anuradha B.**
*-** Department of Electronics and Communication Engineering, Sri Venkateswara University College of Engineering, Tirupati, Andhra Pradesh, India.
Periodicity: July - December 2024
DOI : https://doi.org/10.26634/jdp.12.2.21179

Abstract

Speech Emotion Recognition (SER) is crucial for human-computer interaction, enabling systems to better understand emotions. Traditional feature extraction methods such as Gamma Tone Cepstral Coefficients (GTCC) are used in SER because they capture auditory features aligned with human hearing, but they do not capture emotional nuances effectively. Mel Frequency Cepstral Coefficients (MFCC) have gained prominence for better representing speech signals in emotion recognition. This work introduces an approach that combines traditional and modern techniques, comparing GTCC-based extraction with MFCC and employing an Ensemble Subspace k-Nearest Neighbors (ES-kNN) classifier to improve accuracy. Additionally, deep learning models such as Long Short-Term Memory (LSTM) and Bidirectional LSTM (Bi-LSTM) networks are explored for their ability to capture temporal dependencies in speech. The CREMA-D and SAVEE datasets are used to evaluate the proposed approach.
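
The abstract describes a pipeline of frame-level cepstral feature extraction followed by a recurrent classifier. The sketch below is only an illustrative outline of that kind of pipeline, not the authors' implementation: it assumes librosa for MFCC extraction and TensorFlow/Keras for a Bi-LSTM, and the coefficient count, frame length, layer sizes, and number of emotion classes are placeholder choices.

# Illustrative sketch: MFCC extraction + Bi-LSTM emotion classifier.
# All hyperparameters below are assumptions, not values reported in the paper.
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers, models

N_MFCC = 40          # assumed number of MFCC coefficients per frame
MAX_FRAMES = 300     # assumed fixed number of frames per utterance
NUM_CLASSES = 7      # e.g., the seven SAVEE emotion categories

def extract_mfcc(path, sr=16000):
    """Load an utterance and return a (MAX_FRAMES, N_MFCC) MFCC matrix."""
    signal, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC).T  # (frames, N_MFCC)
    # Pad or truncate so every utterance yields the same sequence length.
    if mfcc.shape[0] < MAX_FRAMES:
        mfcc = np.pad(mfcc, ((0, MAX_FRAMES - mfcc.shape[0]), (0, 0)))
    return mfcc[:MAX_FRAMES]

def build_bilstm():
    """Bi-LSTM over the MFCC frame sequence with a softmax over emotion classes."""
    model = models.Sequential([
        layers.Input(shape=(MAX_FRAMES, N_MFCC)),
        layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

In practice, a GTCC front end or an ES-kNN classifier could be substituted at the corresponding stages of this pipeline; the structure (fixed-length cepstral sequences feeding a classifier) stays the same.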

Keywords

Speech Emotion Recognition (SER), Deep Learning, Bidirectional LSTM (Bi-LSTM), Temporal Dependencies, SAVEE Dataset, Feature Extraction.

How to Cite this Article?

Kumar, P. A., and Anuradha, B. (2024). Speech Feature Extraction and Emotion Recognition using Deep Learning Techniques. i-manager’s Journal on Digital Signal Processing, 12(2), 1-12. https://doi.org/10.26634/jdp.12.2.21179
