Speech Feature Extraction and Emotion Recognition using Deep Learning Techniques

Pagidirayi Anil Kumar*
Periodicity: July - December 2024

Abstract

Speech Emotion Recognition (SER) is crucial for human-computer interaction, enabling systems to better understand emotions. Traditional feature extraction methods such as Gamma Tone Cepstral Coefficients (GTCC) have been used in SER for their ability to capture auditory features aligned with human hearing, but they often fall short in capturing emotional nuances. Mel Frequency Cepstral Coefficients (MFCC) have gained prominence for better representing speech signals in emotion recognition. This work introduces an approach combining traditional and modern techniques: it compares GTCC-based extraction with MFCC and uses the Ensemble Subspace k-Nearest Neighbors (ES-kNN) classifier to improve accuracy. Additionally, deep learning models such as Long Short-Term Memory (LSTM) and Bidirectional LSTM (Bi-LSTM) are explored for their ability to capture temporal dependencies in speech. The CREMA-D and SAVEE datasets are used for evaluation.

Keywords

SER, GTCC, MFCC, LSTM, Bi-LSTM, ES-kNN
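The MFCC pipeline referenced in the abstract follows a standard recipe: frame the waveform, window each frame, take the power spectrum, pool it through a mel-spaced triangular filterbank, log-compress, and decorrelate with a DCT. Below is a minimal NumPy-only sketch of that recipe; the frame length, hop size, and filter counts are illustrative defaults (a 16 kHz mono signal is assumed), not parameters taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with center frequencies evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_coeffs=13):
    # 1. Frame the signal (25 ms frames, 10 ms hop at 16 kHz) and window it.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 3. Mel filterbank energies, log-compressed.
    fb = mel_filterbank(n_filters, n_fft, sr)
    log_energies = np.log(power @ fb.T + 1e-10)
    # 4. DCT-II decorrelates the log energies; keep the first n_coeffs.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1)
                 / (2 * n_filters))
    return log_energies @ dct.T

# Example: one second of synthetic noise standing in for speech.
rng = np.random.default_rng(0)
feats = mfcc(rng.standard_normal(16000))
print(feats.shape)  # → (98, 13): one 13-dim MFCC vector per frame
```

The resulting per-frame feature matrix is the typical input to sequence classifiers such as the LSTM and Bi-LSTM models mentioned above, which consume one MFCC vector per time step.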
