This paper draws on three major datasets, SAVEE, the Toronto Emotional Speech Set (TESS), and CREMA-D, which together provide a repository of 75,000 samples. These datasets span a broad range of human emotions (anger, sadness, fear, disgust, calm, happiness, neutral, and surprise), mapped to numerical labels 1 through 8, respectively. The primary objective is to develop a real-time deep learning system for emotion recognition from speech captured by a PC microphone: a robust model that can both process live speech and analyze recorded audio files in order to classify specific emotional states. To this end, the Long Short-Term Memory (LSTM) architecture, a specialized form of Recurrent Neural Network (RNN), was chosen for its proven accuracy in speech-based emotion recognition tasks. The model was trained on the RAVDESS dataset, which comprises 7,356 audio files, of which 5,880 were selected for training to improve the model's ability to detect emotions across diverse speech samples. The resulting model achieved 83% accuracy on the training set, a substantial step toward practical speech-based emotion recognition systems.
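To make the described pipeline concrete, the following is a minimal sketch of one way such a system could be structured: an LSTM classifier over MFCC features with a softmax output over the eight emotion labels, plus a helper for capturing a short clip from the PC microphone. The feature choice (MFCCs), layer sizes, clip length, and the use of librosa, sounddevice, and Keras are illustrative assumptions, not details specified in the paper.

```python
# Illustrative sketch only; feature extraction, architecture, and libraries are assumed.
import numpy as np
import librosa
import sounddevice as sd
from tensorflow.keras import layers, models

SAMPLE_RATE = 16000   # assumed sampling rate
N_MFCC = 40           # assumed number of MFCC coefficients
MAX_FRAMES = 200      # assumed fixed number of time frames per clip

# Emotion labels 1-8 as described in the text (stored here at indices 0-7).
EMOTIONS = ["anger", "sadness", "fear", "disgust",
            "calm", "happiness", "neutral", "surprise"]

def extract_mfcc(audio, sr=SAMPLE_RATE):
    """Convert a waveform into a fixed-size (MAX_FRAMES, N_MFCC) MFCC matrix."""
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=N_MFCC).T  # (frames, n_mfcc)
    if mfcc.shape[0] < MAX_FRAMES:                                # pad short clips
        mfcc = np.pad(mfcc, ((0, MAX_FRAMES - mfcc.shape[0]), (0, 0)))
    return mfcc[:MAX_FRAMES]

def build_model():
    """Stacked LSTM over the MFCC sequence with a softmax over the 8 classes."""
    model = models.Sequential([
        layers.Input(shape=(MAX_FRAMES, N_MFCC)),
        layers.LSTM(128, return_sequences=True),
        layers.LSTM(64),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(len(EMOTIONS), activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def record_clip(seconds=3):
    """Capture a short mono clip from the default PC microphone."""
    audio = sd.rec(int(seconds * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
    sd.wait()
    return audio.flatten()

def predict_emotion(model, audio):
    """Return the predicted emotion label for a single waveform."""
    features = extract_mfcc(audio)[np.newaxis, ...]  # add batch dimension
    probs = model.predict(features, verbose=0)[0]
    return EMOTIONS[int(np.argmax(probs))]

if __name__ == "__main__":
    model = build_model()  # in practice, train on extracted RAVDESS features first
    clip = record_clip()
    print("Predicted emotion:", predict_emotion(model, clip))
```

In this sketch, each audio file or microphone clip is reduced to a fixed-length MFCC sequence so that the LSTM receives inputs of uniform shape; real-time use simply feeds freshly recorded clips through the same feature path used during training.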