In this paper, we work with three significant datasets: SAVEE, the Toronto Emotional Speech Set (TESS), and CREMA-D, together comprising roughly 75,000 samples. These datasets cover a wide spectrum of human emotions, ranging from Anger, Sadness, Fear, and Disgust to Calm, Happiness, Neutral, and Surprise, mapped to numerical labels 1 through 8, respectively. The central objective of our project was the development of a real-time deep learning system for emotion recognition from speech captured through a PC microphone. The aim was a robust model capable of both capturing live speech and analyzing pre-recorded audio files, allowing the system to discern and classify specific emotional states. To achieve this goal, we adopted the Long Short-Term Memory (LSTM) architecture, a specialized form of recurrent neural network (RNN), chosen for its proven accuracy in speech-based emotion recognition tasks. The model was trained on the RAVDESS dataset, which contains 7,356 distinct audio files; from these we selected 5,880 files for training in order to improve accuracy and ensure the model could detect and recognize emotions across a diverse range of speech samples. Our efforts culminated in a training accuracy of 83%, marking a meaningful step in the development of speech-based emotion recognition systems.
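To make the pipeline concrete, the sketch below outlines how such an LSTM classifier could be assembled: MFCC features are extracted from each utterance, fed through stacked LSTM layers, and mapped to the eight emotion labels, with an additional helper for classifying live microphone input. This is a minimal illustration under our own assumptions (librosa for feature extraction, tf.keras for the network, the sounddevice package for microphone capture, and hyperparameters such as 40 MFCC coefficients and a 174-frame window); it is not the exact implementation described in this paper.

```python
# Minimal sketch of an LSTM-based speech emotion classifier (assumptions noted above).
import numpy as np
import librosa
import tensorflow as tf

NUM_CLASSES = 8    # Anger, Sadness, Fear, Disgust, Calm, Happiness, Neutral, Surprise
N_MFCC = 40        # assumed number of MFCC coefficients per frame
MAX_FRAMES = 174   # assumed fixed sequence length (pad or truncate each clip)

def extract_mfcc(path, sr=22050):
    """Load a speech file and return a (MAX_FRAMES, N_MFCC) MFCC matrix."""
    audio, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=N_MFCC).T  # (frames, N_MFCC)
    # Pad or truncate to a fixed number of frames so batches stack cleanly.
    if mfcc.shape[0] < MAX_FRAMES:
        mfcc = np.pad(mfcc, ((0, MAX_FRAMES - mfcc.shape[0]), (0, 0)))
    return mfcc[:MAX_FRAMES]

def build_model():
    """Stacked LSTM over MFCC frames with a softmax over the 8 emotion labels."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(MAX_FRAMES, N_MFCC)),
        tf.keras.layers.LSTM(128, return_sequences=True),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def predict_live(model, duration=3.0, sr=22050):
    """Record a short clip from the PC microphone and classify its emotion.
    Uses the sounddevice package (an assumption; any capture library works)."""
    import sounddevice as sd
    audio = sd.rec(int(duration * sr), samplerate=sr, channels=1, dtype="float32")
    sd.wait()
    mfcc = librosa.feature.mfcc(y=audio.flatten(), sr=sr, n_mfcc=N_MFCC).T
    if mfcc.shape[0] < MAX_FRAMES:
        mfcc = np.pad(mfcc, ((0, MAX_FRAMES - mfcc.shape[0]), (0, 0)))
    mfcc = mfcc[:MAX_FRAMES]
    probs = model.predict(mfcc[np.newaxis, ...], verbose=0)[0]
    return int(np.argmax(probs)) + 1  # +1 to match the 1..8 label mapping above

if __name__ == "__main__":
    model = build_model()
    model.summary()
    # Training on pre-extracted features would look like:
    # model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.1)
```

In practice, predict_live would be called in a loop to classify consecutive microphone segments, and the fixed frame count would be chosen to match the typical utterance length in the training data.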