This paper presents the development of a real-time deep learning system for emotion recognition from both speech and facial inputs. For speech emotion recognition, three widely used datasets were utilized: SAVEE, the Toronto Emotional Speech Set (TESS), and CREMA-D, together comprising over 75,000 samples that span a spectrum of emotions: Anger, Sadness, Fear, Disgust, Calm, Happiness, Neutral, and Surprise, mapped to numerical labels from 1 to 8. The system identifies emotions from live speech inputs and pre-recorded audio files using a Long Short-Term Memory (LSTM) network, an architecture well suited to sequential data. The LSTM model, trained on the RAVDESS dataset (7,356 audio files), achieved a training accuracy of 83%. For facial emotion recognition, a Convolutional Neural Network (CNN) architecture was employed, drawing on datasets such as FER2013, CK+, AffectNet, and JAFFE. FER2013 includes over 35,000 labeled images representing seven key emotions, while CK+ provides 593 video sequences suited to fine-grained emotion classification. By integrating the LSTM for speech and the CNN for facial emotion recognition, the system demonstrates robust identification and classification of emotions across modalities, enabling comprehensive real-time emotion recognition.
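
To make the two-branch architecture concrete, the sketch below outlines one plausible realization in TensorFlow/Keras: an LSTM branch over per-frame acoustic features for speech, and a small CNN branch over 48x48 grayscale faces matching FER2013's image format. The feature choice (MFCCs), input shapes, layer widths, and training hyperparameters are illustrative assumptions, not the authors' reported configuration.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    # Assumed label sets: 8 speech emotions (per the 1-8 mapping above),
    # 7 facial emotions (per FER2013).
    NUM_SPEECH_EMOTIONS = 8
    NUM_FACE_EMOTIONS = 7

    def build_speech_lstm(timesteps=200, n_mfcc=40):
        """Speech branch: stacked LSTMs over a sequence of MFCC frames
        (feature type and shape are assumptions for illustration)."""
        model = models.Sequential([
            tf.keras.Input(shape=(timesteps, n_mfcc)),
            layers.LSTM(128, return_sequences=True),
            layers.LSTM(64),
            layers.Dropout(0.3),
            layers.Dense(64, activation="relu"),
            layers.Dense(NUM_SPEECH_EMOTIONS, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    def build_face_cnn(img_size=48):
        """Facial branch: conv/pool stack over 48x48 grayscale images,
        the native resolution of FER2013."""
        model = models.Sequential([
            tf.keras.Input(shape=(img_size, img_size, 1)),
            layers.Conv2D(32, 3, padding="same", activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(64, 3, padding="same", activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(128, 3, padding="same", activation="relu"),
            layers.MaxPooling2D(),
            layers.Flatten(),
            layers.Dense(128, activation="relu"),
            layers.Dropout(0.5),
            layers.Dense(NUM_FACE_EMOTIONS, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

In this sketch the two branches are trained and run independently, each producing its own softmax distribution over emotion classes; a real-time frontend would feed the speech branch windowed audio features and the facial branch cropped face frames, combining the per-modality predictions downstream.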