Recognizing and understanding other people's emotions remains an open problem in computer vision. Deep Convolutional Neural Networks (CNNs) have been applied to emotion recognition with considerable success. Much of their performance can be attributed to their ability to learn, through the filter kernels of their convolutional layers, a downsampled feature representation that retains abstract information about the input. Despite these advances, capturing subtle and context-dependent emotional nuances remains difficult, which limits the development of more fine-grained emotion recognition systems. Ongoing research explores multimodal approaches, which integrate several sensory inputs, to improve the accuracy and reliability of emotion detection in real-world scenarios. In this paper, we explore the impact of learning the initial network weights in an unsupervised manner. We study the results of pretraining a deep CNN as a Convolutional Auto-Encoder (CAE) in a greedy layer-wise unsupervised fashion for emotion recognition from facial images. When trained from randomly initialized weights, our CNN emotion recognition model achieves a recognition rate of 92.16% on the Karolinska Directed Emotional Faces (KDEF) dataset; when initialized from the pretrained model, the rate increases to 93.52%. Pretraining the CNN as a CAE also marginally reduces training time.
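To make the greedy layer-wise idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation: for brevity each layer is a small fully connected tied-weight autoencoder with a tanh encoder rather than a convolutional one, and all names (`train_autoencoder`, `greedy_pretrain`, `reconstruction_mse`), the toy data, and the hyperparameters are hypothetical. Each layer is trained to reconstruct its own input, then frozen, and its encoded output becomes the training data for the next layer.

```python
import numpy as np


def train_autoencoder(x, n_hidden, epochs=300, lr=0.02, seed=0):
    """Train one tied-weight autoencoder layer on x (n_samples, n_in) with
    full-batch gradient descent on 0.5 * mean squared reconstruction error.
    Hypothetical helper: dense layers stand in for convolutional ones."""
    rng = np.random.default_rng(seed)
    n_samples, n_in = x.shape
    w = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
    b_h = np.zeros(n_hidden)   # encoder bias
    b_o = np.zeros(n_in)       # decoder bias
    for _ in range(epochs):
        h = np.tanh(x @ w + b_h)              # encoder
        x_hat = h @ w.T + b_o                 # linear decoder sharing w (tied weights)
        grad_out = (x_hat - x) / n_samples    # dLoss/dx_hat
        grad_pre = (grad_out @ w) * (1.0 - h ** 2)  # back through tanh
        # w receives gradient from both the encoder and the decoder paths
        grad_w = x.T @ grad_pre + grad_out.T @ h
        w -= lr * grad_w
        b_h -= lr * grad_pre.sum(axis=0)
        b_o -= lr * grad_out.sum(axis=0)
    return w, b_h, b_o


def reconstruction_mse(x, w, b_h, b_o):
    """Mean squared error between x and its autoencoder reconstruction."""
    x_hat = np.tanh(x @ w + b_h) @ w.T + b_o
    return float(np.mean((x_hat - x) ** 2))


def greedy_pretrain(x, layer_sizes, **kwargs):
    """Greedy layer-wise pretraining: train each autoencoder on the frozen
    output of the previously trained encoders; keep only encoder weights."""
    encoders = []
    layer_input = x
    for i, n_hidden in enumerate(layer_sizes):
        w, b_h, _ = train_autoencoder(layer_input, n_hidden, seed=i, **kwargs)
        encoders.append((w, b_h))
        layer_input = np.tanh(layer_input @ w + b_h)  # feeds the next layer
    return encoders


# Usage on toy low-rank data (a stand-in for image features).
rng = np.random.default_rng(42)
x = rng.normal(size=(64, 4)) @ (rng.normal(size=(4, 16)) / 2)
encoders = greedy_pretrain(x, [8, 4])
for w, b_h in encoders:
    print(w.shape, b_h.shape)
```

In the setting described above, the pretrained encoder weights would replace random initialization before supervised fine-tuning on the labeled emotion classes; that fine-tuning step is omitted here.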