The use of Closed-Circuit Television (CCTV) footage for surveillance, demographic monitoring, and behavior analysis has increased dramatically in recent years. A key challenge in analyzing such recordings is counting people and identifying their emotions both efficiently and accurately. This paper proposes a system for counting people and recognizing facial expressions of emotion in CCTV footage. The system uses Deep Neural Networks (DNNs) trained on large annotated datasets to detect people in videos, extract facial images, and classify the emotions those faces express. The performance of the method is evaluated on publicly available datasets and compared with state-of-the-art techniques. The results demonstrate that the method can accurately count people and identify their emotions.