i-manager Publications

Accurate Screen Detection in Presentation Videos using Deep Learning

Purushotham E.*, Kasarapu Ramani**, C. Shobha Bindu***

* Department of Computer Science and Engineering, Jawaharlal Nehru Technological University, Kakinada (JNTUK), Kakinada, Andhra Pradesh, India.

** Department of Information Technology, Sree Vidyanikethan Engineering College, Tirupati, Andhra Pradesh, India.

*** Department of Computer Science and Engineering, Jawaharlal Nehru Technological University College of Engineering, Ananthapuramu, Andhra Pradesh, India.

Periodicity: April - June 2025
DOI: https://doi.org/10.26634/jfet.20.3.21795

Abstract

Lecture videos are widely used in classroom and conference settings, where digital slides are typically projected onto a screen, so detecting the screen is essential for extracting slide regions from presentation videos. This study presents a method for locating slide regions in video frames using the You Only Look Once (YOLO) object detection framework. A customized YOLOv7 model is trained on a labeled dataset of frames from presentation videos featuring projected slides. The trained model is then evaluated on unseen images to assess how well it identifies projector screens. The dataset contains more than 2,000 labeled frames, expanded to 5,000 images through data augmentation. The proposed approach is compared with other well-known object detection models. Experimental results show that the customized YOLOv7 model achieves higher accuracy and computational efficiency than the standard YOLOv7 and RetinaNet. The results indicate that the method offers a dependable solution for detecting projector screens and can be applied in a range of real-world settings.
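
As a rough illustration of two steps mentioned in the abstract (extracting the slide area from a detected screen box, and augmenting labeled frames), the following minimal sketch may help. It is not the authors' implementation. It assumes labels in the normalized center format commonly used with YOLO-family models, `(cx, cy, w, h)` in [0, 1], and it uses a horizontal flip as one example augmentation; the abstract does not state which augmentations were applied. The function names `yolo_to_pixels`, `crop`, and `hflip` are hypothetical, and frames are plain nested lists so the example runs without extra libraries.

```python
from typing import List, Tuple

# Assumed label format: (cx, cy, w, h), all normalized to [0, 1].
Box = Tuple[float, float, float, float]
Frame = List[List[int]]  # frame[y][x], a toy stand-in for an image array


def yolo_to_pixels(box: Box, img_w: int, img_h: int) -> Tuple[int, int, int, int]:
    """Convert a normalized center-format box to pixel corners (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    x1 = round((cx - w / 2) * img_w)
    y1 = round((cy - h / 2) * img_h)
    x2 = round((cx + w / 2) * img_w)
    y2 = round((cy + h / 2) * img_h)
    # Clamp the corners so the box never leaves the frame.
    return max(0, x1), max(0, y1), min(img_w, x2), min(img_h, y2)


def crop(frame: Frame, box_px: Tuple[int, int, int, int]) -> Frame:
    """Cut the slide region out of a frame given pixel corners."""
    x1, y1, x2, y2 = box_px
    return [row[x1:x2] for row in frame[y1:y2]]


def hflip(frame: Frame, box: Box) -> Tuple[Frame, Box]:
    """Mirror a frame left-to-right and mirror its label to match."""
    flipped = [row[::-1] for row in frame]
    cx, cy, w, h = box
    # Only the horizontal center moves; width and height are unchanged.
    return flipped, (1.0 - cx, cy, w, h)


if __name__ == "__main__":
    # Toy 8x4 frame whose pixel value encodes its position.
    frame = [[y * 8 + x for x in range(8)] for y in range(4)]
    screen = (0.25, 0.5, 0.5, 0.5)  # hypothetical detected screen box
    print(crop(frame, yolo_to_pixels(screen, 8, 4)))
```

The key design point is that any geometric augmentation has to transform the label together with the pixels; otherwise the augmented frames would train the detector on misplaced boxes.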

Keywords

Deep Learning; Screen Detection; Video Presentation; Data Augmentation.

How to Cite this Article?

Purushotham, E., Ramani, K., and Bindu, C. S. (2025). Accurate Screen Detection in Presentation Videos using Deep Learning. i-manager’s Journal on Future Engineering & Technology, 20(3), 1-10. https://doi.org/10.26634/jfet.20.3.21795
