i-manager Publications

Automatic Lip-Reading Model using 3D-CNN & LSTM

Kaki Leela Prasad*, P. Yeshwanth Sai**, P. Sailaja Devi***, P. Chandan Lohit****, V. Sai Tarun*****

*-***** Department of Computer Science and Engineering, Maharaj Vijayaram Gajapathi Raj College of Engineering, Vizianagaram, India.

Periodicity: January - March 2024
DOI: https://doi.org/10.26634/jse.18.3.20576

Abstract

Automatic lip-reading, the process of decoding spoken language through visual analysis of lip movements, is a promising avenue for advancing human-computer interaction and accessibility. This research proposes a model that integrates 3D Convolutional Neural Networks (3D-CNN) and Long Short-Term Memory (LSTM) networks to improve the accuracy and efficiency of lip-reading systems, while addressing challenges related to lighting variations, speaker articulation, and linguistic diversity. Traditional 2D-CNN approaches extract only spatial information from individual frames and therefore often miss the temporal detail of lip motion that accurate lip-reading depends on. A 3D-CNN, by contrast, convolves across consecutive frames as well as within each frame, so its features encode short-range lip motion, and an LSTM can then model how these features evolve over a longer sequence. By combining the two, the proposed model significantly improves recognition accuracy and captures speech nuances more completely. Extensive training on a diverse dataset, together with the exploration of transfer learning techniques, contributes to the model's robustness and generalization.
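The abstract describes the architecture only at a high level and gives no layer counts, kernel sizes, or training setup. As an illustration of the general 3D-CNN + LSTM idea, not the authors' implementation, here is a minimal, dependency-free Python sketch. Everything in it is an assumption: the function and class names (`conv3d_valid`, `extract_features`, `LSTMCell`, `classify`), a single 3D convolution layer with single-channel kernels, average pooling to one feature per kernel per time step, random untrained weights, and a softmax over a small set of hypothetical word classes. The point it shows is that each 3D kernel spans several consecutive frames, so its output can respond to lip motion, and the LSTM then carries that information across the whole clip.

```python
import math
import random
from typing import List

# Hypothetical sketch: a grayscale mouth-region clip indexed as clip[frame][row][col].
Clip = List[List[List[float]]]


def conv3d_valid(clip: Clip, kernel: Clip) -> Clip:
    """Single-channel 3D cross-correlation ('valid' padding, stride 1) followed by ReLU.

    Each output value mixes pixels from len(kernel) consecutive frames, so the
    filter can respond to lip motion that a per-frame 2D filter cannot see.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Clip = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                s = sum(clip[t + dt][i + di][j + dj] * kernel[dt][di][dj]
                        for dt in range(kt) for di in range(kh) for dj in range(kw))
                row.append(max(0.0, s))  # ReLU
            plane.append(row)
        out.append(plane)
    return out


def extract_features(clip: Clip, kernels: List[Clip]) -> List[List[float]]:
    """Apply each 3D kernel (all with the same temporal depth) and average-pool each output frame.

    Returns one feature vector, of length len(kernels), per remaining time step.
    """
    maps = [conv3d_valid(clip, k) for k in kernels]
    return [[sum(map(sum, m[t])) / (len(m[t]) * len(m[t][0])) for m in maps]
            for t in range(len(maps[0]))]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), candidate (g) and output (o) gates."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        n = input_size + hidden_size
        self.hidden_size = hidden_size
        # Random untrained weights for illustration; a real model learns these.
        self.W = {gate: [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden_size)]
                  for gate in "ifgo"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifgo"}

    def step(self, x: List[float], h: List[float], c: List[float]):
        z = x + h  # concatenate current input with previous hidden state

        def affine(gate: str) -> List[float]:
            return [sum(w * v for w, v in zip(row, z)) + b
                    for row, b in zip(self.W[gate], self.b[gate])]

        i = [sigmoid(a) for a in affine("i")]
        f = [sigmoid(a) for a in affine("f")]
        cand = [math.tanh(a) for a in affine("g")]
        o = [sigmoid(a) for a in affine("o")]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def classify(clip: Clip, kernels: List[Clip], cell: LSTMCell,
             classifier_w: List[List[float]]) -> List[float]:
    """3D-CNN features -> LSTM over time -> linear layer -> softmax over word classes."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in extract_features(clip, kernels):  # one feature vector per time step
        h, c = cell.step(x, h, c)
    logits = [sum(w * hk for w, hk in zip(row, h)) for row in classifier_w]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(1)
    clip = [[[rng.random() for _ in range(6)] for _ in range(6)] for _ in range(8)]  # 8 frames, 6x6
    kernels = [[[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(3)]
               for _ in range(2)]  # two 3x3x3 kernels
    cell = LSTMCell(input_size=len(kernels), hidden_size=4)
    classifier_w = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # 3 hypothetical classes
    print([round(p, 3) for p in classify(clip, kernels, cell, classifier_w)])
```

A practical system would express the same pipeline with a deep-learning framework's 3D convolution and LSTM layers, stack several convolution blocks, and learn all weights from data; the pure-Python loops here only make the data flow explicit.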

Keywords

3D Convolutional Neural Networks (3D-CNN), Long Short-Term Memory (LSTM), Human-Computer Interaction, OpenCV, Recurrent Neural Networks (RNN), Linguistic.

How to Cite this Article?

Prasad, K. L., Sai, P. Y., Devi, P. S., Lohit, P. C., and Tarun, V. S. (2024). Automatic Lip-Reading Model using 3D-CNN & LSTM. i-manager’s Journal on Software Engineering, 18(3), 32-42. https://doi.org/10.26634/jse.18.3.20576

