Automatic Lip-Reading Model using 3D-CNN & LSTM

Kaki Leela Prasad*, P. Yeshwanth Sai**, P. Sailaja Devi***, P. Chandan Lohit****, V. Sai Tarun*****
*-***** Department of Computer Science and Engineering, Maharaj Vijayaram Gajapathi Raj College of Engineering, Vizianagaram, India.
Periodicity: January - March 2024
DOI: https://doi.org/10.26634/jse.18.3.20576

Abstract

Automatic lip-reading, the decoding of spoken language from visual analysis of lip movements, is a promising avenue for advancing human-computer interaction and accessibility. This research proposes a model that integrates 3D Convolutional Neural Networks (3D-CNN) with Long Short-Term Memory (LSTM) networks to improve the accuracy and efficiency of lip-reading systems. The model addresses challenges related to lighting variations, speaker articulation, and linguistic diversity. Traditional 2D-CNNs focus solely on spatial information within individual frames and often miss the temporal detail vital for accurate lip-reading. By combining a 3D-CNN, which also captures how the lips change across consecutive frames, with an LSTM, the proposed model significantly enhances recognition accuracy and offers a more comprehensive understanding of speech nuances. Extensive training on a diverse dataset and the exploration of transfer learning techniques contribute to the robustness and generalization of the model.
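The contrast between spatial-only and spatiotemporal convolution can be illustrated with a small, dependency-free sketch. It is not the authors' model: the clips, kernels, and function names below are hypothetical, the kernels are set by hand rather than learned, and the LSTM stage is omitted. It only shows that a kernel confined to one frame cannot tell a static pixel from a moving one, while a kernel that spans time can.

```python
# Illustrative sketch only (hypothetical names and toy data, not the paper's model).
# Clips are nested lists indexed [time][row][col].

def conv3d_valid(clip, kernel):
    """Slide `kernel` (kt x kh x kw) over `clip` (T x H x W) with no padding.

    As in most deep-learning libraries, this is a cross-correlation.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def energy(volume):
    """Sum of absolute responses over the whole output volume."""
    return sum(abs(v) for plane in volume for row in plane for v in row)


# A bright pixel that stays in the centre for three frames.
centre = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
static_clip = [centre, centre, centre]

# The same bright pixel moving left to right along the middle row.
moving_clip = [
    [[0, 0, 0], [1, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 0, 0]],
]

# Depth-1 kernel: behaves like a 2D convolution applied to each frame separately.
spatial_kernel = [[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]]

# Depth-2 kernel: subtracts each frame from the next one, so it spans time.
temporal_kernel = [[[-1.0]], [[1.0]]]

# Per-frame responses are identical for both clips, so motion is invisible.
same_per_frame = (
    conv3d_valid(static_clip, spatial_kernel)
    == conv3d_valid(moving_clip, spatial_kernel)
)

# The kernel with temporal extent responds only to the moving clip.
static_motion = energy(conv3d_valid(static_clip, temporal_kernel))
moving_motion = energy(conv3d_valid(moving_clip, temporal_kernel))
```

In a trained network the temporal weights are learned rather than fixed, and a recurrent layer such as an LSTM would typically consume the resulting per-time-step features. How the paper wires these stages together is not stated in the abstract.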

Keywords

3D Convolutional Neural Networks (3D-CNN), Long Short-Term Memory (LSTM), Human-Computer Interaction, OpenCV, Recurrent Neural Networks (RNN), Linguistic.

How to Cite this Article?

Prasad, K. L., Sai, P. Y., Devi, P. S., Lohit, P. C., and Tarun, V. S. (2024). Automatic Lip-Reading Model using 3D-CNN & LSTM. i-manager’s Journal on Software Engineering, 18(3), 32-42. https://doi.org/10.26634/jse.18.3.20576
