Multi-Lingual Character Recognition and Extraction using Recurrent Neural Networks

Neerugatti Varipally Vishwanath*, K. Manjunathachari**, K. Satya Prasad***
*,*** Department of Electronics and Communications Engineering, Jawaharlal Nehru Technological University, Kakinada, Andhra Pradesh, India.
** Department of Electronics and Communications Engineering, GITAM University, Hyderabad Campus, India.
Periodicity: October - December 2023
DOI: https://doi.org/10.26634/jip.10.4.20293

Abstract

Segmentation and recognition of multilingual text have attracted considerable research attention in recent years. This work presents a multilingual Optical Character Recognition (OCR) system that combines PyTesseract, OpenCV, and Recurrent Neural Networks (RNNs) to convert text in English, Telugu, Hindi, Tamil, and Kannada into digital form. PyTesseract and OpenCV are used for character recognition, while the RNN improves recognition by capturing complex words. To maintain accuracy, the system applies techniques that counter noise and distortion in the input data. The research also explores how previously implemented techniques can be used to handle differences in grammar, fonts, and document layouts across languages. A user-friendly interface and integration with various platforms increase accessibility, allowing users to engage easily with multilingual content. By combining these components, the system addresses language barriers and handles varied grammars and input data, contributing to the development of OCR technology. The study highlights the importance of multilingual OCR in preserving languages, supporting international cooperation, and encouraging participation in the digital age, and it supports the accurate transmission of knowledge and literature across language boundaries.
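
The abstract names the tools but not how they are wired together. The following is a minimal sketch of one plausible arrangement, not the authors' implementation: OpenCV reduces noise and binarizes the page, and PyTesseract recognizes text in the five listed languages. The names `TESSERACT_CODES`, `build_lang_arg`, and `extract_text` are hypothetical, and the choice of median blur plus Otsu thresholding is an assumption. The sketch also assumes the Tesseract binary and the `eng`, `tel`, `hin`, `tam`, and `kan` language data are installed.

```python
# Illustrative sketch only; the pipeline layout is an assumption, not the paper's method.

# Tesseract language codes for the five languages named in the abstract.
TESSERACT_CODES = {
    "English": "eng",
    "Telugu": "tel",
    "Hindi": "hin",
    "Tamil": "tam",
    "Kannada": "kan",
}


def build_lang_arg(languages):
    """Join language names into Tesseract's 'eng+tel+...' form."""
    unknown = [name for name in languages if name not in TESSERACT_CODES]
    if unknown:
        raise ValueError(f"unsupported language(s): {unknown}")
    return "+".join(TESSERACT_CODES[name] for name in languages)


def extract_text(image_path, languages):
    """Denoise and binarize an image with OpenCV, then run Tesseract on it."""
    # Imported here so build_lang_arg works without the OCR stack installed.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Median blur suppresses salt-and-pepper noise (assumed preprocessing step).
    denoised = cv2.medianBlur(gray, 3)
    # Otsu's method picks the binarization threshold automatically.
    _, binary = cv2.threshold(
        denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
    )
    return pytesseract.image_to_string(binary, lang=build_lang_arg(languages))
```

A call such as `extract_text("page.png", ["English", "Telugu"])` would pass `lang="eng+tel"` to Tesseract, where `page.png` is a placeholder path. The abstract does not say whether the RNN is Tesseract's built-in LSTM recognizer or a separate model, so the sketch leaves that stage out.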

Keywords

Recognition, Segmentation, Extraction, Accuracy, Recurrent Neural Networks, Character Extraction, Language-agnostic Models.

How to Cite this Article?

Vishwanath, N. V., Manjunathachari, K., and Prasad, K. S. (2023). Multi-Lingual Character Recognition and Extraction using Recurrent Neural Networks. i-manager’s Journal on Image Processing, 10(4), 1-11. https://doi.org/10.26634/jip.10.4.20293
