i-manager Publications

Multi-Lingual Character Recognition and Extraction using Recurrent Neural Networks

Neerugatti Varipally Vishwanath*, K. Manjunathachari**, K. Satya Prasad***

*,*** Department of Electronics and Communications Engineering, Jawaharlal Nehru Technological University, Kakinada, Andhra Pradesh, India.

** Department of Electronics and Communications Engineering, GITAM University, Hyderabad Campus, India.

Periodicity: October - December 2023
DOI: https://doi.org/10.26634/jip.10.4.20293

Abstract

In recent years, the segmentation and recognition of multilingual text have attracted the attention of many researchers. The multilingual Optical Character Recognition (OCR) system presented here uses PyTesseract, OpenCV, and Recurrent Neural Networks (RNNs) to convert text in English, Telugu, Hindi, Tamil, and Kannada into digital form. Converting text to digital format transforms communication and supports cultural understanding. PyTesseract and OpenCV are used for accurate character recognition, while the RNN improves language understanding. To ensure accuracy, the system applies advanced techniques to overcome problems such as noise and distortion in the input data. Combined with advanced OCR algorithms, this approach improves text recognition and makes the system adaptable to multilingual environments. This study highlights the importance of multilingual OCR in preserving language, supporting international cooperation, and encouraging participation in the digital age. The research explores ways to handle cross-language grammar, fonts, and document layouts, building on previously implemented techniques to create informative content. The RNN further improves the OCR process by capturing complex words. A user-friendly interface and integration with various platforms increase accessibility, allowing users to engage easily with multilingual content. Multilingual OCR that combines PyTesseract, OpenCV, RNNs, and other advanced techniques is therefore used to overcome language barriers and to handle varied grammars and input data, with a positive impact on the development of OCR technology. This research helps create a globally connected society in which knowledge is transmitted across language boundaries, fostering cultural exchange and growth while ensuring an accurate understanding of literature.
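The abstract names the tools but not how they fit together, so the following is a minimal sketch of one plausible arrangement, not the authors' implementation. It assumes OpenCV handles noise removal and binarization, and that PyTesseract is called with Tesseract's LSTM engine (`--oem 1`), which is itself a recurrent network; the paper's own RNN may be a separate model. The helper names `build_lang_arg` and `extract_text`, the denoising strength, and the page segmentation mode are illustrative assumptions.

```python
# Illustrative sketch only; not the method published in the article.
# Tesseract language codes for the five languages named in the abstract.
TESSERACT_CODES = {
    "English": "eng",
    "Telugu": "tel",
    "Hindi": "hin",
    "Tamil": "tam",
    "Kannada": "kan",
}


def build_lang_arg(languages):
    """Join language names into Tesseract's 'eng+tel+...' form (hypothetical helper)."""
    unknown = [name for name in languages if name not in TESSERACT_CODES]
    if unknown:
        raise ValueError(f"No Tesseract code configured for: {unknown}")
    return "+".join(TESSERACT_CODES[name] for name in languages)


def extract_text(image_path, languages):
    """Denoise, binarize, and OCR one image (hypothetical helper)."""
    # Imported lazily so build_lang_arg works without OpenCV or Tesseract installed.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)

    # Grayscale, then non-local-means denoising (strength 10 is an assumed value).
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    denoised = cv2.fastNlMeansDenoising(gray, None, 10)

    # Otsu's method picks the binarization threshold automatically.
    _, binary = cv2.threshold(
        denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
    )

    # --oem 1 selects the LSTM engine; --psm 6 assumes a single uniform text block.
    return pytesseract.image_to_string(
        binary, lang=build_lang_arg(languages), config="--oem 1 --psm 6"
    )
```

A call such as `extract_text("page.png", ["English", "Telugu"])` (the file name is a placeholder) would require the Tesseract binary and the matching trained-data files to be installed.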

Keywords

Recognition, Segmentation, Extraction, Accuracy, Recurrent Neural Networks, Character Extraction, Language-agnostic Models.

How to Cite this Article?

Vishwanath, N. V., Manjunathachari, K., and Prasad, K. S. (2023). Multi-Lingual Character Recognition and Extraction using Recurrent Neural Networks. i-manager’s Journal on Image Processing, 10(4), 1-11. https://doi.org/10.26634/jip.10.4.20293
