Historical Tamil Character Recognition based on Clustering

G. Prabhakaran*, R. Meera**
* Assistant Professor, Department of Computer Science and Engineering, E.G.S. Pillay Engineering College, Nagapattinam, India.
** PG Scholar, Department of Computer Science and Engineering, E.G.S. Pillay Engineering College, Nagapattinam, India.
Periodicity: December - February 2017
DOI : https://doi.org/10.26634/jit.6.1.13505

Abstract

A novel method using a clustering algorithm is proposed to recognize the Tamil characters in a given document. Many algorithms and processing techniques exist, but they work only for certain languages and specific file formats. They also do not involve any pre-processing of the documents, such as contrast adjustment or noise filtering on the image. To address these drawbacks, a novel method is proposed in this project, in which the input can be of any file type and undergoes pre-processing, such as contrast adjustment, before the procedure is applied. Above all, this method is used to replace the historical Tamil words in earlier Tamil documents with the corresponding words that are in use today.

Keywords

Image Processing (IP), Document Image Binarization Contest (DIBCO), Optical Character Recognition (OCR), Post-processing Step (PS), K-Nearest Neighbour (KNN)

How to Cite this Article?

G. Prabhakaran and R. Meera (2017). Historical Tamil Character Recognition based on Clustering. i-manager’s Journal on Information Technology, 6(1), 19-24. https://doi.org/10.26634/jit.6.1.13505
