Historical Tamil Character Recognition based on Clustering

Article ID: JIT_V6_N1_RP3
Authors: G. Prabhakaran, R. Meera
Journal: Journal on Information Technology, ISSN 2277-5250, Vol. 6, No. 1, pp. 19-24, December 2016 – February 2017
Keywords: Image Processing (IP), Document Image Binarization Contest (DIBCO), Optical Character Recognition (OCR), Post-processing Step (PS), K-Nearest Neighbour (KNN)

Abstract: A novel method using a clustering algorithm is proposed to recognize the Tamil characters in a given document. Many existing algorithms and processing techniques work only for certain languages and specific file formats, and they do not involve any pre-processing of the documents, such as contrast adjustment or noise filtering. To address these drawbacks, the method proposed in this project accepts input of any file type and applies pre-processing such as contrast adjustment before running the recognition procedure. Above all, the method is used to replace historical Tamil words in earlier Tamil documents with the corresponding words in use today.

Copyright © 2017 i-manager publications. All rights reserved.
Publisher: i-manager Publications
http://www.imanagerpublications.com/Article.aspx?ArticleId=13505