Separation, Classification and Expert Mapping of Old Grantha Documents Symbols

Lalit Prakash Saxena*
* Research Scientist, Applied Research Section, Combo Consultancy, Obra, UP, India.
Periodicity: December - February 2019
DOI : https://doi.org/10.26634/jpr.5.4.16108

Abstract

This paper attempts to decipher old documents using a symbol-to-script mapping scheme. Symbols appear in documents either as isolated notations or as handwritten text with a number of notable features. The paper describes a method to separate and classify handwritten non-cursive symbols in Grantha script. The method uses the statistical correlation coefficient for separation and classification, without recognizing the symbols. The Grantha script symbol mapping model comprises selection, separation, preprocessing, classification, and finally mapping. The proposed model employs a bounding box algorithm to locate the symbols. The algorithm selects the symbols and excludes non-symbol components to the extent possible. The experiments used 135 Grantha script document images with varying degrees of deterioration. The resulting symbol classification rate (i.e., the proportion of symbols automatically classified) was close to 80%, which aids mapping to a predetermined mapping scheme.
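The abstract gives no implementation details, so the following is only a minimal sketch of how bounding-box selection and correlation-based grouping might fit together. Everything specific here is an assumption made for illustration, not the paper's method: the function names, the 8-connected component search, the `min_pixels` noise filter, the 8x8 resampling grid, the 0.8 correlation threshold, and the greedy "first matching class" rule.

```python
from collections import deque
from math import sqrt


def bounding_boxes(img, min_pixels=3):
    """Return (top, left, bottom, right) boxes of 8-connected foreground blobs.

    img is a list of rows of 0/1 values. Blobs smaller than min_pixels
    (a hypothetical noise filter) are treated as non-symbol components.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1 or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right, count = r, c, r, c, 0
            while queue:  # breadth-first flood fill of one blob
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if count >= min_pixels:
                boxes.append((top, left, bottom, right))
    return boxes


def crop(img, box, size=8):
    """Cut out a box and resample it to a flat size*size vector (nearest neighbour)."""
    top, left, bottom, right = box
    bh, bw = bottom - top + 1, right - left + 1
    return [img[top + (i * bh) // size][left + (j * bw) // size]
            for i in range(size) for j in range(size)]


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:  # constant vector: correlation is undefined
        return 0.0
    return cov / sqrt(va * vb)


def classify(img, threshold=0.8):
    """Greedily group symbols whose correlation with a class representative
    reaches the (assumed) threshold. No symbol is recognized; only grouped."""
    classes = []  # list of (representative vector, member boxes)
    for box in bounding_boxes(img):
        vec = crop(img, box)
        for rep, members in classes:
            if correlation(rep, vec) >= threshold:
                members.append(box)
                break
        else:
            classes.append((vec, [box]))
    return [members for _, members in classes]


# Toy page: two identical "L" shapes, one "+" shape, and a one-pixel speck.
page = [
    [1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0],
]
groups = classify(page)
```

On the toy page, the two "L" shapes fall into one class, the "+" shape forms its own class, and the isolated speck is discarded before classification. Real manuscript images would first need binarization and noise removal, which this sketch does not attempt.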

Keywords

Grantha script, Document images, Separation, Classification, Mapping.

How to Cite this Article?

Saxena, L. P. (2019). Separation, Classification and Expert Mapping of Old Grantha Documents Symbols. i-manager’s Journal on Pattern Recognition, 5(4), 51-67. https://doi.org/10.26634/jpr.5.4.16108
