JSE_V5_N2_RP6
Semantic Extraction In PGF Using POS Tagged Data of Sanskrit
Smita Selot
Neeta Tripathi
A.S. Zadgaonkar
Journal on Software Engineering
2230-7168
5
2
37
43
Part of Speech Tagger, Panini Grammar Framework, Semantic Analysis, Semantic Role, Multiple Case Occurrence, Case Frames
Extraction of semantics from text is a vital application spanning several fields of artificial intelligence, such as natural language processing, knowledge representation and machine learning. Research in this area is being carried out for internationally prominent languages such as English, German and Chinese. In India, too, research on Indian languages such as Hindi, Tamil, Bengali and other regional languages is growing rapidly. In this paper, the researchers focus on the use of Sanskrit for semantic extraction. Sanskrit, being a free word order language with a systematic grammar, offers an excellent opportunity for extracting semantics with higher efficiency. Panini, an ancient grammarian, introduced six karakas (cases) to identify the semantic role of a word in a sentence. These karakas are analyzed and applied to extract semantics from Sanskrit text. Input sentences are first converted into syntactic structures, which are then used for semantic analysis. The syntactic structures are in Part of Speech (POS) tagged form, and various features of this tagged data are analyzed to extract the semantic role of each word in a sentence. Finally, the semantic labels of the words in a sentence are stored in frames called case frames, which act as a knowledge representation tool. Thus, the input POS tagged data is mapped to semantically tagged data and case frames are generated. Such systems are useful in building question-answering applications, machine learning, knowledge representation and information retrieval.
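The abstract describes mapping POS tagged words to karaka roles and storing the result in case frames. The Python sketch below illustrates that idea under assumptions that are not stated in the paper. Each token is a hypothetical (word, POS, vibhakti) triple, and the tags NOUN and VERB are invented for illustration. Vibhakti (case ending) numbers are mapped to karakas using the common default correspondences for active-voice sentences, and the words are simplified ASCII transliterations. This is not the authors' rule set, which would also have to handle passive constructions and multiple case occurrence.

```python
from dataclasses import dataclass, field

# Hypothetical token format: (word, POS tag, vibhakti 1-7 or None).
# The tagset and the vibhakti field are illustrative assumptions.

# Simplified default vibhakti -> karaka mapping for active-voice (kartari) sentences.
DEFAULT_KARAKA = {
    1: "karta",       # prathama (nominative): agent
    2: "karma",       # dvitiya (accusative): object
    3: "karana",      # tritiya (instrumental): instrument
    4: "sampradana",  # chaturthi (dative): recipient
    5: "apadana",     # panchami (ablative): source
    7: "adhikarana",  # saptami (locative): locus
}


@dataclass
class CaseFrame:
    verb: str
    # karaka label -> list of words (a list allows several words per role)
    roles: dict = field(default_factory=dict)


def build_case_frame(tagged):
    """Build a case frame from one POS tagged sentence (hypothetical format)."""
    verb = next((w for w, pos, _ in tagged if pos == "VERB"), None)
    if verb is None:
        raise ValueError("no verb found in sentence")
    frame = CaseFrame(verb=verb)
    for word, pos, vib in tagged:
        if pos != "NOUN" or vib is None:
            continue
        if vib == 6:
            role = "sambandha"  # shashthi (genitive) marks a relation, not a karaka
        else:
            role = DEFAULT_KARAKA.get(vib, "unknown")
        frame.roles.setdefault(role, []).append(word)
    return frame


# "ramah banena ravanam hanti": Rama kills Ravana with an arrow.
sentence = [
    ("ramah", "NOUN", 1),
    ("banena", "NOUN", 3),
    ("ravanam", "NOUN", 2),
    ("hanti", "VERB", None),
]
frame = build_case_frame(sentence)
print(frame.verb, frame.roles)
```

Because roles are taken from case endings rather than word position, reordering the words of the example sentence yields the same case frame, which reflects the free word order the abstract relies on.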
October - December 2010
Copyright © 2010 i-manager publications. All rights reserved.
i-manager Publications
http://www.imanagerpublications.com/Article.aspx?ArticleId=1334