Clustering Of Summarizing Multi-Documents (Large Data) By Using MapReduce Framework

K. Thirumalesh, Srinivasulu Asadi
Journal on Cloud Computing, ISSN 2350-1308, Vol. 3, No. 1, pp. 1-12, November 2015 - January 2016
Article ID: JCC_V3_N1_RP1
Copyright © 2016 i-manager Publications. All rights reserved.
http://www.imanagerpublications.com/Article.aspx?ArticleId=8073

Keywords: Summarizing Large Text, Semantic Similarity, LDA (Latent Dirichlet Allocation), K-means, Clustering Based Summarization, Big Text Data Analysis

Abstract: Multi-document summarization differs from single-document summarization: compression, speed, redundancy, and passage selection are all critical to producing useful summaries. A collection of documents is passed to several summarization methods, each using a different strategy to extract the most important sentences from the original documents. The LDA (Latent Dirichlet Allocation) topic modeling technique is used to divide the documents by topic so that the large text collection can be summarized over the MapReduce framework. Compression ratio, retention ratio, ROUGE, and Pyramid score are the parameters used to measure summarization performance. Semantic similarity and clustering methods are used to generate summaries of large text collections from multiple documents efficiently. Summarizing multiple documents is time consuming, yet a summary is a basic tool for understanding a document collection. The presented method is compared with a MapReduce-based k-means clustering algorithm applied to four multi-document summarization methods. Multilingual text summarization is also supported over the MapReduce framework, so that summaries can be generated from document collections available in different languages.
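
To make the k-means-over-MapReduce idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes plain Python generators standing in for the mapper and reducer (no Hadoop), bag-of-words count vectors in place of the semantic similarity and LDA steps, and a naive initialization from the first k sentences. All function names are hypothetical, and the compression ratio is taken here to mean summary word count divided by source word count, which is an assumed definition.

```python
from collections import Counter, defaultdict
import math


def vectorize(sentence, vocab):
    """Bag-of-words count vector over a fixed vocabulary (assumed representation)."""
    counts = Counter(sentence.lower().split())
    return [float(counts[w]) for w in vocab]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_assign(vectors, centroids):
    """Map phase: emit (nearest centroid index, sentence index) pairs."""
    for i, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: distance(vec, centroids[c]))
        yield nearest, i


def reduce_centroids(pairs, vectors, centroids):
    """Reduce phase: average the vectors grouped under each centroid key."""
    groups = defaultdict(list)
    for cluster_id, i in pairs:
        groups[cluster_id].append(vectors[i])
    new_centroids = []
    for c, old in enumerate(centroids):
        members = groups.get(c)
        if not members:
            new_centroids.append(old)  # empty cluster: keep the old centroid
        else:
            new_centroids.append(
                [sum(col) / len(members) for col in zip(*members)])
    return new_centroids


def summarize(sentences, k=2, iterations=10):
    """Cluster sentences and return the one closest to each centroid."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    vectors = [vectorize(s, vocab) for s in sentences]
    centroids = [list(v) for v in vectors[:k]]  # naive init: first k sentences
    for _ in range(iterations):
        pairs = list(map_assign(vectors, centroids))
        centroids = reduce_centroids(pairs, vectors, centroids)
    chosen = sorted({
        min(range(len(vectors)), key=lambda i: distance(vectors[i], c))
        for c in centroids
    })
    return [sentences[i] for i in chosen]


def compression_ratio(summary, sentences):
    """Summary word count divided by source word count (assumed definition)."""
    def words(items):
        return sum(len(s.split()) for s in items)
    return words(summary) / words(sentences)
```

In a real MapReduce job, each iteration would be one job: mappers would read the current centroids from shared storage and reducers would write the updated ones. The sketch keeps both phases in a single process only to show the data flow.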