i-manager's Journal on Cloud Computing (JCC)


Volume 3 Issue 1 November - January 2016 [Open Access]

Research Paper

Clustering of Summarizing Multi-Documents (Large Data) by Using MapReduce Framework

K. Thirumalesh*, Srinivasulu Asadi**
* Research Scholar, Department of Information Technology, Sree Vidyanikethan Engineering College, Tirupathi, India.
** Associate Professor, Department of Information Technology, Sree Vidyanikethan Engineering College, Tirupathi, India
Thirumalesh, K., and Asadi, S. (2016). Clustering of Summarizing Multi-Documents (Large Data) by Using MapReduce Framework. i-manager’s Journal on Cloud Computing, 3(1), 1-12.

Abstract

Multi-document summarization differs from single-document summarization. Issues of compression, speed, redundancy, and passage selection are critical to forming useful summaries. A collection of different documents is given to a variety of summarization methods, based on different strategies, that extract the most important sentences from the original documents. The LDA (Latent Dirichlet Allocation) topic modeling technique is used to divide the documents by topic in order to summarize the large text collection over the MapReduce framework. Compression ratio, retention ratio, ROUGE score, and Pyramid score are the summarization parameters used to measure the performance of the summarization. Semantic similarity and clustering methods are used efficiently to generate the summary of large text collections from multiple documents. Summarizing multiple documents is a time-consuming problem, and summarization is a basic tool for understanding a document collection. The presented method is compared with a MapReduce-based k-means clustering algorithm applied to four multi-document summarization methods. Support for multilingual text summarization is provided over the MapReduce framework so that summaries can be generated from text document collections available in different languages.
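The abstract compares the presented method against a MapReduce-based k-means baseline. The following is a minimal single-process sketch of one common way to express k-means as map and reduce steps, assuming documents or sentences are already represented as numeric vectors (for example, term-frequency vectors). The function names are hypothetical, and the shuffle phase is simulated with a dictionary rather than run on Hadoop.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans_map(point, centroids):
    """Map step: emit (cluster_id, (point, 1)) for the nearest centroid."""
    yield nearest(point, centroids), (point, 1)


def kmeans_reduce(cluster_id, values):
    """Reduce step: sum the points of one cluster and return their mean."""
    total, count = None, 0
    for point, n in values:
        if total is None:
            total = [0.0] * len(point)
        for i, x in enumerate(point):
            total[i] += x
        count += n
    return cluster_id, tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """One MapReduce round; the shuffle is simulated with a dictionary (assumption)."""
    groups = defaultdict(list)
    for point in points:
        for key, value in kmeans_map(point, centroids):
            groups[key].append(value)
    new_centroids = list(centroids)  # a cluster with no points keeps its old centroid
    for key, values in groups.items():
        cluster_id, centroid = kmeans_reduce(key, values)
        new_centroids[cluster_id] = centroid
    return new_centroids


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Driver: repeat MapReduce rounds until the centroids stop moving."""
    for _ in range(max_iter):
        new_centroids = kmeans_iteration(points, centroids)
        if all(math.dist(a, b) <= tol for a, b in zip(new_centroids, centroids)):
            return new_centroids
        centroids = new_centroids
    return centroids
```

On a real cluster, each call to `kmeans_iteration` would correspond to one MapReduce job, with the driver loop running outside the cluster and broadcasting the current centroids to the mappers.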

Research Paper

A Methodology for WebLog Data analysis using Hadoop MapReduce and PIG

Durga Prasad P S*, T. Vivekanandan**, A. Srinivasan***
* PG Scholar, Department of Computer Science and Engineering, SITAMS, Chittoor, Andhra Pradesh, India.
**, *** Associate Professors, Department of Computer Science and Engineering, SITAMS, Chittoor, Andhra Pradesh, India.
Prasad, P. S. D., Vivekanandan, T., and Srinivasan, A. (2016). A Methodology for WebLog Data analysis using Hadoop MapReduce and PIG. i-manager’s Journal on Cloud Computing, 3(1), 13-17.

Abstract

In recent times, the world has faced severe problems related to data storage and processing. In particular, the size of weblog data is increasing exponentially, into petabytes and zettabytes. Weblog data is important because it reflects users' actions on the web, which makes web data vital for improving business in all aspects. Traditional data management systems are not adequate to handle data of such size. The MapReduce programming approach was introduced to deal with large-scale data processing. In this paper, the authors propose a large-scale data processing system for analysing weblog data through MapReduce programming in the Hadoop framework using Pig scripts. The experimental results show that classifying the different status codes in the weblog data is more efficient in processing time than with traditional techniques.
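The status-code classification described above maps naturally onto a map step that emits one key per log record and a reduce step that sums the counts. The sketch below simulates that data flow in plain Python rather than in Hadoop or Pig; it assumes the logs follow the Apache Common Log Format, and the names `map_status`, `reduce_counts`, and `count_status_codes` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed Apache Common Log Format: host ident user [time] "request" status bytes
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) (\S+)')


def map_status(line):
    """Map step: emit (status_code, 1) for each well-formed log line."""
    match = LOG_PATTERN.match(line)
    if match:  # malformed lines are skipped rather than failing the job
        yield match.group(2), 1


def reduce_counts(status_code, ones):
    """Reduce step: total the occurrences of one status code."""
    return status_code, sum(ones)


def count_status_codes(lines):
    """Run map, a simulated shuffle (group by key), and reduce over all lines."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_status(line):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in sorted(groups.items()))
```

In Pig Latin, the equivalent step is typically a `GROUP` on the status field followed by `COUNT`, with Pig compiling the script into MapReduce jobs.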

Research Paper

An Effective Feature Selection Technique for Mining High Dimensional Data on Bigdata

K. Bhaskar Naik*, S. P. Sindhuja**
* Assistant Professor, Department of Computer Science and Engineering, Sree Vidyanikethan Engineering College, Tirupati, India.
** PG Scholar, Department of Computer Science and Engineering, Sree Vidyanikethan Engineering College, Tirupati, India.
Naik, K. B., and Sindhuja, S. P. (2016). An Effective Feature Selection Technique for Mining High Dimensional Data on Bigdata. i-manager’s Journal on Cloud Computing, 3(1), 18-23.

Abstract

In recent years, many research innovations have come to the fore in the area of big data analytics. Advanced analysis of big data streams is bound to become a key area of data mining research as the number of applications requiring such processing increases. Big data sets are now collected in many fields, e.g., finance, business, medical systems, the internet, and other scientific research. Data sets grow rapidly because they are often generated as incoming streams. Feature selection has been used to lighten the processing load when inducing a data mining model, but mining high-dimensional data becomes a tough task due to its exponential growth in size. This paper aims to compare two algorithms, namely Particle Swarm Optimization and the FAST algorithm, in the feature selection process. The proposed FAST algorithm is used to remove irrelevant and redundant data while streaming high-dimensional data, which further increases analytical accuracy within a reasonable processing time.
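The abstract describes FAST as removing irrelevant and redundant features. As a hedged illustration of the symmetric uncertainty (SU) measure that FAST-style filters rely on, the sketch below drops features whose SU with the class is at or below a threshold, then removes redundant features using the simpler FCBF rule: a feature is dropped when an already-kept, more relevant feature is at least as correlated with it as the class is. This is not the FAST algorithm itself, which instead clusters features using a minimum spanning tree; the function names, the `threshold` parameter, and the assumption that features are discrete are illustrative.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a normalised correlation in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables constant: treat as uncorrelated
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def select_features(features, labels, threshold=0.0):
    """FCBF-style filter (not FAST itself): features maps name -> discrete column."""
    # 1. Relevance: SU between each feature and the class label.
    relevance = {name: symmetric_uncertainty(col, labels) for name, col in features.items()}
    # 2. Drop irrelevant features, then rank the rest by relevance (stable sort).
    ranked = sorted(
        (name for name in features if relevance[name] > threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )
    # 3. Redundancy: keep a feature only if no already-kept, more relevant
    #    feature is at least as correlated with it as the class is.
    selected = []
    for name in ranked:
        if all(symmetric_uncertainty(features[kept], features[name]) < relevance[name]
               for kept in selected):
            selected.append(name)
    return selected
```

For example, with two independent binary features whose combination determines the class, plus an exact copy of the first feature and a constant column, the filter keeps one copy of each informative feature and discards both the duplicate and the constant.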

Research Paper

Enhanced E-tree for Mining High Dimensional Data

S. Salam*, M. Roja**, T. V. Rao***
* Associate Professor, Department of Computer Science and Engineering, Sree Vidyanikethan Engineering College, Tirupati, India.
** PG Scholar, Department of Computer Science and Engineering, Sree Vidyanikethan Engineering College, Tirupati, India.
*** Professor, Department of Computer Science and Engineering, Sree Vidyanikethan Engineering College, Tirupati, India.
Salam, S., Roja, M., and Rao, T. V. (2016). Enhanced E-tree for Mining High Dimensional Data. i-manager’s Journal on Cloud Computing, 3(1), 24-29.

Abstract

Data stream classification is one of the critical tasks in data mining. When a data stream arrives at a rate of GB/sec, spam must be recognized, the web must be monitored, and storage must be managed. This is a difficult operation, and the existing system fails at it. By implementing two algorithms, namely the E-tree (Ensemble-tree) algorithm and a greedy algorithm, the authors avoid the existing issues. The Ensemble tree (E-tree) handles large volumes of stream data and concept drift. E-tree classifies and groups the data stream, stores the data effectively, and accurately predicts for web monitoring and spam detection. To control web traffic, the authors implemented the greedy algorithm.
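The abstract describes E-tree only at a high level, so the sketch below does not reproduce it. It illustrates the general idea that ensemble-based stream classifiers use to cope with drift: train one base model per chunk of the stream, keep only the most recent models, and predict by majority vote. The base learner here (`ChunkModel`, a lookup table from one discrete feature value to its majority label) and the window size are hypothetical simplifications.

```python
from collections import Counter, deque


class ChunkModel:
    """Hypothetical base learner: majority label per feature value within one chunk."""

    def __init__(self, chunk):
        votes = {}
        for value, label in chunk:
            votes.setdefault(value, Counter())[label] += 1
        self.table = {value: c.most_common(1)[0][0] for value, c in votes.items()}

    def predict(self, value):
        return self.table.get(value)  # None if the value was not seen in this chunk


class SlidingEnsemble:
    """Sliding-window ensemble: the oldest model is dropped as new chunks arrive."""

    def __init__(self, max_models=3):
        self.models = deque(maxlen=max_models)  # deque discards the oldest automatically

    def update(self, chunk):
        """Train a new base model on the latest chunk of (value, label) pairs."""
        self.models.append(ChunkModel(chunk))

    def predict(self, value):
        """Majority vote over the models that have seen this value."""
        predictions = (m.predict(value) for m in self.models)
        votes = Counter(p for p in predictions if p is not None)
        return votes.most_common(1)[0][0] if votes else None
```

Because old models leave the window, a change in the labelling (for example, a token that stops indicating spam) takes over the vote once enough recent chunks reflect it.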

Review Paper

A Survey on Energy Aware Job Scheduling Algorithms in Cloud Environment

Shaik Naseera*, P. Jyotheeswai**
* Associate Professor, Department of Computing Science and Engineering, VIT University, Vellore, India.
** Associate Professor, Department of Computing Science and Engineering, SVCET, Chittoor, India.
Naseera, S., and Jyotheeswai, P. (2016). A Survey on Energy Aware Job Scheduling Algorithms in Cloud Environment. i-manager’s Journal on Cloud Computing, 3(1), 30-36.

Abstract

Nowadays, cloud computing receives a lot of attention from the research community. Cloud computing is a platform that supports the sharing of resources, communication, and storage capacity over the internet. The primary benefit of moving to the cloud is application scalability. It provides virtualized resources and is built on the foundations of grid and distributed computing. Cloud computing is also an environmentally friendly framework, as it benefits from efficient utilization of resources and optimal scheduling algorithms. The growth of internet-based applications demands algorithms that cope with escalating energy consumption and reduce operational costs and CO2 emissions. In this paper, the authors present a review of the energy-aware job scheduling algorithms existing in the literature. This paper helps readers understand the functionality and focus parameters of the various energy-aware scheduling algorithms available in the literature.
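To make "energy-aware scheduling" concrete, the sketch below shows one greedy heuristic of the kind such surveys cover: jobs are placed in decreasing order of CPU demand, each on the host whose estimated power draw increases the least. It assumes a linear power model, P(u) = P_idle + (P_max - P_idle) * u for utilisation u, and assumes unused hosts are switched off (drawing 0 W), so opening a new host costs its idle power. The `Host` fields, the units, and the `schedule` function are hypothetical and are not taken from any specific algorithm in the review.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    capacity: float  # CPU capacity in an assumed unit (e.g. MIPS)
    p_idle: float    # watts when switched on but idle (assumed)
    p_max: float     # watts at 100% utilisation (assumed)
    used: float = 0.0

    def power(self, load):
        """Linear power model; a host carrying no load is assumed switched off."""
        if load == 0:
            return 0.0
        return self.p_idle + (self.p_max - self.p_idle) * load / self.capacity


def schedule(jobs, hosts):
    """Greedy placement: largest job first, onto the host with the smallest power increase."""
    placement = {}
    for job, demand in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        best, best_delta = None, float("inf")
        for host in hosts:
            if host.used + demand > host.capacity:
                continue  # job does not fit on this host
            delta = host.power(host.used + demand) - host.power(host.used)
            if delta < best_delta:
                best, best_delta = host, delta
        if best is None:
            raise ValueError(f"no host can fit job {job!r}")
        best.used += demand
        placement[job] = best.name
    return placement
```

Because switching on an idle host costs its full idle power, the heuristic tends to consolidate jobs onto already-active hosts, which is the main lever such energy-aware schedulers use to cut consumption.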