i-manager's Journal on Computer Science (JCOM)


Volume 4 Issue 3 September - November 2016

Research Paper

Data Mining Model Based on Interval-valued Clustering

M. Bhargavi*
Assistant Professor, Department of Computer Science and Engineering, Sree Vidyanikethan Engineering College, Tirupati, India.
Bhargavi, M. (2016). Data Mining Model Based on Interval-Valued Clustering. i-manager’s Journal on Computer Science, 4(3), 1-10. https://doi.org/10.26634/jcom.4.3.8283

Abstract

Discovering potential patterns in complex data is a hot research topic. In this paper, the author proposes an iterative data mining model based on "Interval-Value" clustering, "Interval-Interval" clustering, and "Interval-Matrix" clustering. "Interval-Value" clustering uses the features of interval data together with a numeric threshold and is designed as "Netting" → "Type-I clustering" → "Type-II clustering"; "Interval-Interval" clustering uses the features of interval data together with an interval threshold and is designed with interval medium clustering; "Interval-Matrix" clustering uses the features of interval data together with a matrix threshold and is designed with matrix threshold clustering. The author's motivation is to mine interval-valued association rules from a given dataset, and an experimental study is conducted to verify the new data mining method. The experimental results show that the data mining model based on interval-valued clustering is feasible and effective.
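To make the "Interval-Value" case concrete, here is a minimal Python sketch. It is purely illustrative, not the paper's algorithm: it groups one-dimensional intervals whose gap to the current group stays within a numeric threshold; the function name and parameters are assumptions.

# Illustrative sketch only: groups 1-D intervals using a numeric
# threshold, loosely in the spirit of the "Interval-Value" case.
# The function name and parameters are assumptions, not the author's API.

def cluster_intervals(intervals, threshold):
    """Group intervals whose gap to the current group is <= threshold."""
    clusters, current_end = [], None
    for lo, hi in sorted(intervals):
        # Extend the current group if this interval starts within
        # `threshold` of the group's rightmost endpoint so far.
        if clusters and lo - current_end <= threshold:
            clusters[-1].append((lo, hi))
            current_end = max(current_end, hi)
        else:
            clusters.append([(lo, hi)])
            current_end = hi
    return clusters

if __name__ == "__main__":
    data = [(1, 3), (2, 5), (9, 11), (10, 12), (20, 22)]
    print(cluster_intervals(data, threshold=1.0))
    # [[(1, 3), (2, 5)], [(9, 11), (10, 12)], [(20, 22)]]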

Research Paper

A Proficient Approach for Facsimile Detection

M. Sreelekha*, K. Bhaskar Naik**
* PG Scholar, Department of Computer Science and Engineering, Sree Vidyanikethan Engineering College, JNTU Ananthapur, Tirupati, India.
** Assistant Professor, Department of Computer Science and Engineering, Sree Vidyanikethan Engineering College, JNTU Ananthapur, Tirupati, India.
Sreelekha, M., and Naik, K.B. (2016). A Proficient Approach for Facsimile Detection. i-manager’s Journal on Computer Science, 4(3), 11-18. https://doi.org/10.26634/jcom.4.3.8285

Abstract

Nowadays, the accuracy of databases is of great importance: it is crucial for maintaining a database in the current IT-based economy, and many organizations rely on their databases to carry out day-to-day operations. Consequently, much of the research on duplicate detection, which also goes by names such as entity resolution or facsimile recognition, focuses on pair selection to increase both efficiency and recall. Duplicate detection is the process of recognizing multiple representations of the same real-world entity. Among the indexing algorithms, progressive duplicate detection is a novel approach that sorts the given dataset by a defined sorting key and compares the records within a window. To obtain results even faster than the traditional approaches, a new algorithm is proposed that combines the progressive approaches with scalable ones to find duplicates progressively and in parallel. The algorithm also shows that the efficiency of finding duplicates can be maximized within a limited execution time without losing effectiveness.
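The windowed comparison the abstract refers to follows the classic sorted-neighborhood scheme. Below is a minimal Python sketch of that scheme, not the authors' parallel progressive variant; the sorting key and similarity test are illustrative assumptions.

# Minimal sorted-neighborhood sketch: sort records by a key, then
# compare each record only against the next (window - 1) records.
# The key function and similarity test are illustrative assumptions,
# not the authors' parallel progressive algorithm.
from difflib import SequenceMatcher

def find_duplicates(records, key, window=3, threshold=0.9):
    ordered = sorted(records, key=key)
    pairs = []
    for i, rec in enumerate(ordered):
        for other in ordered[i + 1 : i + window]:
            sim = SequenceMatcher(None, key(rec), key(other)).ratio()
            if sim >= threshold:
                pairs.append((rec, other))
    return pairs

if __name__ == "__main__":
    names = ["john smith", "jon smith", "mary jones", "marry jones"]
    print(find_duplicates(names, key=lambda r: r, window=2))
    # [('john smith', 'jon smith'), ('marry jones', 'mary jones')]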

Research Paper

A Greedy based Algorithm for the Interval Scheduling Problem

Ruwanthini Siyambalapitiya*
Lecturer, Department of Statistics & Computer Science, University of Peradeniya, Sri Lanka.
Siyambalapitiya, R. (2016). A Greedy Based Algorithm for the Interval Scheduling Problem. i-manager’s Journal on Computer Science, 4(3), 19-23. https://doi.org/10.26634/jcom.4.3.8286

Abstract

In this paper, we consider the interval scheduling problem, a restricted version of the general scheduling problem. In the general scheduling problem, a given set of jobs must be processed by a number of machines or processors so as to optimize a certain criterion. We assume that the processing times of the jobs are given. When we impose the additional requirement of a fixed starting time for each job, the scheduling problem is known as the interval scheduling problem. In the basic interval scheduling problem, each machine can process at most one job at a time and each machine is continuously available. Each job must be processed until completion, without interruption. The objective is to process all the jobs with a minimum number of machines. Various types of interval scheduling problems arise in computer science, telecommunications, crew scheduling, and other areas. We propose a greedy-based algorithm to solve the basic interval scheduling problem. We also compute a lower bound on the minimum number of machines or processors. We then apply the algorithm to data sets of various sizes and show that the solutions obtained are close to optimal.
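The paper's own algorithm is not reproduced here, but the basic problem admits a standard greedy sketch: sort jobs by starting time and assign each to a machine whose previous job has finished, opening a new machine otherwise. A minimal Python version, with illustrative names, follows; the lower bound mentioned above corresponds to the maximum number of jobs active at any one time.

# Standard greedy sketch for the basic interval scheduling
# (machine-minimization) problem; the paper's algorithm may
# differ in its details.
import heapq

def min_machines(jobs):
    """jobs: list of (start, finish). Returns the number of machines used."""
    finish_times = []  # min-heap of the finish times of open machines
    for start, finish in sorted(jobs):
        # Reuse a machine if its earliest finish time is <= this start.
        if finish_times and finish_times[0] <= start:
            heapq.heapreplace(finish_times, finish)
        else:
            heapq.heappush(finish_times, finish)
    return len(finish_times)

if __name__ == "__main__":
    jobs = [(0, 3), (1, 4), (3, 5), (4, 7), (2, 6)]
    print(min_machines(jobs))  # 3: at time 2, three jobs overlap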

Research Paper

Simple Learner Based Polarity Prediction on Twitter Data

Megha Mishra*, Dolly Khandelwal**, Vishnu Kumar Mishra***
* Senior Assistant Professor, Department of Computer Science and Engineering, SSTC (SSGI), CSVTU, Bhilai, Chhattisgarh, India.
** PG Scholar, Department of Computer Science and Engineering, SSTC (SSGI), CSVTU, Bhilai, Chhattisgarh, India.
*** Associate Professor, Bharti College of Engineering and Technology, Durg, Chhattisgarh, India.
Mishra, M., Khandelwal, D., and Mishra, V.K. (2016). Simple Learner Based Polarity Prediction on Twitter Data. i-manager’s Journal on Computer Science, 4(3), 24-28. https://doi.org/10.26634/jcom.4.3.8287

Abstract

Sentiment analysis is the process of extracting, understanding, and analyzing the opinions expressed by people, using machine learning. In recent years, the growth of social networking sites has brought a new way of collecting information worldwide. Twitter is a famous microblogging site that allows millions of users to share their opinions on a wide variety of topics on a daily basis. The posts, called tweets, are confined to 140 characters. These opinions are important to researchers for analysis and efficient decision making, and sentiment analysis helps to extract clear insights from social media. In this paper, the authors present an approach that classifies sentences into the categories positive, negative, or neutral. For polarity classification, three multidimensional fields are used: politics, companies, and entertainment. These fields give a better reflection of what is happening around the world. The dataset is extracted from Twitter using the Twitter API, accessed through Tweepy, a Python library. The machine learning classifiers used are Naive Bayes, Baseline, and Maximum Entropy. Feature extraction is done using the unigram approach, and the performance of the different classifiers is compared.
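A unigram-plus-Naive-Bayes pipeline of the kind the abstract describes can be sketched in a few lines of Python. The toy tweets and labels below are placeholders, not the paper's Twitter dataset, and scikit-learn stands in for whatever implementation the authors used.

# Illustrative sketch of unigram Naive Bayes polarity classification
# with scikit-learn; the toy tweets and labels are placeholders, not
# the paper's dataset (which is collected from Twitter via Tweepy).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_tweets = [
    "great win for the team today",
    "the new policy is a disaster",
    "the meeting is scheduled for monday",
    "loved the movie, brilliant acting",
]
train_labels = ["positive", "negative", "neutral", "positive"]

# Unigram bag-of-words features feeding a multinomial Naive Bayes model.
model = make_pipeline(CountVectorizer(ngram_range=(1, 1)), MultinomialNB())
model.fit(train_tweets, train_labels)

print(model.predict(["what a brilliant win"]))  # likely 'positive'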

Review Paper

An Era of Enhanced Subspace Clustering in High-Dimensional Data

Dr. J. Rama Devi*, M. Venkateswara Rao**
* Research Scholar, Department of Computer Science and Engineering, GITAM Institute of Technology, Visakhapatnam, Andhra Pradesh, India.
** Professor, Department of Information Technology, GITAM Institute of Technology, Visakhapatnam, Andhra Pradesh, India.
Devi, J.R., and Rao, M.V. (2016). An Era of Enhanced Subspace Clustering in High-Dimensional Data. i-manager’s Journal on Computer Science, 4(3), 28-36. https://doi.org/10.26634/jcom.4.3.8289

Abstract

In many real-world problems, data are collected in a high-dimensional space. Detecting clusters in high-dimensional spaces is a challenging task in data mining. Subspace clustering is an emerging method which, instead of finding clusters in the full space, finds clusters in different subspaces of the original space. Subspace clustering has been successfully applied in various domains. Recently, the proliferation of high-dimensional data and the need for quality clustering results have moved research towards enhanced subspace clustering, which targets problems that cannot be handled or solved effectively by traditional subspace clustering. These enhanced clustering techniques involve handling complex data and improving clustering results in domains such as social networking, biology, astronomy, and computer vision. The authors review the enhanced subspace clustering paradigms and their properties. They mainly discuss three problems of enhanced subspace clustering: first, mining significant overlapping subspace clusters; second, overcoming the parameter sensitivity of state-of-the-art subspace clustering algorithms; and third, incorporating constraints or domain knowledge to improve the quality of the clusters. They also discuss basic subspace clustering and the relevant high-dimensional clustering approaches, and describe how they are related.
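The core idea, that clusters may be visible only in some subspaces of the features, can be illustrated with a toy Python example. This is not any specific algorithm from the review: k-means is simply run in every two-dimensional axis-parallel subspace of a synthetic dataset, and only the informative subspace scores well.

# Toy illustration of the subspace clustering idea (not any specific
# algorithm from the review): clusters that are washed out in the full
# space emerge in an axis-parallel subspace of the features.
from itertools import combinations

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Two clusters are separated only in features 0 and 1;
# features 2 and 3 are pure noise.
informative = np.vstack([rng.normal(0, 0.3, (50, 2)),
                         rng.normal(3, 0.3, (50, 2))])
noise = rng.normal(0, 1.0, (100, 2))
X = np.hstack([informative, noise])

# Score a 2-means clustering in every 2-D axis-parallel subspace.
for dims in combinations(range(X.shape[1]), 2):
    sub = X[:, dims]
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(sub)
    print(dims, round(silhouette_score(sub, labels), 2))
# The (0, 1) subspace scores highest: the clusters live there.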