i-manager Publications

A Hybrid Clustering with Side Information in Text Mining

T. Naveen Kumar*, Ramadevi**

* PG Scholar, Department of Computer Science and Engineering, Sree Vidyanikethan Engineering College, Tirupati, India.

** Assistant Professor, Department of Computer Science and Engineering, Sree Vidyanikethan Engineering College, Tirupati, India.

Periodicity: June - August 2016
DOI : https://doi.org/10.26634/jcom.4.2.8122

Abstract

In many online forms, a large amount of side information, or meta data, is available. This meta data takes different forms, for example the links present in a file, user-access behavior from blogs, document origin information, and other attributes embedded in the content of a text document. These meta attributes contain a large amount of information that is useful for clustering. However, meta data can also add noise to the mining process, so it is difficult to incorporate. The existing COATES algorithm was created for this clustering approach. In COATES, however, the k-means algorithm limits cluster quality, because it can lead to a wrong number of clusters, clusters of different sizes, empty clusters, and outliers. The authors propose a Hybrid-COATES algorithm, which combines CURE with the COATES algorithm for an efficient and effective clustering approach. For mining text data with the help of meta data or side information, the CURE algorithm is more capable than the k-means algorithm. The Hybrid-COATES method is used to address the scalability problem and to improve the quality of the clustering results.
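The abstract does not describe how Hybrid-COATES uses CURE internally, so the following is only a minimal sketch of the generic CURE idea from Guha et al.: each cluster is summarized by a few well-scattered points that are shrunk toward the centroid, which dampens the effect of outliers and allows clusters of different sizes and shapes. The function names and the parameters `num_reps` and `shrink` are assumptions made for illustration and are not taken from the article.

```python
import math


def centroid(points):
    """Mean of a list of equal-length numeric vectors."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def cure_representatives(points, num_reps=3, shrink=0.5):
    """Pick well-scattered points and shrink them toward the centroid.

    Illustrative only: num_reps and shrink are hypothetical parameter names.
    """
    center = centroid(points)
    indices = range(len(points))
    # Start with the point farthest from the centroid.
    chosen = [max(indices, key=lambda i: math.dist(points[i], center))]
    while len(chosen) < min(num_reps, len(points)):
        remaining = [i for i in indices if i not in chosen]
        # Next, take the point whose nearest chosen point is farthest away.
        nxt = max(
            remaining,
            key=lambda i: min(math.dist(points[i], points[j]) for j in chosen),
        )
        chosen.append(nxt)
    # Move each scattered point a fraction `shrink` of the way to the centroid.
    return [
        [x + shrink * (c - x) for x, c in zip(points[i], center)]
        for i in chosen
    ]


def cluster_distance(reps_a, reps_b):
    """Distance between clusters: closest pair of representative points."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)
```

In a CURE-style agglomerative pass, the two clusters with the smallest `cluster_distance` would be merged and their representatives recomputed. How side information enters that process in Hybrid-COATES is not stated in the abstract and is left out of this sketch.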

Keywords

Data Mining, Clustering, COATES Algorithm, K-means Algorithm, Hybrid-COATES.

How to Cite this Article?

Kumar, T.N., and Ramadevi (2016). A Hybrid Clustering With Side Information In Text Mining. i-manager’s Journal on Computer Science, 4(2), 23-30. https://doi.org/10.26634/jcom.4.2.8122


