A Hybrid Clustering with Side Information in Text Mining

T. Naveen Kumar*, Ramadevi**
* PG Scholar, Department of Computer Science and Engineering, Sree Vidyanikethan Engineering College, Tirupati, India.
** Assistant Professor, Department of Computer Science and Engineering, Sree Vidyanikethan Engineering College, Tirupati, India.
Periodicity: June - August 2016
DOI: https://doi.org/10.26634/jcom.4.2.8122

Abstract

In many online forums, a large amount of side information, or metadata, is available alongside the text. This metadata takes different forms, for example the links present in a document, user-access behavior from blogs, document origin information, and other attributes embedded in the content of the text document. These meta attributes contain a large amount of information that is useful for clustering. However, metadata can also add noise to the mining process, so it is difficult to incorporate. The existing COATES algorithm was designed for this clustering task, but its use of the k-means algorithm limits cluster quality: k-means can produce the wrong number of clusters, differently sized clusters, empty clusters, and outliers. The authors propose a Hybrid-COATES algorithm, which combines CURE with the COATES algorithm for efficient and effective clustering. For mining text data with the help of metadata or side information, the CURE algorithm is more capable than the k-means algorithm. The Hybrid-COATES method aims to address the scalability problem and improve the quality of the clustering results.
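The abstract does not spell out how CURE is used inside Hybrid-COATES, so the following is only a minimal sketch of the general CURE idea, not the authors' method: each cluster is summarized by a few well-scattered representative points that are shrunk toward the centroid, rather than by a single k-means centroid. The function names, the `num_reps` and `shrink` parameters, and the farthest-point selection heuristic are assumptions made for illustration.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def cure_representatives(points, num_reps=3, shrink=0.5):
    """Pick up to num_reps scattered points, then shrink them toward the centroid.

    Hypothetical helper: num_reps and shrink are illustrative parameters.
    """
    center = centroid(points)
    # Start with the point farthest from the centroid.
    reps = [max(points, key=lambda p: math.dist(p, center))]
    # Farthest-point heuristic: repeatedly add the point that is farthest
    # from the representatives chosen so far.
    while len(reps) < min(num_reps, len(points)):
        reps.append(max(points, key=lambda p: min(math.dist(p, r) for r in reps)))
    # Shrinking toward the centroid dampens the influence of outliers.
    return [[x + shrink * (c - x) for x, c in zip(rep, center)] for rep in reps]


def cluster_distance(reps_a, reps_b):
    """Distance between two clusters: the closest pair of representatives."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)
```

In a full hierarchical scheme, the two clusters with the smallest `cluster_distance` would be merged repeatedly. Because several representatives describe each cluster, this style of clustering can follow non-spherical and unevenly sized clusters more closely than a single centroid can.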

Keywords

Data Mining, Clustering, COATES Algorithm, K-means Algorithm, Hybrid-COATES.

How to Cite this Article?

Kumar, T.N., and Ramadevi (2016). A Hybrid Clustering With Side Information In Text Mining. i-manager’s Journal on Computer Science, 4(2), 23-30. https://doi.org/10.26634/jcom.4.2.8122
