i-manager Publications

An Era of Enhanced Subspace Clustering in High-Dimensional Data

J. Rama Devi*, M. Venkateswara Rao**

* Research Scholar, Department of Computer Science and Engineering, GITAM Institute of Technology, Visakhapatnam, Andhra Pradesh, India.

** Professor, Department of Information Technology, GITAM Institute of Technology, Visakhapatnam, Andhra Pradesh, India.

Periodicity: September - November 2016
DOI : https://doi.org/10.26634/jcom.4.3.8289

Abstract

In many real-world problems, data are collected in high-dimensional spaces. Detecting clusters in such spaces is a challenging data mining task. Subspace clustering is an emerging approach that, instead of finding clusters in the full space, finds clusters in different subspaces of the original space. It has been applied successfully in various domains. Recently, the proliferation of high-dimensional data and the need for high-quality clustering results have shifted research toward enhanced subspace clustering, which targets problems that traditional subspace clustering cannot handle or solve effectively. These enhanced techniques handle complex data and improve clustering results in domains such as social networking, biology, astronomy, and computer vision. The authors review the enhanced subspace clustering paradigms and their properties. They discuss three main problems of enhanced subspace clustering: first, handling overlapping clusters by mining significant subspace clusters; second, overcoming the parameter sensitivity of state-of-the-art subspace clustering algorithms; and third, incorporating constraints or domain knowledge to improve the quality of the clusters. They also discuss basic subspace clustering and the relevant high-dimensional clustering approaches, and describe how they are related.
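
The idea that a cluster can be visible in a subspace while being diluted in the full space can be illustrated with a small sketch. The example below is not the authors' method: it is a toy grid-density check, loosely in the spirit of grid-based subspace clustering, and every name in it (`make_data`, `densest_cell_count`, `subspace_scores`), the synthetic data, and the bin count are assumptions chosen only for illustration.

```python
import random
from itertools import combinations


def make_data(n_cluster=60, n_noise=140, n_dims=6, seed=0):
    """Synthetic points in [0, 1)^n_dims (hypothetical toy data).

    The first n_cluster points are tight in dimensions 0 and 1 only;
    all their other coordinates, and all noise points, are uniform.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n_cluster):
        p = [rng.random() for _ in range(n_dims)]
        p[0] = 0.5 + rng.uniform(-0.04, 0.04)
        p[1] = 0.5 + rng.uniform(-0.04, 0.04)
        points.append(p)
    for _ in range(n_noise):
        points.append([rng.random() for _ in range(n_dims)])
    return points


def densest_cell_count(points, dims, bins=5):
    """Count of points in the fullest grid cell of the projection onto dims."""
    counts = {}
    for p in points:
        cell = tuple(min(int(p[d] * bins), bins - 1) for d in dims)
        counts[cell] = counts.get(cell, 0) + 1
    return max(counts.values())


def subspace_scores(points, size=2, bins=5):
    """Densest-cell count for every axis-parallel subspace of the given size."""
    n_dims = len(points[0])
    return {
        dims: densest_cell_count(points, dims, bins)
        for dims in combinations(range(n_dims), size)
    }


if __name__ == "__main__":
    data = make_data()
    scores = subspace_scores(data)
    best = max(scores, key=scores.get)
    print("densest 2-D subspace:", best, "count:", scores[best])
    print("densest full-space cell:", densest_cell_count(data, range(6)))
```

In this toy setting the planted cluster concentrates in a single cell of the subspace spanned by dimensions 0 and 1, whereas in the full six-dimensional grid the same points are scattered over many cells. Real subspace clustering algorithms search the space of subspaces far more carefully than this exhaustive pairwise scan.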

Keywords

Subspace Clustering, High-dimensional Data, Enhanced Subspace Clustering.

How to Cite this Article?

Devi, J.R., and Rao, M.V. (2016). An Era of Enhanced Subspace Clustering in High-Dimensional Data. i-manager’s Journal on Computer Science, 4(3), 28-36. https://doi.org/10.26634/jcom.4.3.8289

