An Era of Enhanced Subspace Clustering in High-Dimensional Data

J.R. Devi*, M. Venkateswara Rao**
* Research Scholar, Department of Computer Science and Engineering, GITAM Institute of Technology, Visakhapatnam, Andhra Pradesh, India.
** Professor, Department of Information Technology, GITAM Institute of Technology, Visakhapatnam, Andhra Pradesh, India.
Periodicity: September - November 2016
DOI: https://doi.org/10.26634/jcom.4.3.8289

Abstract

In many real-world problems, data are collected in a high-dimensional space, and detecting clusters in such spaces is a challenging data mining task. Subspace clustering is an emerging method that, instead of searching for clusters in the full space, finds clusters in different subspaces of the original space. It has been applied successfully in various domains. Recently, the proliferation of high-dimensional data and the need for high-quality clustering results have shifted research toward enhanced subspace clustering, which targets problems that traditional subspace clustering cannot handle or solve effectively. These enhanced techniques handle complex data and improve clustering results in domains such as social networking, biology, astronomy and computer vision. The authors review enhanced subspace clustering paradigms and their properties, focusing on three main problems: first, overlapping clusters, addressed by mining significant subspace clusters; second, the parameter sensitivity of state-of-the-art subspace clustering algorithms; and third, the incorporation of constraints or domain knowledge to improve cluster quality. They also discuss basic subspace clustering and the relevant high-dimensional clustering approaches, and describe how these are related.
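The idea of finding clusters in subspaces rather than in the full space can be made concrete with a small example. The Python sketch below is a minimal illustration of basic subspace clustering only, not the authors' method and not one of the enhanced techniques the review covers. It follows the bottom-up grid approach of CLIQUE (Agrawal et al., 1998): each dimension is cut into `xi` equal intervals, a cell is dense if it holds more than a fraction `tau` of all points, higher-dimensional candidate cells are built only from dense lower-dimensional cells, and dense cells that share a face are merged into clusters. The function names, the assumption that all values lie in a known range `[lo, hi]`, and the default parameter values are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def find_dense_units(points, xi=4, tau=0.3, lo=0.0, hi=1.0):
    """Bottom-up (Apriori-style) search for dense grid cells in every subspace.

    Returns {subspace: set_of_dense_cells}; a subspace is a sorted tuple of
    dimension indices and a cell is a tuple of interval indices, one per dimension.
    Assumes every coordinate lies in [lo, hi].
    """
    n, d = len(points), len(points[0])
    width = (hi - lo) / xi

    def cell_of(p, dims):
        # Interval index of p along each dimension in dims; a value equal to hi
        # is clamped into the last interval.
        return tuple(min(int((p[j] - lo) / width), xi - 1) for j in dims)

    def dense(dims, candidates=None):
        counts = defaultdict(int)
        for p in points:
            c = cell_of(p, dims)
            if candidates is None or c in candidates:
                counts[c] += 1
        # A cell is dense if it holds more than a fraction tau of all points.
        return {c for c, k in counts.items() if k / n > tau}

    level = {}
    for j in range(d):
        cells = dense((j,))
        if cells:
            level[(j,)] = cells
    result = dict(level)

    while level:
        nxt = {}
        for a, b in combinations(sorted(level), 2):
            if a[:-1] != b[:-1]:
                continue  # join only subspaces sharing all but their last dimension
            dims = a + (b[-1],)
            candidates = set()
            for ca in level[a]:
                for cb in level[b]:
                    if ca[:-1] != cb[:-1]:
                        continue
                    cand = ca + (cb[-1],)
                    # Monotonicity pruning: every lower-dimensional projection
                    # of a dense cell must itself be dense.
                    if all(
                        cand[:i] + cand[i + 1:] in level.get(dims[:i] + dims[i + 1:], ())
                        for i in range(len(dims))
                    ):
                        candidates.add(cand)
            if candidates:
                cells = dense(dims, candidates)
                if cells:
                    nxt[dims] = cells
        result.update(nxt)
        level = nxt
    return result


def connected_clusters(cells):
    """Group the dense cells of one subspace into clusters of face-adjacent cells."""
    remaining, clusters = set(cells), []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            c = stack.pop()
            for i in range(len(c)):
                for step in (-1, 1):
                    nb = c[:i] + (c[i] + step,) + c[i + 1:]
                    if nb in remaining:
                        remaining.remove(nb)
                        component.add(nb)
                        stack.append(nb)
        clusters.append(component)
    return clusters


# Toy data: a tight group in dimensions 0 and 1, spread out along dimension 2,
# plus 12 background points spread over the grid.
cluster_a = [(0.1, 0.1, v) for v in (0.1, 0.4, 0.6, 0.9) * 2]
noise = [((0.4, 0.6, 0.9)[i % 3], (0.6, 0.9, 0.4)[i % 3], (0.1, 0.4, 0.6, 0.9)[i % 4])
         for i in range(12)]

for dims, cells in sorted(find_dense_units(cluster_a + noise).items()):
    print(dims, connected_clusters(cells))
```

On this toy data, dimension 2 yields no dense unit, while the group that is tight in dimensions 0 and 1 appears as a single dense cell in subspace (0, 1), a cluster that exists only in that subspace. The result depends directly on `xi` and `tau`: raising `tau` to 0.45, for instance, leaves no dense unit at all, which is a small instance of the parameter-sensitivity problem listed above.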

Keywords

Subspace Clustering, High-dimensional Data, Enhanced Subspace Clustering.

How to Cite This Article

Devi, J.R., and Rao, M.V. (2016). An Era of Enhanced Subspace Clustering in High-Dimensional Data. i-manager’s Journal on Computer Science, 4(3), 28-36. https://doi.org/10.26634/jcom.4.3.8289
