i-manager Publications

A Clustering - Based Attribute Selection Approach for High Dimensional Data

Ravi P. Patki*

Assistant Professor, Department of Information Technology, International Institute of Information Technology, Pune, India.

Periodicity: June - August 2017
DOI: https://doi.org/10.26634/jit.6.3.13775

Abstract

Attribute selection is the process of selecting a subset of important attributes for use in model development. The central assumption behind an attribute selection method is that the data contain many redundant or irrelevant attributes. Redundant attributes are those that provide no more information than the attributes already selected, and irrelevant attributes provide no useful information in any context. An attribute selection algorithm should be efficient, i.e., take less time, and effective, i.e., produce a correct subset that yields good results. The existing system proposed a clustering-based attribute selection algorithm built on these efficiency and effectiveness criteria. In this algorithm, attributes are first divided into clusters using a graph-theoretic clustering method, and then the attribute most strongly related to the target class is selected from each cluster. Because the number of attributes is large, many nodes are generated in the graph, and Prim's algorithm is considered to work better in such situations. In this paper, the system uses Kruskal's algorithm instead of Prim's algorithm for better efficiency and accuracy. Kruskal's algorithm sorts the edges by weight and starts from the smallest one, which takes less time to iterate. Of the two methods, it is the only one that uses a sorting technique, which increases efficiency.
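
The sort-then-scan behaviour that the abstract attributes to Kruskal's algorithm can be sketched as follows. This is a minimal illustration and not the paper's implementation: the function name `kruskal_mst`, the `(weight, u, v)` edge format, and the union-find helper are assumptions introduced here, and the edge weights stand in for whatever attribute-to-attribute measure the paper actually uses, which the abstract does not specify.

```python
def kruskal_mst(num_nodes, edges):
    """Return (total_weight, mst_edges) for an undirected weighted graph.

    Hypothetical helper for illustration only.
    edges: iterable of (weight, u, v), nodes numbered 0..num_nodes-1.
    """
    # Union-find forest: each node starts as its own component.
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent links to the component root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    total = 0.0
    # Sort edges by weight so the smallest edge is examined first.
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # The edge joins two different components, so it cannot form a cycle.
            parent[root_u] = root_v
            mst.append((u, v, weight))
            total += weight
            if len(mst) == num_nodes - 1:
                break  # a spanning tree on n nodes has n - 1 edges
    return total, mst


# Toy graph with four nodes; the weights are made up for the example.
example_edges = [(1, 0, 1), (3, 0, 2), (2, 1, 2), (4, 2, 3), (5, 1, 3)]
total_weight, tree = kruskal_mst(4, example_edges)
```

In a clustering setting, a spanning tree built this way is typically split into clusters by removing selected edges; how the paper chooses those edges is not stated in the abstract, so that step is left out of the sketch.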

Keywords

Attribute Subset Selection, Attribute Clustering, Data Mining, Filter Method, Graph-Based Clustering.

How to Cite this Article?

Patki, R. P. (2017). A Clustering - Based Attribute Selection Approach for High Dimensional Data. i-manager’s Journal on Information Technology, 6(3), 8-14. https://doi.org/10.26634/jit.6.3.13775
