A Clustering - Based Attribute Selection Approach for High Dimensional Data

Ravi P. Patki*
Assistant Professor, Department of Information Technology, International Institute of Information Technology, Pune, India.
Periodicity: June - August 2017
DOI: https://doi.org/10.26634/jit.6.3.13775

Abstract

Attribute selection is the process of selecting a subset of important attributes for use in model development. The central assumption behind any attribute selection method is that the data contain many redundant or irrelevant attributes. Redundant attributes are those that provide no more information than the currently selected attributes, and irrelevant attributes provide no useful information in any context. A good attribute selection algorithm should therefore be both efficient, i.e., take little time, and effective, i.e., return a correct subset that produces good results. The existing system proposed a clustering-based attribute selection algorithm built on these efficiency and effectiveness criteria. In this algorithm, attributes are first separated into clusters using a graph-theoretic clustering method, and then the attribute most strongly related to the target class is selected from each cluster. Because the number of attributes is large, many nodes are generated in the graph, and in such situations Prim's algorithm is considered to work better. In this paper, the system uses Kruskal's algorithm instead of Prim's algorithm for better efficiency and accuracy. Kruskal's algorithm sorts the edges by weight and starts from the smallest one, which takes less time to iterate. Among the methods considered, it is the only one that uses a sorting technique, which is expected to increase efficiency.
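
The sort-then-scan behaviour of Kruskal's algorithm described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `kruskal_mst`, the integer node indices standing in for attributes, and the edge weights in the example are all hypothetical, and the abstract does not state how edge weights between attributes are computed.

```python
from typing import List, Tuple

Edge = Tuple[float, int, int]  # (weight, node_u, node_v); nodes stand in for attributes


def kruskal_mst(num_nodes: int, edges: List[Edge]) -> List[Edge]:
    """Return the edges of a minimum spanning forest using Kruskal's algorithm."""
    parent = list(range(num_nodes))  # union-find: each node starts as its own root

    def find(x: int) -> int:
        # Follow parent links to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst: List[Edge] = []
    # Sort edges by weight and visit them starting from the smallest one.
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two separate components, so keep it
            parent[root_u] = root_v
            mst.append((weight, u, v))
    return mst


# Hypothetical graph over four attributes; the weights are made up for illustration.
example_edges = [(0.9, 0, 1), (0.2, 1, 2), (0.5, 0, 2), (0.7, 2, 3)]
example_mst = kruskal_mst(4, example_edges)
```

In this sketch the single sort dominates the cost, after which each edge is examined once; the union-find structure is one common way to detect cycles and is an assumption here rather than something the abstract specifies.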

Keywords

Attribute Subset Selection, Attribute Clustering, Data Mining, Filter Method, Graph-Based Clustering.

How to Cite this Article?

Patki, R. P. (2017). A Clustering - Based Attribute Selection Approach for High Dimensional Data. i-manager’s Journal on Information Technology, 6(3), 8-14. https://doi.org/10.26634/jit.6.3.13775
