i-manager Publications

Big Data Cluster Tendency Techniques with Spectral Features for Efficient Data Partitions Assessment

Rajasekhar Pinisetty*, Ravindranath V.**

*-** Department of Mathematics, Jawaharlal Nehru Technological University, Kakinada, Andhra Pradesh, India.

Periodicity: July - September 2023
DOI: https://doi.org/10.26634/jit.12.3.20146

Abstract

Assessing cluster tendency in big data is challenging, particularly for non-compactly separated (non-CS) datasets with irregular cluster boundaries. This paper introduces a novel Spectral-Based Visual Technique (SVT) to address this limitation. Computing the similarity features of data objects is a crucial step in data clustering, and distance measures such as Euclidean and cosine are widely used for this purpose. The Visual Assessment of Cluster Tendency (VAT) and cosine-based VAT (cVAT) algorithms determine the cluster tendency before clustering, which improves the quality of the resulting clusters. VAT and cVAT use the Euclidean and cosine distance measures, respectively, to identify the similarity features of objects. For big data, an extension of VAT, Clustering using Improved Visual Assessment of Tendency (ClusiVAT), derives clusters with scalable time and memory requirements. However, it operates efficiently only for compactly separated (CS) datasets. The research gap is therefore the need to deliver quality big data partitions (or clusters) for non-CS datasets. To close this gap, the paper proposes a spectral-based visual cluster tendency technique for big data clustering on non-CS datasets. Experiments on benchmark datasets illustrate the performance of the proposed technique compared with other techniques.
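The abstract names the building blocks but does not give the algorithms, so the following is only a minimal sketch of the general idea, not the authors' SVT. It assumes the standard Prim-style VAT reordering, a cosine dissimilarity in the spirit of cVAT, and a spectral embedding of the kind used in spectral clustering. All function names, the Gaussian affinity, and the median heuristic for `sigma` are illustrative assumptions.

```python
import numpy as np


def euclidean_dissimilarity(X):
    """Pairwise Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def cosine_dissimilarity(X):
    """Pairwise cosine distances (1 - cosine similarity) between rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    U = X / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    return np.clip(1.0 - U @ U.T, 0.0, 2.0)


def vat_order(D):
    """Prim-style VAT ordering of a symmetric dissimilarity matrix D."""
    n = D.shape[0]
    # Start from one endpoint of the largest dissimilarity.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(i)]
    visited = np.zeros(n, dtype=bool)
    visited[i] = True
    # min_dist[j]: smallest dissimilarity from j to any ordered object.
    min_dist = D[i].astype(float).copy()
    for _ in range(n - 1):
        candidates = np.where(visited, np.inf, min_dist)
        j = int(np.argmin(candidates))
        order.append(j)
        visited[j] = True
        min_dist = np.minimum(min_dist, D[j])
    return np.array(order)


def spectral_embedding(D, k, sigma=None):
    """Map objects to the top-k eigenvectors of a normalized affinity matrix.

    Assumption: Gaussian affinity with a median-distance scale; the paper
    may use a different affinity or scaling.
    """
    if sigma is None:
        sigma = np.median(D[D > 0])
    W = np.exp(-(D ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    V = vecs[:, -k:]             # eigenvectors of the k largest eigenvalues
    row_norms = np.linalg.norm(V, axis=1, keepdims=True)
    return V / np.where(row_norms == 0, 1.0, row_norms)


if __name__ == "__main__":
    # Two concentric rings: a simple non-CS example (synthetic data).
    rng = np.random.default_rng(0)
    angles = rng.uniform(0.0, 2.0 * np.pi, size=200)
    radii = np.repeat([1.0, 4.0], 100) + rng.normal(0.0, 0.05, size=200)
    X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

    D = euclidean_dissimilarity(X)
    order = vat_order(D)
    vat_image = D[np.ix_(order, order)]  # plain VAT image

    V = spectral_embedding(D, k=2, sigma=0.3)  # sigma chosen by hand here
    D_spec = euclidean_dissimilarity(V)
    order_spec = vat_order(D_spec)
    spectral_image = D_spec[np.ix_(order_spec, order_spec)]
    print(vat_image.shape, spectral_image.shape)
```

Displaying either reordered matrix as a grayscale image gives the visual assessment: dark square blocks along the diagonal suggest clusters. Building the full n-by-n matrix is quadratic in memory, which is why sampling-based variants such as ClusiVAT are used for large datasets.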

Keywords

Similarity Measures, Cluster Tendency, Visual Techniques, Spectral Features, Distance Measures, Data Partition, Data Clustering.

How to Cite this Article?

Pinisetty, R., and Ravindranath, V. (2023). Big Data Cluster Tendency Techniques with Spectral Features for Efficient Data Partitions Assessment. i-manager’s Journal on Information Technology, 12(3), 20-31. https://doi.org/10.26634/jit.12.3.20146

