Big Data Cluster Tendency Techniques with Spectral Features for Efficient Data Partitions Assessment

Rajasekhar Pinisetty*, Ravindranath V.**
*-** Department of Mathematics, Jawaharlal Nehru Technological University, Kakinada, Andhra Pradesh, India.
Periodicity: July - September 2023
DOI: https://doi.org/10.26634/jit.12.3.20146

Abstract

Assessing cluster tendency in big data is challenging, particularly for non-compact separated (non-CS) datasets with irregular cluster boundaries. Computing similarity features for the data objects is a crucial step in data clustering, and distance measures such as Euclidean and cosine are widely employed for this purpose. Determining cluster tendency before clustering helps obtain good-quality clusters, and the Visual Assessment of Cluster Tendency (VAT) and cosine-based VAT (cVAT) algorithms are used for this pre-assessment. VAT and cVAT rely on the Euclidean and cosine distance measures, respectively, to identify the similarity features of objects. For big data, an extension of VAT, Clustering using Improved Visual Assessment of Tendency (ClusiVAT), derives clusters with scalable time and memory requirements. However, it operates efficiently only for compact separated (CS) datasets. The research gap is therefore the need to deliver good-quality big data partitions (clusters) for non-CS datasets. To address it, this paper proposes a novel Spectral-Based Visual Technique (SVT), a spectral-based visual cluster tendency technique for big data clustering of non-CS datasets. Experiments on benchmark datasets illustrate the performance of the proposed technique compared with other techniques.
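
The VAT idea mentioned in the abstract can be illustrated with a short sketch. It assumes the commonly described Prim-like VAT reordering of a dissimilarity matrix, with either Euclidean distance (as in VAT) or cosine dissimilarity (as in cVAT). The function names pairwise_dissimilarity and vat_order are hypothetical, chosen only for this illustration. It is not the authors' implementation, and it covers neither the ClusiVAT sampling step nor the proposed SVT.

```python
import numpy as np

def pairwise_dissimilarity(X, metric="euclidean"):
    """Return an n x n dissimilarity matrix (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    if metric == "euclidean":
        diff = X[:, None, :] - X[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "cosine":
        # Assumes every row of X is a nonzero vector.
        U = X / np.linalg.norm(X, axis=1, keepdims=True)
        D = 1.0 - U @ U.T
        np.fill_diagonal(D, 0.0)
        return np.clip(D, 0.0, 2.0)
    raise ValueError("metric must be 'euclidean' or 'cosine'")

def vat_order(R):
    """Reorder a dissimilarity matrix R with a Prim-like VAT ordering."""
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    # Start from one endpoint of the largest dissimilarity.
    start = int(np.unravel_index(np.argmax(R), R.shape)[0])
    order = [start]
    remaining = np.ones(n, dtype=bool)
    remaining[start] = False
    # min_dist[j] = smallest dissimilarity from j to any ordered object.
    min_dist = R[start].copy()
    for _ in range(n - 1):
        candidates = np.where(remaining, min_dist, np.inf)
        j = int(np.argmin(candidates))
        order.append(j)
        remaining[j] = False
        min_dist = np.minimum(min_dist, R[j])
    order = np.array(order)
    return order, R[np.ix_(order, order)]

# Two interleaved groups of points; VAT should place each group contiguously.
X = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
order, R_star = vat_order(pairwise_dissimilarity(X, "euclidean"))
```

When the reordered matrix R_star is displayed as a grayscale image, dark blocks along the diagonal suggest candidate clusters. Passing "cosine" instead of "euclidean" gives a cVAT-style variant of the same sketch.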

Keywords

Similarity Measures, Cluster Tendency, Visual Techniques, Spectral Features, Distance Measures, Data Partition, Data Clustering.

How to Cite this Article?

Pinisetty, R., and Ravindranath, V. (2023). Big Data Cluster Tendency Techniques with Spectral Features for Efficient Data Partitions Assessment. i-manager’s Journal on Information Technology, 12(3), 20-31. https://doi.org/10.26634/jit.12.3.20146
