A Consolidate Review of the recent challenges in Clustering Uniform Effect

Nagul Shaik*, R. Kiran Kumar**
* Research Scholar, Department of Computer Science, Krishna University, Machilipatnam, India.
** Senior Assistant Professor, Department of Computer Science, Krishna University, Machilipatnam, India.
Periodicity: September - November 2017
DOI : https://doi.org/10.26634/jit.6.4.13848

Abstract

Applications of data mining have grown exponentially over the past decade. The devices through which data are now collected and analyzed differ from those used in the past. One kind of data that has emerged in recent years for knowledge discovery is class imbalance data, defined as data in which the ratio of instances between classes is extremely skewed. In this paper, the authors present different scenarios in which clustering algorithms, especially the k-means algorithm, are applied to such data. The survey identifies a shortcoming of the k-means algorithm on class imbalance data known as the 'uniform effect'. The causes of this behavior are analyzed on different benchmark imbalanced datasets using different evaluation criteria.
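
The 'uniform effect' named in the abstract can be illustrated with a small experiment. The following is a minimal sketch, not the authors' method. It assumes the common reading of the term in the k-means literature: k-means tends to produce clusters of relatively uniform size even when the true classes are highly imbalanced. The synthetic one-dimensional Gaussian data, the class sizes (900 and 100), the use of the coefficient of variation (CV) of group sizes as the imbalance measure, and the helper names kmeans_1d and coefficient_of_variation are all illustrative assumptions.

```python
import random
import statistics


def kmeans_1d(points, centroids, max_iter=100):
    """Plain Lloyd's k-means on 1-D data; returns (centroids, labels)."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: (x - centroids[j]) ** 2)
            for x in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(len(centroids)):
            members = [x for x, lab in zip(points, labels) if lab == j]
            new_centroids.append(
                statistics.fmean(members) if members else centroids[j]
            )
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return centroids, labels


def coefficient_of_variation(sizes):
    """Sample standard deviation of the sizes divided by their mean."""
    return statistics.stdev(sizes) / statistics.fmean(sizes)


# Hypothetical imbalanced data: a wide majority class and a tight minority class.
random.seed(0)
majority = [random.gauss(0.0, 1.0) for _ in range(900)]
minority = [random.gauss(3.0, 0.3) for _ in range(100)]
points = majority + minority

# Start from the true class means, the most favorable initialization.
centroids, labels = kmeans_1d(points, centroids=[0.0, 3.0])

true_sizes = [len(majority), len(minority)]
cluster_sizes = [labels.count(0), labels.count(1)]
cv_true = coefficient_of_variation(true_sizes)
cv_clusters = coefficient_of_variation(cluster_sizes)

print("true class sizes:", true_sizes, "CV = %.3f" % cv_true)
print("k-means cluster sizes:", cluster_sizes, "CV = %.3f" % cv_clusters)
```

Even with the centroids initialized at the true class means, the decision boundary drifts into the majority class, so the smaller cluster ends up with more points than the minority class actually has and the CV of the cluster sizes falls below the CV of the true class sizes.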

Keywords

Clustering, Class Imbalance, Uniform Effect, Learning Strategies

How to Cite this Article?

Shaik, N., and Kumar, R. K. (2017). A Consolidate Review of the recent challenges in Clustering Uniform Effect. i-manager’s Journal on Information Technology, 6(4), 30-37. https://doi.org/10.26634/jit.6.4.13848
