Analyzing Software Defect Prediction Using K-Means and Expectation Maximization Clustering Algorithm Based on Genetic Feature Selection

R. Reena*, R. Thirumalaiselvi**
* Research Scholar, Department of Computer Science, Government Arts College, Nandanam, Chennai, India.
** Assistant Professor, Department of Computer Science, Government Arts College, Nandanam, Chennai, India.
Periodicity: July - September 2016
DOI: https://doi.org/10.26634/jse.11.1.8194

Abstract

Predicting defect-prone software components is an economically important activity and has therefore received a good deal of attention. However, making sense of the many, and sometimes seemingly inconsistent, results is difficult. To improve the performance of software defect prediction, this research proposes a combination of a genetic algorithm and the bagging technique. The work consists of two phases. The first phase is feature selection: features are selected using a genetic algorithm, and the bagging technique is used to address the class imbalance problem. The second phase is defect prediction: software defects are predicted using the K-Means and Expectation Maximization (EM) algorithms. K-Means is a simple and popular approach that is widely used to cluster or classify data. The EM algorithm is known to be an appropriate optimization method for finding compact clusters, and its convergence is guaranteed. It assigns an object to a cluster according to a weight representing the probability of membership. The proposed method is evaluated on data sets from the NASA metric data repository, using accuracy and error rate as evaluation measures. The experimental results demonstrate that the proposed approach outperforms other competing approaches.
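The contrast between the two clustering steps can be illustrated with a small sketch. This is not the authors' implementation: it assumes one-dimensional toy data, two clusters, and a Gaussian mixture model for EM, and the names `kmeans_1d`, `em_1d`, `accuracy_and_error`, and `toy_truth` are hypothetical. It omits the genetic feature selection and bagging phases entirely. K-Means gives each point a single hard label, whereas EM gives each point one membership weight per cluster; accuracy and error rate are computed in the usual way, with error rate equal to one minus accuracy.

```python
import math

def kmeans_1d(xs, init_centers, iters=20):
    """Hard assignment: each point gets the label of its nearest center."""
    centers = list(init_centers)
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda k: abs(x - centers[k]))
                  for x in xs]
        for k in range(len(centers)):
            members = [x for x, lab in zip(xs, labels) if lab == k]
            if members:  # leave an empty cluster's center unchanged
                centers[k] = sum(members) / len(members)
    return centers, labels

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_1d(xs, init_means, iters=50):
    """Soft assignment: each point gets a membership weight per cluster."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k          # assumed starting variance
    weights = [1.0 / k] * k        # assumed equal starting mixing weights
    resp = []
    for _ in range(iters):
        # E-step: membership probabilities (responsibilities) for each point
        resp = []
        for x in xs:
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate parameters from the weighted points
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed cluster
            weights[j] = nj / len(xs)
    return means, resp

def accuracy_and_error(predicted, actual):
    """Accuracy is the fraction of matching labels; error rate is 1 - accuracy."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    accuracy = correct / len(actual)
    return accuracy, 1.0 - accuracy

# Toy one-dimensional "metric" values; purely illustrative, not NASA data.
xs = [0.8, 1.0, 1.2, 2.6, 4.8, 5.0, 5.2]
centers, hard_labels = kmeans_1d(xs, [0.0, 6.0])
means, resp = em_1d(xs, [0.0, 6.0])

# Turn EM's soft memberships into labels by taking the most probable cluster.
em_labels = [max(range(len(r)), key=lambda j: r[j]) for r in resp]
toy_truth = [0, 0, 0, 1, 1, 1, 1]  # hypothetical ground truth for illustration
acc, err = accuracy_and_error(em_labels, toy_truth)
print(hard_labels, em_labels, acc, err)
```

Each row of `resp` sums to one, which is what "a weight representing the probability of membership" amounts to in this sketch. On well-separated data such as this toy set, the weights end up close to 0 or 1, so the two methods produce the same labels; they differ more visibly when clusters overlap.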

Keywords

Software Defect Prediction, Genetic Algorithm, Feature Selection, Bagging Technique, K-Means and Expectation Maximization Clustering Algorithm

How to Cite this Article?

Reena, R., and Selvi, R., (2016). Analyzing Software Defect Prediction Using K-Means and Expectation Maximization Clustering Algorithm Based On Genetic Feature Selection. i-manager’s Journal on Software Engineering, 11(1), 28-36. https://doi.org/10.26634/jse.11.1.8194
