Software defect prediction is an important research topic in the software engineering field, especially to solve the inefficiency and ineffectiveness of the existing industrial approach of software testing and reviews. The software defect prediction performance decreases significantly because the data set contains noisy attributes and class imbalance. Feature selection is generally used in machine learning when the learning task involves high-dimensional and noisy attribute datasets. In this survey, a Genetic Algorithm and a bagging technique is a research topic for Software Defect Prediction. The survey of publications on this topic leads to the conclusion that the field of genetic algorithms applications is growing fast. The authors overall aim is to provide an efficient feature selection for further development of the research.
In the research area of software defect prediction, there have been significant advances in Genetic Feature Selection for Software Defect Prediction. The authors goal is to develop the combination of genetic algorithm and bagging technique for improving the performance of the software defect prediction.
Feature selection is an important data preprocessing activity and has been extensively studied in the data mining and machine learning community. The main goal of feature selection is to select a subset of features that minimizes the prediction errors of classifiers.
Here, Genetic algorithm is applied to deal with the feature selection, and bagging technique is employed to deal with the class imbalance problem.
The main objectives of this study are given below:
Genetic Algorithm employs on the basis of the following parameters:
Genetic Algorithm is a problem solving algorithm. It uses genetics as its model of problem solving. It is a search technique to find approximate solutions to the optimization and search problems. Genetic algorithm is applied for solving the problem of faulty module prediction and as well as for finding the most important attribute for fault occurrence. This section surveys the approaches of discourse annotation at the Genetic Algorithm. A lot of research work has been carried out in the field of genetic algorithm, and various researchers have proposed several useful techniques for genetic algorithm and feature selection.
This work presented an approach to Worst-Case Execution Time (WCET) test generation. It is executed in two stages. The first stage is to augment the source code of the tested program with counters which store the number of invocations of each method and loop. The second stage is to run a single-objective genetic algorithm, where the Fitness function is determined for each iteration by an objective selection algorithm.
This approach is different from other existing approaches in two aspects. First, it is more automated because it does not require a human to insert the counters into the source code. Second, two new objective selection algorithms, which do not need parameter tuning, are proposed and have shown better results. So they can be recommended as a default option for an implementation of the presented approach in a software tool for test generation [3].
AditiPuri and Harshpreet Singh developed a Genetic Algorithm to find critical classes and metrics that are fault prone. The Genetic Algorithm technique shows the high value of Probability of Detection (PD) i.e. 0.875 and the low value of Probability of False Alarms (PF) i.e. 0.44. The error and accuracy values are calculated and recorded as 0.294 and 0.705 respectively. It is therefore concluded that, the Genetic algorithm can be used for objectoriented systems and is useful in predicting the fault prone classes. The work can be extended by using other evolutionary algorithms for finding the most important attribute for fault prediction and finding the critical classes and metrics [4].
This is an early attempt at the genetic algorithm. The features under study were as follows:
This study focuses on the randomized unit testing with genetic algorithm. This study relied that, the randomized unit testing with genetic algorithm is less time efficient, slows down the speed of processing and also produces less optimized parameters. Randomized unit testing is a promising technology that has been shown to be effective, but whose thoroughness depends on the settings of test algorithm parameters. A number of studies have shown less time efficiency, less optimization process and Test case generation using the randomized unit testing with genetic algorithm.
The optimization techniques for GA methods has tools like, a two-level genetic random testing system-Nighthawk, Pruned GA with FSS tool, FSS Learner into the genetic algorithmic level of Nighthawk, and: improved GA with Nighthawk for RUT are used, but the optimization of the testing process is not optimal and also produces less test case generation. The system does not consider the fitness value for the randomized unit testing, rather it generates the random selection of software units. Due to the iteration process, it becomes too complex and time consuming [5].
This work heavily relied on the genetic algorithm. It presented a genetic algorithm-based approach for XSS detection and removal. Cross-site scripting is a major security problem for web applications. It can lead to account or website hijacking, loss of private information, and denial of service, all of which victimize the site users. This approach is an improvement based on two approaches. The first approach uses genetic algorithms to detect the reflected XSS vulnerabilities only but does not remove them. The second approach is able to detect and remove both the reflected and stored XSS vulnerabilities using the pattern matching technique, but not DOM-based XSS [6].
In software testing, the generation of test data is one of the key steps, which have a great effect on the automation of software testing. Since the manual generation of the test data consumes much of the computational time, the process of Test Data Generation has been automated. Software Testing is also an optimization problem with the objective that the efforts consumed should be minimized. Therefore, the search based optimization techniques- Genetic Algorithm (GA) and Clonal Selection Algorithm (CSA) are used in this work. To generate the suitable data, methods were traversed to cover each (Figures 1 and 2) node. Test data values were selected based on fitness/affinity values of antibodies which satisfy the predicate node. Based on the predicate node condition, both algorithms were applied and the optimal test data was generated. Also, both techniques are compared with random testing to show that the test data generated by the search based techniques are better than the random testing as the number of test data generated for random testing is less optimal than GA, CSA [7].
In this work, a genetic algorithm was developed for prioritizing the test cases on the basis of fault coverage and execution time as input parameters of regression testing. Regression testing is a frequently executed maintenance process used to revalidate the modified software. Regression testing is one of the type of testing which works for finding the new software bugs and regression, where the functional and non functional areas of a existing system changes after the enhancements, configuration changes and patches. The intention of the regression testing is to ensure that the changes that have been made does not include new faults and also needs to identify that the changes impact other parts of the software or not. This work improves the effectiveness of algorithm with the help of Genetic Algorithm (GA). Total fault coverage within the time constrained environment on different examples is used to prioritize the test cases and their finite solution [8].
In this work, A.M Sherry and Manish Saraswat developed a genetic algorithm on the test cases to prioritize their execution during Regression testing of the system or software. They used a fitness function to determine the efficiency of test case (a test case covers more number of modified lines is more efficient) and a test case sequence (or test suite) which has higher fitness value, had higher priority for execution during the testing. On applying genetic algorithm for a large number of time or generations, there is a higher probability for achieving an optimum solution [9].
J. Srividhya and K. Alagarsamy developed a modified genetic algorithm for reducing the cost of the regression testing. In this approach a cross sectional elitist selection is used to obtain the best individuals. When genetic algorithm is employed, a near optimal solution is also obtained. In genetic algorithm, populace of the chromosome is characterized by diverse codes, for example, real number, permutation, binary and so forth. Genetic operators such as selection, mutation and cross over are employed on the chromosome with a specific end goal to discover the fittest chromosome. The fitness of a chromosome is characterized by an objective function. Genetic algorithm includes the following steps:
This study was similar to (Kriti Singh and Paramjeet Kaur (2014)), but with an addition to multi-objective for the regression testing reduction. The multi objective genetic algorithm overcomes the short comes of the genetic algorithm. This work focused on optimization of the regression testing with multi-objective genetic algorithm which covered parameters like, simplicity and complexity for test cases for regression testing. Multi-objective optimization refers to the solution of problems with two or more objectives to be satisfied simultaneously. Often, such objectives are in conflict with each other, and are expressed in different units. Because of their nature, multiobjective optimization problems normally have not one but a set of solutions, which are called Pareto-optimal solutions or non-dominated solutions. When such solutions are represented in the objective function space, the graph produced is called the Pareto-front or the Pareto-optimal set. A general formulation of a multiobjective optimization problem consists of a number of objectives with a number of inequality, and equality constraints [11].
The main objective of this study are,
Evolutionary algorithms are used for automatic test case generation. Genetic algorithm is the most widely used technique for automatic testing. Due to limitation of Single objective genetic algorithm for which only one objective can be considered for evaluation of the test cases, a new technique known as Multi objective Genetic Algorithm was used for the test case generation from the UML sequence diagram [12].
This study focuses on the influence of using initialization functions in genetic algorithms applied to combinatorial optimization problems. In this first stage of the research, the experimentation was conducted with the well-known TSP. This experimentation was carried out with three different heuristic functions. For each operator, the performance of four different GAs are compared. As final conclusion of this research, the efficiency of using heuristic initialization functions are highlighted. Anyway, the excessive use of them could decrease the exploration capacity of the GA, trapping the population in local optimums quickly. Therefore, the key is to maintain a balance between the individuals initialized by functions, and the individuals generated randomly [13].
Chayanika Sharma and Sangeeta Sabharwal developed a Software Testing Technique using Genetic Algorithm. The GA is also used with fuzzy as well as the neural networks in different types of testing. The overall aim of the software industry is to ensure delivery of high quality software to the end user. To ensure high quality software, it is required to test the software. Testing ensures that the software meets user specifications and requirements. However, the field of software testing has a number of underlying issues like effective generation of test cases, prioritisation of test cases, etc, which needs to be tackled. These issues demand on effort, time and cost of the testing. Different techniques and methodologies have been proposed for taking care of these issues. Use of evolutionary algorithms for automatic test generation has been an area of interest for many researchers. It is found that by using GA, the results and the performance of testing can be improved [14].
This study focuses on software test case understanding. A test case can have information that includes the test case name, goal, environment, steps to be taken, input and expected results. Well-designed test cases are the most important tools for discovering defects in software. These tools give you the ability to prevent serious defects, and even minor ones, from being shipped to the customers. Good test cases saves time and money and even may be the reputation of your organization [15].
This study focuses on the Automated Test Case Generation using Nature Inspired Meta Heuristics- Genetic Algorithm. For automation of software testing, the generation of test data is one of the key step and therefore the generation of testing data relates to the quality of the software production indirectly. They have applied the improved genetic algorithm for automatic test case generation with some experiment analysis and have shown in their experiment that, the improved genetic algorithm is superior to the basic genetic algorithm on effectiveness and efficiency of the automatic test case generation [16].
This work has presented a survey on genetic feature selection for software defect prediction. It would improve the performance of the software defect prediction. The authors research plan to develop a Genetic algorithm is applied to deal with the feature selection, and bagging technique is employed to deal with the class imbalance problem. Some systems can be used and reused in different types of genetic algorithm. But the combination of genetic algorithm and bagging technique makes an impressive improvement in prediction performance for most classifiers.