A New Adaptive BFO Based On PSO For Learning Neural Network

* PG student, Islamic University of Gaza, Palestine

** Associate Professor, Computer Engineering Department, Islamic University of Gaza, Palestine

Abstract

In this paper, we introduce a new learning algorithm for neural network. A New Adaptive bacteria foraging optimization based on particle swarm optimization(ABFO_PSO) is used in learning neural network. This paper reviews Feed Forward Neural Network (FFANN), and the drawback of back- propagation learning method. Particle Swarm Optimization (PSO) is also described. Moreover, using the PSO in learning neural network is reviewed. Bacterial Foraging Optimization (BFO) is a novel heuristic algorithm inspired from forging behavior of E. coli. It is predominately used to find solutions for real-world problems, but it has problem with time and convergence behaviour. To introduce ABFO_PSO, that provide solution for BFO problem, we make a hybrid between PSO and BFO. Moreover, BFO and ABFO_PSO are applied for learning neural network. The comparison between the results of ABFO_PSO, BFO and PSO for learning neural network shows the strength of new method.

Keywords :

Learning Neural Network,
BFO,
PSO,
Adaptive BFO.

Introduction

Feed-Forward Artificial Neural Network (FFANN) is a common type of neural network used in a wide range of classification problems. Commonly this kind of neural network needs to be trained in order to perform a good classification rate. This training phase usually means the use of the back propagation algorithm. The Back-Propagation network not only has strong mathematical theory, but also can approximate any arbitrary function. However this algorithm has shown some disadvantages related with the computational resources used in some applications. One of the problems is that all of gradient algorithms merely have local search ability. Training procedure for FFANN is a procedure of optimizing weights of neural network. Therefore, we can use global optimization method to overcome disadvantages of BP algorithm. Training neural network using PSO is proposed in many researches (MAN Chun-tao, WANG Kun& ZHANG Li-yong, 2009), (Andrich & Andries, 2010), ( Jun Liu and Xiaohong Qiu, 2009), ( MarcioCarvalho & Teresa, 2006), ( Yong Gui He & Li Bo., 2009). In this paper, BFO is used.

Bacterial Foraging Optimization (BFO) was first proposed by Passino in 2002. It uses the natural foraging strategies of E. coli bacterium cells that take necessary action to maximize the energy utilized per unit time spent for foraging. Recently BFO has attracted much attention from the computational intelligence community and has been applied successfully to a number of engineering problems, such as optimal power flow scheduling harmonic estimation, PID controller design load forecasting, stock market prediction and face recognition etc. However, BFO has the problem of low convergence speed and it usually results in sustained oscillation, especially on flat fitness landscapes, when a bacterium cell is close to the optima. To improve BFO search performance, some variants of BFO was proposed which includes parameters modification ( Ben Niu, Hong Wang, Lijing Tan,& Li Li, 2011),( Xin Xu, Yan-heng Liu, Ai-min Wang, Gang Wang, & Hui-ling Chen, 2012) ( Hanning Chen, Yunlong Zhu & Kunyuan Hu. , 2008) and hybrid algorithms ( Bakwad, Pattnaik, Sohi, Devi, Panigrahi, Sanjoy Das & Lohokare, 2009), ( Kim, Abraham, & Cho, 2007).

In this paper, a new Adaptive BFO based on PSO (ABFO_PSO) for learning neural network is proposed. The comparison between ABFO_PSO, BFO and PSO in learning neural network shows the advantages of this new technique.

ABFO_PSO, PSO and BFO were implemented for the training phase of FFANN. Our benchmark tests were training process from the cancer datasets, which were taken from UCI Machine Learning Repository.

This paper is organized as follows: Section 1, states related work. Section 2, is about the neural network algorithm. Section 3, shows generic PSO. Section 4, describes BFO. In section 5, the researchers propose ABFO_PSO algorithm. Section 6, shows how to use ABFO_PSO, PSO and BFO on BFO in learning neural network. Section 7, describes the experiment. In section 8, the experiment results are shown. Final Section includes the conclusions and discussions for future research.

1. Related Work

To improve neural network learning using optimization algorithms and to improve BFO performance many approaches are proposed.

Comparison of PSO and Differential Evolution (DE) for training neural networks( EspinalAndr´ es, Sotelo-Figueroa Marco, Soria-Alcaraz Jorge, Ornelas Manuel, PugaHectorand & CarpioMart´ın, 2011) shows that PSO works in a better way than DE in the minimization of quadratic error function as fitness function.

A novel PSO based back-propagation learning algorithm (PSO-BP), which combines PSO learning with BP learning, is proposed (Li Ting & Wu Min,2008). Test results show that the proposed algorithm has faster convergence speed and better forecast accuracy than BP algorithm and GA-BP algorithm. Besides, due to parallel characteristics of the proposed algorithm, PSO-BP can be widely used in complex system modeling.

A non-linearly decreasing modulation model to optimize the Chemotaxis step size is proposed to improve the original BFO, including adaptive modulation Index, coefficient and adaptive Chemotaxis step size range(Ben Niu, Hong Wang, Lijing Tan ,& Li Li, 2011).

The fusion of Hybrid Bacterial Foraging with parameter free Particle Swarm Optimization (HBF-pfPSO) is introduced in (Bakwad, Pattnaik,. Sohi, Devi, Panigrahi, Sanjoy Das,&Lohokare,2009).HBF-pfPSO is capable to produce good quality of optima with faster convergence.

Adaptive Bacterial Foraging Optimizer based on Field is proposed in(Xin Xu, Yan-heng Liu, Ai-min Wang, Gang Wang, & Hui-ling Chen,2012). This paper has proposed a novel adaptive scheme which is based on the field, function period and predefined maximum and minimum values, to adapt the chemotactic step size in BFO with the aim of improving its convergence behavior. The ABFOF can avoid the oscillation around the optima or the stagnation near optima for a bacterium cell.

2. Neural Networks

Neural Networks are algorithms for optimization and learning based loosely on concepts inspired by research into the Nature of the brain. They generally consist of five components as shown in Figure (1):

A directed graph known as the network topology whose arcs we refer to as links.
A state variable associated with each node.
A real-valued weight associated with each link.
A real-valued bias associated with each node.
A transfer function for each node which determines the state of anode as a function of a)its bias b , b)the weights, wt of its incoming links , and c)the states, x, of the nodes connected to it by these links. This transfer function usually takes the form ɑ = f (w p + b) which is a sigmoid, a step function or linear function.

Figure 1. Signal Neural Network

Multilayer Feed-Forward Artificial Neural Network (MLFFANN) have three kinds of layers as shown in Figure 2: input layer, hidden layer and output layer. Each perceptron of MLPs has multiple nonlinear processing elements and any element at a given layer feeds all elements at the next layer. Their main advantage is that they can approximate any input/output map and they are widely used for classification.

Figure 2. Multilayer Neural Network

MLFEAN neural networks, trained with a back propagation learning algorithm, are the most popular neural networks. But it has following key disadvantages:(1)because weight modification is based on gradient information, it can get caught into local optimum; (2)as the problem complexity increases due to increased dimensionality and /or Greater.

We need to choose the best optimization technique to determine the best value of the weights and biases which classify the class correctly.

3. Particle Swarm Optimization

Particle swarm optimization is a swarm intelligence technique which was proposed by Kennedy and Eberhartin1995 as in(Kennedy, 1995). The PSO is developed based on the following model as in (Singiresu, 2009):

When one bird locates a target or food (or maximum of the objective function), it instantaneously transmits the information to all other birds.
All other birds gravitate to the target or food (or maximum of the objective function), but not directly.
There is a component of each bird's own independent thinking as well as its past memory.

Thus the model simulates a random search in the design space for the maximum value of the objective function. As such, gradually over much iteration, the birds go to the target (or maximum of the objective function).

The PSO procedure can be implemented through the following steps:

Assume the size of the swarm (number of particles) is N.
Generate the initial population of X randomly as X₁, X₂, . . . , X_N (positions of particle).
Find the velocities of particles. All particles will be moving to the optimal point with a velocity. Initially, all particle velocities are assumed to be zero.
In the i^th iteration, find the following two important parameters used by a typical particle j :

(a) The historical best value of X_j(i) (coordinates of j^th particle in the current iteration i), P_best,j with the highest value of the objective function, f[X_j(i)] encountered by particle j in all the previous iterations.

The historical best value of X_j(i) (coordinates of all particles up to that iteration), G_best, with the highest value of the objective function, f[X_j(i)] encountered in all the previous iterations by any of the N particles.

(b) Find the velocity of particle j in the i^th iteration as follows:

(1)

(2)

Check the convergence of the current solution. If the positions of all particles converge to the same set of values, the method is assumed to have converged. If the convergence criterion is not satisfied, step 4 is repeated by updating the iteration number as i = i + 1, and by computing the new values of P_best,i and G_best, the iterative process is continued until all particles converge to the same optimum solution.

4. Bacterial Foraging Optimization

Like other swarm intelligence algorithms, BFOA is based on social and cooperative behaviors found in nature. In fact, the way Bacteria look for regions of high levels of nutrients can be seen as an optimization process. The bacteria foraging strategy can be explained by four processes as in (Singiresu,2009):

4.1 Chemotaxis

This process simulates the movement of an E.coli cell through swimming and tumbling via flagella. Biologically an E.coli bacterium can move in two different ways. It can swim for a period of time in the same direction or it may tumble, and alternate between these two modes of operation for the entire lifetime. Suppose θⁱ(j,k,l) represents i-th bacterium at j-th chemotactic, k-th reproductive and l-th elimination-dispersal step. C (i) is the size of the step taken in the random direction specified by the tumble (run length unit). Then in computational chemotaxis the movement of the bacterium may be represented by,

(3)

where Δ indicates a vector in the random direction whose elements lie in [-1, 1].

4.2 Swarming (Abd-Elazim, 2010)

For the bacteria to reach at the richest food location, it is desired that the optimum bacterium till a point of time in the search period should try to attract other bacteria so that together they converge at the desired location more rapidly. To achieve this, a penalty function based upon the relative distances of each bacterium from the fittest bacterium till that search duration, is added to the original cost function. Finally, when all the bacteria have merged into the solution point, this penalty function becomes zero. The cell-to-cell signaling in E. coli swarm may be represented by the following function.

4.3 Reproduction

The least healthy bacteria eventually die while each of the healthier bacteria asexually split into two bacteria, which are then placed in the same location. This keeps the swarm size constant.

4.4 Elimination and dispersal

Gradual or sudden changes in the local environment where a bacterium population lives may occur due to various reasons e.g. a significant local rise of temperature may kill a group of bacteria that are currently in a region with a high concentration of nutrient gradients. Events can take place in such a fashion that all the bacteria in a region are killed or a group is dispersed into a new location.

The Bacterial Foraging Optimization summarized below:

Initialize input parameter:

S_p : Total number of bacteria in the population

N_cN_r : The number of chemotaxtis step

N_edP_i : The number of reproduction step

C(i): The length of swim

: The number of elimination-dispersal events.

: Elimination-dispersal probability.

: The size of the step taken in the random direction specified by the tumble.

Create random initial swarm bacteria θⁱ(j,k,l) and initialize their fitness J_health
For i =1, 2 … N_ed

For j =1, 2 … N_re
For k =1, 2 …N_c
For l =1, 2 … S_p
Perform chemotaxis step:

Compute fitness function, J (i, j, k, l)
Perform tumble (indicate direction of step)
Swim with suitable C(i) until neither improve fitness nor arrive the length of swim N_s.

End for

The S bacteria with the highest J_health values die and the remaining S bacteria with the best values split.

End for

Perform the reproduction step by eliminating _disperal step for all bacteria θⁱ(j,k,l) with probability 0 < P _ed < 1.

End for

5. Adaptive BFO Based on PSO

ABF-PSO algorithm combines both BFO and PSO. The aim is to make PSO able to exchange social information and BFO able in finding new solution by tumble, swim, reproduction, elimination and dispersal.

In the original bacterial foraging algorithm the Chemotaxis step length Cis a constant. By this way, it is hard to keep a right balance between global search and local search. Therefore, the Chemotaxis step length is one of the most important adjustable parameters in BFO. The proper selection of the Chemotaxis step length is critical to the success of BFO algorithm.

In ABFO_PSO algorithm, PSO is used to update the length of Chemotaxis step. The velocity of bacteria still has random direction because of tumble behaviour of BFO, but the magnitude of the velocity is calculated by using social behaviour of PSO. C (i) is calculated as the magnitude of the velocity of particle in PSO:

(4)

where θ_best is update every swim step θ_bestis update every reproduction step.

Because this combination using social information of PSO and the swarming step represents social behavior in BFO, the swarming step is avoided in ABFO_PSO. The swarming step take a long time in BFO so using social information of PSO instead of swarming step make BFO faster and more robust.

6. Neural Net Work Training

Using a previous optimization algorithm to optimize network weights lies in the following two points:

Map between particles / bacteria and network connection weights. The dimension of each particle/ bacteria corresponds to a neural network connection weights. That the number of neural network connection weights (including bias value), should be equal to number of particles/bacteria.
Use mean square error of neural networks as the fitness function of optimization algorithms:

(5)

To avoid over fitting problem in classification and to generalize neural network, the valid approach is used. In this approach, data set is divided into 3 subsets:

Training set (TS):Used for updating the weights
Training test set (TTS), also called validation set: The error on the TTS is monitored during the training; this error will normally decrease during the initial phase of training. However, when the network begins to overfit the data, the error on the TTS will typically begin to rise Training is stopped when the error on the TTS increases for a presspecified number of iterations and the weights and biases at the minimum of the TTS error are returned.
Testing set: Not used during training but to compare different algorithms once training has completed.

For each set the error is calculated using equation (5).

7. Experiments

In this section we discuss the experiment design, execution of the comparison between ABFO_PSO, BFO and PSO for learning neural network.

7.1 Data set

We chose Breast Cancer dataset instances from the UCI Machine Learning Repository (Arthur Asuncion,& David Newman, 2007):

Breast Cancer datasets include 699samples, each sample by nine features. Features are computed from a digitized image of a Fine Needle Aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.

The samples are classified to two classes: Benign and Malignant. Those classes are not uniformly distributed. A number of Benign samples are 458 and a number of malignant samples are 241.

7.2 Neural network structure

The researchers use MLFFNN (9-9-2) with one hidden layer, the number of neuron in hidden layer is equal to the number of data feature (inputs). Both hidden layer and output layer have sigmoid function,

(6)

7.3 Tests

Learning algorithm in network training performance test usually has the following test:

E (error rate): the ratio between error classification samples to total number of samples.

MSE: The Mean Squared Error between target and actual output.

Each previous test is applied to all subsets of data(validationset, testingset, training set).

Because of imbalanced data set, accuracytest, recall teat and precision are used:

(7)

(8)

(9)

where,

TP = donors in group and model classified into group

FP = donors not in group and model classified into group

FN = donors in group and model not classified into group

TN = donors not in group and model not classified into group

Time test: The researchers make comparison between times that optimization method use in learning processes.The specifications of computer used in experiment are Intel core i5-460M processor and 4GB DDR3.

8. Result

In this section we discuss the results of the comparison between ABFO_PSO, BFO and PSO for learning neural network.

Firstly, the result of mean square error and time test is shown in Table1 below:

However the Table 1 shows that PSO makes learning process inless thanone minute, it has the greatest MSE for each subset. The BFO reduce the MSE error, but it takes more than 10 minutes. Besides, ABFO_PSO has the best MSE for valid set and test set, it take less than 7 minute. The figures of curves of MSE vs iteration show the more advantages of ABFO_PSO.

Table 1. The result of MSE and time test

As shown in Figure (3), when PSO is used, the curves of MSE train are stable, but the step length of reducing error changes randomly and the curve of MSE on validation test start to be affected after four iterations.

Figure 3. The result of MSE and time test with PSO

Figure (4) shows that BFO can reach to better MSE value ,but the curve of MSE on training test starts to oscillate at iteration number 8.

Figure 4. The result of MSE and time test with BFO

Figure (5) shows the ABFO_PSO has the best curves of MSE, which faster and more stable and reach the best minimum value. The convergence behavior of ABFO_PSO is better than BFO because it does not oscillate. The step length of reducing error begins large and decreases gradually.

Figure 5. The result of MSE and time test with ABFO_PSO

Secondly, the result of ERROR rate is shown in Table2:

The Table 2 shows that the PSO has the largest number of error classification samples for each subset. Although BFO has the least error rate of traning set, ABFO_PSO has the least value in validation and testing set.

Table 2. The error rate for each method used

As Table1 and Table 2 show that ABFO_PSO has the best values of MSE and E fore validation and testing set, it makes neural network more generalizing.

Thirdly, the result of recall and precision and accuracy test of testing set is shown in Table 3, Table 4 and Table 5.

Table 3. The result of recall and precision with PSO

Table 4. The result of recall and precision with BFO

Table 5. The result of recall and precision with ABFO_PSO

PSO algorthim hase the least value of accuracy, and large difference between recall value and precision value and difference between the results of tests for two classes.

BFO algorthim can improve the accuracy, but the difference between recall value and precision value and difference between the results of tests for two classes are still clear.

ABFO_PSO has the best accuracy value and the least value of difference between recall value and precision value and difference between the results of tests for two classes. It is the best for unbalanced data set.

Conclusions

In this paper we provide unprecedented method, ABFO_PSO, in learning neural network. The results ofcomparison that was made between ABFO, PSO BFO and PSO in learning neural networkshow that the new ABFO_PSO ismore accurate, stable and faster. It makes neuralnetwork more generalizing and reduces the value of MSEand error rate. It can deal with unbalances data set very well. ABFO _PSO can avoid the oscillatory behavior around the optimum. The new method has the advantages of both PSO and PFO. BFO gives it the accuracy and PSO makes it faster.

For further studies,we can useABFO_PSO with other applications to show the strength of it. Trying to hybrid between ABFO_PSO and genetic algorithm is proposed to have more robust optimization theory.

References

[1]. Ben Niu, Hong Wang, Lijing Tan, Li Li. (2011). Improved BFO with Adaptive Chemotaxis Step for Global Optimization. Hainan:Seventh International Conference on Computational Intelligence and Security, 76-80.

[2]. Xin Xu, Yan-heng Liu, Ai-min Wang, Gang Wang and Hui-ling Chen. (2012). A New Adaptive Bacterial Foraging Optimizer based on Field. Chongqing:EighthInternational Conference on Natural Computation (ICNC 2012), 986- 990.

[3]. K. M. Bakwad, S.S. Pattnaik, B. S. Sohi, S. Devi, B.K. Panigrahi, Sanjoy Das and M. R. Lohokare. (2009). Hybrid Bacterial Foraging with Parameter free PSO. Coimbatore: World Congress on Nature & Biologically Inspired Computing, 1077 - 1081.

[4]. D. H. Kim, A. Abraham, and J.H. Cho. (2007). A Hybrid Genetic Algorithmand Bacterial Foraging Approach for Global Optimization. Information Sciences, vo.177, 3918–3937.

[5]. EspinalAndr´ es, Sotelo-Figueroa Marco, Soria- Alcaraz Jorge A, Ornelas Manuel, PugaHector and CarpioMart´ın. (2011). Comparison of PSO and DE for training neural networks. Puebla: 10thMexican International Conference on Artificial Intelligence, 83-87.

[6]. Li Ting and Wu Min (2008). An Enhanced Parallel Backpropagation Learning Algorithm for Multilayer Perceptrons. Chongqing: 7thWorld Congress on Intelligent Control and Automation,5287-5291. vo.4, 1942-1948.

[7]. S ingiresu S. Rao. (2009). Engineering Optimization Theory and Practice, Fourth Edition.USA: John Wiley & Sons, Inc.

[8]. Abd-Elazim, E. S. (2010). Optimal PID Tuning for Load,410-415.

[9]. Arthur Asuncion and David Newman (2007). The UCI Machine Learning Repository Center for Machine Learning and Intelligent Systems at the University of California, Irvine, available at: http://archive.ics.uci.edu/ml/.

[10]. MAN Chun-tao and WANG Kun, ZHANG Li-yong. (2009). A New Training Algorithm for RBF Neural Network Based on PSO and Simulation Study. Los Angeles, CA: WRI World Congress on Computer Science and Information Engineering, vo.4,646-650.

[11]. Andrich B. van Wyk and Andries P. Engelbrecht. (2010). Overfitting by PSO Trained Feedforward Neural Networks. Barcelona: IEEE Congress on Evolutionary Computation (CEC), 1-8.

[12]. Jun Liu and Xiaohong Qiu. (2009). A Novel Hybrid PSO-BP Algorithm for Neural Network Training. Sanya. Hainan: International Joint Conference on Computational Sciences and Optimization, 300-303.

[13]. MarcioCarvalho and Teresa B. Ludermir. (2006). An Analysis of PSO Hybrid Algorithms For Feed-Forward Neural Networks Training. Ribeirao Preto, Brazil:Ninth Brazilian Symposium on Neural Networks,6-11.

[14]. Yong GuiHe and Li Bo. (2009). Price Forecasting Based on PSO Train BP Neural Network. Wuhan: Power and Energy Engineering Conference,1-4.

[15]. Hanning Chen, Yunlong Zhu and KunyuanHu. (2008). Self-Adaptation in Bacterial Foraging Optimization Algorithm, 3rd International Conference on Intelligent System and Knowledge Engineering3,1026-1031.

[16]. Kennedy, J. a. (1995). Particle swarm optimization. Perth, WA: Proceedings, IEEE International Conference on Neural Networks, Vol.4, 1942-1948.