Application of CS-PSO algorithm in Bayesian network structure learning


LI Jun-wu, LI Guo-ning, ZHANG Ding

(School of Automation and Electrical Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China)

Abstract: In view of the shortcomings of traditional Bayesian network (BN) structure learning algorithms, such as low efficiency, premature convergence and poor learning effect, the cuckoo search (CS) and particle swarm optimization (PSO) intelligent algorithms are selected. Combined with the characteristics of the BN structure, a CS-PSO algorithm for BN structure learning is proposed. Firstly, the CS algorithm is improved in three aspects: the maximum spanning tree is used to guide the initialization of the CS algorithm, the fitness of the solution is used to adjust the optimization and abandoning process of solutions, and the PSO algorithm is used to update the positions in the CS algorithm. Secondly, according to the structural characteristics of BN, the CS-PSO algorithm is applied to BN structure learning. Finally, the classic chest clinic, credit and car diagnosis networks are used as simulation models, and modeling and simulation comparisons of the greedy algorithm, K2 algorithm, CS algorithm and CS-PSO algorithm are carried out. The results show that the CS-PSO algorithm has fast convergence speed, high convergence accuracy and good stability in BN structure learning, and it can obtain an accurate BN structure model faster and better.

Key words: Bayesian network; structure learning; cuckoo search and particle swarm optimization (CS-PSO)

0 Introduction

Bayesian network (BN) applies probability theory to complex systems to carry out uncertainty reasoning and data analysis. As a combination of graph theory and probability theory, it is one of the most classical and effective methods for representing and modeling uncertain knowledge[1-2]. Since BN has powerful diagnostic reasoning and knowledge representation abilities, it has become a hot research topic and has been applied to many fields, such as fault diagnosis, reliability analysis and decision support[3-5]. Structure learning is the core of BN, and the learned structure model is the foundation of parameter learning.

At present, there are two ways to learn a BN from sample data: the independence-constraint method and the search-and-score method. The latter is the more common method and can learn a relatively accurate network structure from data. As the data improve, the learned structure becomes more accurate; but when the structure is more complicated, the search complexity of the algorithm becomes large and the learning efficiency becomes low. Supposing that the sample data contain n variables, the number of possible network structures considered during structure learning is expressed as

$$f(n)=\sum_{i=1}^{n}(-1)^{i+1}\binom{n}{i}2^{i(n-i)}f(n-i),\quad f(0)=1. \qquad (1)$$

The complexity of the network model structure increases exponentially with the number of nodes n, and BN structure learning has been shown to be an NP-hard problem[6]. Therefore, selecting an effective algorithm becomes the key of the research.
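As an illustration of how quickly this search space grows, the short script below evaluates the recursion in Eq.(1) for small n. Python is used here and in the later sketches for illustration only (the experiments in Section 3 are implemented in Matlab), and the function name is ours, not from the original implementation.

```python
# Illustrative sketch: size of the DAG search space, assuming Eq.(1) is
# Robinson's recursion for the number of labeled DAGs on n nodes.
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    """Number of labeled directed acyclic graphs on n nodes."""
    if n == 0:
        return 1
    return sum((-1) ** (i + 1) * comb(n, i) * 2 ** (i * (n - i)) * num_dags(n - i)
               for i in range(1, n + 1))

if __name__ == "__main__":
    for n in range(1, 8):
        print(n, num_dags(n))  # 1, 3, 25, 543, 29281, 3781503, 1138779265
```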

In recent years, BN structure learning has mainly focused on data-driven modeling based on heuristic algorithms. In Refs.[7-9], the BN structure is learned using a colony algorithm, unconstrained optimization combined with a genetic algorithm, and an ant colony algorithm. Heuristic algorithms have achieved good results in BN structure learning. However, there are still many shortcomings in terms of algorithm complexity and learning effect, and the performance during optimization and modeling is only moderate.

The cuckoo search (CS) algorithm is a relatively new heuristic algorithm. It has few parameters, strong versatility and excellent global search ability; it is simple and easy to implement, and its overall performance is superior to that of many other intelligent algorithms. It has been widely studied and has solved many real problems in various fields. However, the algorithm also suffers from random blindness and low precision in the search process. Therefore, CS is chosen as the structural modeling algorithm for BN, and it is optimized in three aspects to improve its modeling performance.

Aiming at the randomness of nest establishment when the parameters of the CS algorithm are initialized, the mutual information among nodes is introduced to build a maximum spanning tree that guides the update direction of the algorithm. To address the randomness of the optimization and abandoning process of solutions, the fitness of the solution is introduced. Because the update mechanism of solutions during Levy flight is random and converges slowly, the particle swarm optimization (PSO) algorithm is introduced to update the positions in the CS algorithm. According to the structural characteristics of BN, a CS-PSO algorithm for BN structure learning is proposed, which provides a new idea for BN model construction and has certain practical value.

1 Principle of CS-PSO algorithm

1.1 Cuckoo search

The CS algorithm is a new heuristic intelligent algorithm inspired by the brood-parasitic egg-laying behavior of the cuckoo and the Levy flight[10]. Yang et al. put forward three hypotheses by simulating the breeding habits of the cuckoo[11]: (1) the cuckoo selects one bird's nest at random and lays only one egg at a time; (2) the best-quality nests are carried over to the next generation; (3) the total number of nests is fixed, and the host may discover the cuckoo's egg in the nest with a certain probability.

CS uses the Levy flight mechanism to search the nests randomly, and the position and path update equation is

$$x_i^{t+1}=x_i^{t}+\alpha\oplus L(\beta),\quad i=1,2,\ldots,N, \qquad (2)$$

where x_i^t represents the location of nest i at iteration t; ⊕ represents point-wise multiplication; N represents the number of bird's nests; α represents the step factor; and L(β) represents the Levy distribution with parameter β, given as

$$\mathrm{Levy}(\beta)\sim\mu=t^{-\beta},\quad 1<\beta\leq 3. \qquad (3)$$

(4)
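The Levy-distributed steps used in Eq.(2) are commonly sampled with Mantegna's algorithm; the sketch below follows that convention (an assumption, since the original step-generation code is not given) with β = 1.5 as used in Section 3. Function names are illustrative.

```python
import math
import numpy as np

def levy_steps(shape, beta=1.5, rng=None):
    """Sample Levy-flight steps with exponent beta via Mantegna's algorithm."""
    rng = np.random.default_rng() if rng is None else rng
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
               (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, shape)  # numerator ~ N(0, sigma_u^2)
    v = rng.normal(0.0, 1.0, shape)      # denominator ~ N(0, 1)
    return u / np.abs(v) ** (1 / beta)

def cs_move(nests, alpha=0.01, beta=1.5, rng=None):
    """One Levy-flight move of all nests, x_{t+1} = x_t + alpha (+) L(beta), as in Eq.(2)."""
    return nests + alpha * levy_steps(nests.shape, beta, rng)
```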

1.2 Particle swarm optimization

The PSO algorithm[12-13] is a random search optimization algorithm inspired by the foraging and predation behavior of bird flocks. Each particle in the algorithm moves towards the global optimum and the best position it has found so far. If the position currently searched by a particle is better than its previous best location, that position replaces the particle's current optimum. In the iteration process, the velocity and position updates are expressed as

$$v_i^{t+1}=\omega v_i^{t}+c_1 r_1\left(p_{\mathrm{best},i}-x_i^{t}\right)+c_2 r_2\left(g_{\mathrm{best}}-x_i^{t}\right), \qquad (5)$$

$$x_i^{t+1}=x_i^{t}+v_i^{t+1}, \qquad (6)$$

where ω is the inertia weight, c1 and c2 are the learning factors, and r1, r2 are random numbers uniformly distributed in [0, 1].
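A minimal continuous PSO step implementing Eqs.(5) and (6) is sketched below; the array-based names and shapes are illustrative, and the default parameter values match the settings reported in Section 3.

```python
import numpy as np

def pso_step(x, v, p_best, g_best, w=0.9, c1=2.0, c2=2.0, rng=None):
    """One PSO iteration: velocity update (Eq.(5)) followed by position update (Eq.(6))."""
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.random(x.shape)  # random factors in [0, 1]
    r2 = rng.random(x.shape)
    v_new = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    x_new = x + v_new
    return x_new, v_new
```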

1.3 Cuckoo search-particle swarm optimization

1) When the bird's nests are initialized, they are generated randomly and the relationships among them are uncertain. Therefore, in order to improve the search efficiency of the algorithm and make the search more targeted, a maximum spanning tree is established from the mutual information among nodes to guide the update direction of the algorithm. For example, if X_i and X_j are random variables, the mutual information between them is

$$I(X_i,X_j)=\sum_{X_i,X_j}p(X_i,X_j)\log\frac{p(X_i,X_j)}{p(X_i)p(X_j)}, \qquad (7)$$

where p(X_i, X_j) represents the joint probability of variables X_i and X_j. I(X_i, X_j) > 0 indicates that there is an undirected edge X_i - X_j between variables X_i and X_j; the larger I(X_i, X_j) is, the stronger the dependency between the variables. Conversely, a value close to zero indicates that X_i and X_j are independent of each other. The maximum spanning tree is constructed as follows: all node variables are traversed, and the undirected edges with maximum mutual information among the nodes are preserved, yielding a tree network structure.

2) There is randomness in the solution retention and rejection strategy of the CS algorithm. The abandoning probability p_a is a fixed value. The algorithm generates a random number r in [0, 1]. If r > p_a, the solution is regarded as discovered by the host and a new solution is generated through a preferential random walk; otherwise, the solution is retained. This randomness may cause solutions with good fitness to be abandoned while solutions with poor fitness are preserved, which affects the convergence speed and solution quality of the algorithm. Therefore, the fitness of the solution is introduced as the measure of whether or not to discard it. The improved abandoning probability of the solution is

$$p_{a,i}=\frac{gBestF-Fit_i}{gBestF-gWorstF}, \qquad (8)$$

where Fit_i represents the fitness of nest i, gBestF represents the fitness of the global optimal solution, and gWorstF represents the fitness of the global worst solution.
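A small sketch of this adaptive rule, assuming the normalized-fitness form written in Eq.(8) above and that larger fitness (the BIC score of Section 2.4) is better; the function name is illustrative.

```python
def abandon_probability(fit_i, g_best_f, g_worst_f):
    """Adaptive abandoning probability: close to 0 for the best nest, close to 1 for the worst."""
    if g_best_f == g_worst_f:  # degenerate case: all nests have equal fitness
        return 0.5
    return (g_best_f - fit_i) / (g_best_f - g_worst_f)
```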

3) In the CS algorithm, the high randomness of the Levy flight allows the search process to jump from one region to another, which expands the search range and increases the diversity of the population. However, this high randomness also brings blindness to the search, leading to slow convergence and low search efficiency. Therefore, the position updating idea of the PSO algorithm is introduced into the position updating of the CS algorithm.

The CS-PSO algorithm thus makes three improvements to the CS algorithm, avoiding its shortcomings and combining the advantages of both methods. In the optimization process, the CS-PSO algorithm lets each bird's nest, like a particle, be updated towards its current optimal solution and the global optimal solution, which keeps the randomness of the search for the optimal solution while reducing its blindness. While the nest positions are updated with the PSO updating idea, the Levy flight and random elimination mechanism of the CS algorithm itself also strengthen the vitality of nest renewal, improve the global search ability of the algorithm, and accelerate the convergence speed and search efficiency. The flow chart of the CS-PSO algorithm is shown in Fig.1.

2 BN structure learning based on CS-PSO algorithm

2.1 Expression and initialization of bird’s nest

In this paper, the BN structure is represented by the relation matrix between nodes. Each bird's nest represents a candidate network structure, and each element of the nest's position takes the value 0 or 1. The structure is represented by the matrix E = C(n, n), where n is the number of nodes, and the matrix element C_ij is defined as

$$C_{ij}=\begin{cases}1, & \text{if there is a directed edge from node } i \text{ to node } j,\\ 0, & \text{otherwise.}\end{cases} \qquad (9)$$

The BN structure model is converted into the node relation matrix, as shown in Fig.2.

Fig.2 BN structure matrix conversion diagram

The cuckoo population is initialized as follows: N bird's nests are initialized randomly according to the sample data, and all nodes are traversed. The mutual information between node pairs is calculated by Eq.(7), the undirected edges of the variable pairs with maximum mutual information are retained, and the maximum spanning tree network structure is established to obtain the mutual information matrix. At the same time, it is ensured that the established maximum spanning tree is a directed acyclic graph (DAG).
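A minimal sketch of this initialization, assuming discrete samples stored as rows of values: pairwise mutual information (Eq.(7)) is computed, a maximum spanning tree is built Kruskal-style, and each nest is obtained by orienting the tree edges at random (any orientation of a tree is acyclic, so every nest starts as a DAG). All names are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import log
import numpy as np

def mutual_information(data, i, j):
    """Empirical mutual information I(X_i, X_j) over discrete samples (Eq.(7))."""
    m = len(data)
    pij = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    return sum(c / m * log((c / m) / ((pi[a] / m) * (pj[b] / m)))
               for (a, b), c in pij.items())

def maximum_spanning_tree(data, n_vars):
    """Kruskal-style maximum spanning tree over pairwise mutual information."""
    edges = sorted(((mutual_information(data, i, j), i, j)
                    for i, j in combinations(range(n_vars), 2)), reverse=True)
    parent = list(range(n_vars))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:               # keep the edge only if it joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree                    # n_vars - 1 undirected edges

def init_nests(tree_edges, n_vars, n_nests, rng=None):
    """Initialize N nests as adjacency matrices by orienting the tree edges at random."""
    rng = np.random.default_rng() if rng is None else rng
    nests = np.zeros((n_nests, n_vars, n_vars), dtype=int)
    for k in range(n_nests):
        for i, j in tree_edges:
            if rng.random() < 0.5:
                nests[k, i, j] = 1  # edge i -> j
            else:
                nests[k, j, i] = 1  # edge j -> i
    return nests
```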

2.2 Bird’s nest location updating of PSO algorithm

In the PSO algorithm, particles update their positions according to their own velocity and that of nearby particles, obtaining their own current optimal position and the global optimal position of the swarm. The velocity of a particle has the same form as the position of a bird's nest and is also represented by a matrix. Its matrix element V_ij is expressed as

(10)

The particle velocity is normalized by Eq.(11), which transforms the velocity value into a velocity probability as

$$\mathrm{sig}(V_{ij})=\frac{1}{1+\mathrm{e}^{-V_{ij}}}. \qquad (11)$$

The velocity matrix [13] is then updated according to binary PSO optimization theory as

(12)

After the bird's nest is initialized, the PSO updating idea is introduced, which is equivalent to performing edge-adding, edge-deleting and edge-reversing operations on the initialization matrix. When a particle flies from C to C′ with velocity v, the update process is shown in Fig.3.

Fig.3 PSO algorithm particle position update diagram
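As a sketch of how the continuous velocity is mapped back to a 0-1 adjacency matrix, the snippet below uses the standard Kennedy-Eberhart binary-PSO rule (sigmoid of the velocity as in Eq.(11), followed by a random threshold). This is our reading of the update, not code from the paper; the resulting 0/1 changes relative to the previous matrix correspond to the edge-adding, edge-deleting and edge-reversing operations discussed above.

```python
import numpy as np

def sigmoid(v):
    """Eq.(11): transform velocity values into probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-v))

def binary_position_update(v, rng=None):
    """Set each adjacency entry to 1 with probability sigmoid(V_ij), else 0."""
    rng = np.random.default_rng() if rng is None else rng
    return (rng.random(v.shape) < sigmoid(v)).astype(int)
```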

2.3 Bird's nest location updating by Levy flight of CS algorithm

Levy flight is a search method that combines short-range exploration with occasional long-distance jumps, which can greatly enlarge the search range, help the search jump out of local optimal solutions, and provide excellent global search performance. In this paper, the Levy flight of the CS algorithm is used to learn the BN structure. The essence of updating the nest location by Levy flight is to perform edge-adding, edge-deleting and edge-reversing operations between nodes. The transformation rule from the continuous Levy flight space to the binary space is given by[14]

$$S(x_{ij}^{t})=\frac{1}{1+\mathrm{e}^{-x_{ij}^{t}}}, \qquad (13)$$

$$x_{ij}^{t+1}=\begin{cases}1, & \text{if } rand < S(x_{ij}^{t}),\\ 0, & \text{otherwise,}\end{cases} \qquad (14)$$

where rand is a random number uniformly distributed in [0, 1].

2.4 Fitness function representation of bird's nest in BN structure learning

Corresponding to the BN structure modeling process, choosing the fitness function of the cuckoo nests amounts to selecting the BN scoring function. The scoring function of a BN measures how well the BN model structure matches the training data set. Commonly used scoring functions include the BDe, BIC and MDL scores. In this paper, the BIC score is used as the fitness function of the cuckoo nests:

$$\mathrm{BIC}(\eta\mid\delta)=\log p(\delta\mid\eta,\theta^{*})-\frac{d}{2}\log m, \qquad (15)$$

where δ represents the training data set, η represents the model structure, θ* represents the network structure parameters, d is the number of independent parameters of the network, and m is the number of samples. The latter term (d/2)log m of the BIC scoring function is the complexity penalty of the model structure, which greatly reduces over-fitting of the network and makes the established network model more streamlined and efficient.
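A minimal sketch of a decomposable BIC score (Eq.(15)) for discrete data, evaluated at the maximum-likelihood parameters; the data layout, the arities argument and the function name are assumptions made for illustration.

```python
from collections import Counter
from math import log
import numpy as np

def bic_score(data, adj, arities):
    """BIC of a candidate DAG: log-likelihood at the ML parameters minus (d/2) log m.
    data: (m, n) array of discrete values; adj[i][j] = 1 means edge i -> j;
    arities[j]: number of states of variable j."""
    m, n = data.shape
    score = 0.0
    for j in range(n):
        parents = [i for i in range(n) if adj[i][j] == 1]
        joint = Counter((tuple(row[parents]), row[j]) for row in data)
        marg = Counter(tuple(row[parents]) for row in data)
        # log-likelihood contribution of node j
        score += sum(c * log(c / marg[pa]) for (pa, _), c in joint.items())
        # (r_j - 1) * q_j independent parameters for node j
        q_j = int(np.prod([arities[i] for i in parents])) if parents else 1
        score -= 0.5 * (arities[j] - 1) * q_j * log(m)
    return score
```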

2.5 Illegal model in BN structure learning

A BN is a directed acyclic graph (DAG). In BN structure learning, especially in the position updating of the PSO algorithm and in the Levy-flight position updating of the CS algorithm, illegal network structures may appear during the edge-adding, edge-deleting and edge-reversing operations on the BN, which leads to inaccurate learning of the network structure. The following three categories of illegal structures may appear.

1) Node self-loop structure: there is a directed arc A→A pointing from a node to itself, as shown in Fig.4.

Fig.4 Node self-circulation BN structure diagram

2) Node bidirectional cycle structure: there is a directed arc A→B between two nodes together with a directed arc B→A, which forms a bidirectional circular arc, as shown in Fig.5.

Fig.5 Node bidirectional circular BN structure diagram

Fig.6 Node self-loop cycle BN structure diagram

3) Directed cycle structure: three or more directed arcs in the network structure form a closed loop. For example, the three edges A→B, B→C and C→A constitute a directed cycle, as shown in Fig.6.

2.6 Cycle removal operation for BN's illegal models

The illegal structures above are handled as follows: an illegal edge is randomly removed with equal probability, the network structure is updated and regenerated, and its legality is checked until a legitimate DAG is formed. The specific operation steps are as follows (a code sketch of this procedure is given after the steps):

1) Obtain the node relation matrix corresponding to the BN structure;

2) Check the node relation matrix for illegal elements: main diagonal elements equal to 1 (self-loops), pairs of symmetric elements that are both 1 (bidirectional arcs), and elements that form a closed loop (directed cycles) are the illegal elements of the matrix;

3) Randomly select one illegal element in the matrix and set it to zero;

4) Repeat the check-and-update process until there is no illegal element in the node relation matrix.
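The sketch below implements the ring-removal procedure above on the adjacency-matrix representation of Section 2.1: self-loops are cleared, then bidirectional arcs and directed cycles are broken by deleting one offending edge at a time, chosen with equal probability. Function names are illustrative.

```python
import numpy as np

def find_cycle(adj):
    """Return the edges of one directed cycle in adj, or [] if the graph is a DAG."""
    n = len(adj)
    color, stack = [0] * n, []          # 0: unvisited, 1: on the DFS stack, 2: done
    def dfs(u):
        color[u] = 1
        stack.append(u)
        for v in range(n):
            if adj[u][v]:
                if color[v] == 1:       # back edge closes a cycle
                    k = stack.index(v)
                    cyc = stack[k:] + [v]
                    return list(zip(cyc[:-1], cyc[1:]))
                if color[v] == 0:
                    found = dfs(v)
                    if found:
                        return found
        color[u] = 2
        stack.pop()
        return []
    for s in range(n):
        if color[s] == 0:
            found = dfs(s)
            if found:
                return found
    return []

def repair(adj, rng=None):
    """Remove self-loops, bidirectional arcs and directed cycles by deleting
    one offending edge at a time (equal probability) until a DAG remains."""
    rng = np.random.default_rng() if rng is None else rng
    adj = np.array(adj, dtype=int)
    np.fill_diagonal(adj, 0)                                # 1) self-loops A -> A
    while True:
        both = np.argwhere((adj == 1) & (adj.T == 1))       # 2) bidirectional arcs
        bad = [tuple(e) for e in both] or find_cycle(adj)   # 3) directed cycles
        if not bad:
            return adj
        i, j = bad[rng.integers(len(bad))]
        adj[i, j] = 0
```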

3 Experiment and simulation analysis

In this paper, the Windows 7 operating system and Matlab (R2014a) are used as the experimental environment. The algorithms are modeled, simulated and verified with the Full-BNT toolbox for BN. In the experiments, three classic networks provided by the GeNIe and Netica software, car diagnosis, chest clinic and credit, are selected as sample networks. Data sets of different sizes are sampled from the three networks as the modeling and simulation data to evaluate the performance of the BN structure learning algorithms. The data structures of the three networks are shown in Table 1.

Table 1 Three classical BN data structures

In this paper, the CS-PSO algorithm is used to learn the structure of the BN model. The parameters are set as follows: step-length adjustment parameter α = 0.01; Levy distribution parameter β = 1.5; discovery probability p_a = 0.4; maximum number of exploratory steps K = 130; number of cuckoo nests N = 50; particle learning factors c1 = c2 = 2; inertia weight ω = 0.9.
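For reference, the settings listed above are collected into a single configuration object below; the dictionary keys are illustrative names, not identifiers from the original program.

```python
# CS-PSO parameter settings used in the experiments (Section 3).
CSPSO_PARAMS = {
    "alpha": 0.01,   # step-length adjustment parameter
    "beta": 1.5,     # Levy distribution parameter
    "pa": 0.4,       # discovery (abandoning) probability
    "K": 130,        # maximum number of exploratory steps
    "N": 50,         # number of cuckoo nests
    "c1": 2,         # particle learning factor
    "c2": 2,         # particle learning factor
    "omega": 0.9,    # inertia weight
}
```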

1) 500, 1 000 and 2 000 samples of the chest clinic network are selected, and the CS-PSO algorithm is used to learn the BN structure of the chest clinic network. To increase the accuracy of the experiment, each group of sample data is learned 10 times, and the optimal network structure is taken as the final learned structure model, as shown in Fig.7.

Fig.7 Chest Clinic network structure diagram studied under various conditions

As seen from Fig.7, when the number of samples is 500, there is only one reversed edge compared with the standard structure, and the error is 1; the learned network structure is close to the standard structure. When the number of samples is 1 000, the error rate is 0, and the standard network structure can be learned by the algorithm.

2) The greedy algorithm, K2 algorithm, CS algorithm and CS-PSO algorithm are used to learn the BN structure from the sample data of the three classical networks, car diagnosis, chest clinic and credit, respectively. The learning results are shown in Tables 2, 3 and 4, where the number of extra edges is denoted by N_multi, the number of missing edges by N_min, the number of reversed edges by N_anti, and the average number of wrong edges by N_ave.

It can be seen from Tables 2, 3 and 4 that in the chest clinic network with low complexity, the proposed algorithm learns the BN structure with sample sizes of 500, 1 000, 2 000, 5 000 and 10 000, respectively, and the average error of the learned network structure is lower than that of the other algorithms.

Table 2 Chest clinic network learning effect comparison

Table 3 Credit network learning effect comparison

Table 4 Car diagnosis network learning effect comparison

In the credit network of medium complexity, the learning effect of the proposed algorithm is likewise better than that of the other algorithms; when the number of samples reaches 5 000, the learned network structure has zero error. In the car diagnosis network with high complexity, the error of the network structure obtained by the proposed CS-PSO algorithm is very low; especially when the number of samples reaches 5 000, the learned network structure is very close to the standard network structure.

The reason is that the K2 algorithm needs the node ordering and the maximum number of parent nodes to be given in advance, so the learned BN structure still differs considerably from the ideal case. The greedy algorithm easily falls into local optima when the network structure is complex, and its search accuracy is low. In the CS algorithm, the randomness of the Levy flight leads to slow search speed and low precision, which limits the optimization ability of the algorithm.

3) The BIC score of a network structure is an index for evaluating the quality of a BN; it can not only reflect how well the samples fit the network structure, but also evaluate the quality of the learned network. 1 000 samples are drawn from each of the above three classical networks, the BN structures are learned with the different algorithms, and the BIC scores are obtained, as shown in Table 5.

Table 5 BIC scores comparison of BN structure learning

It can be concluded from Table 5 that the BN model established by the proposed algorithm has the highest BIC score and can find the BN structure with the best BIC score, and its score is the closest to that of the standard network structure. This shows that the proposed algorithm is optimal compared with the other algorithms.

4) 1 000 sets of experimental data are selected from the car diagnosis network and the maximum number of iterations is set to 50. The sample data are modeled using the four algorithms, and the number of iterations and the corresponding maximum BIC score of the population are recorded. The experimental results are shown in Fig.8.

Fig.8 Algorithm convergence comparison diagram

It can be seen from Fig.8 that as the number of iterations increases, the BIC score increases continuously and the learning effect gets better and better. The CS-PSO algorithm tends to converge at about 20 iterations, while the other three algorithms tend to converge at around 30 iterations. The CS-PSO algorithm reaches a BIC score of -6 432.1, which is the highest. It can be concluded that the convergence of the proposed algorithm is significantly better than that of the other three algorithms, that it can learn a better network structure, and that its final learning effect is also the best.

5) In this paper, 2 000 data samples from the car diagnosis network are selected as sample data. The running times of the modeling programs of the greedy algorithm, K2 algorithm, CS algorithm and CS-PSO algorithm are compared for different numbers of iterations, as shown in Fig.9.

Fig.9 Algorithm running time comparison diagram

As shown in Fig.9, the running times of the four modeling algorithms increase approximately linearly with the number of iterations. At 10, 20, 30, 40 and 50 iterations, the running time of the CS-PSO modeling algorithm is less than that of the other algorithms; especially when the number of iterations reaches 40, the advantage is very obvious.

The main reasons are as follows: the CS algorithm uses the Levy flight and has strong global search ability, so it converges faster than the other algorithms; moreover, the proposed algorithm combines the advantages of the PSO algorithm, which further improves the global and local search capability of the CS algorithm.

4 Conclusion

In this paper, the CS algorithm is improved in three aspects and, combined with the structural characteristics of BN, a CS-PSO algorithm for BN structure learning is proposed. Simulation experiments and evaluations of the algorithm are carried out on three classic networks of different scales. The learned network structures are compared with the standard network structures in terms of four indexes: the numbers of extra, missing and reversed edges, the BIC score, the convergence behavior of the different algorithms, and the program running time. At the same time, the greedy algorithm, K2 algorithm, CS algorithm and CS-PSO algorithm are compared and analyzed. The experiments show that the proposed algorithm has fast convergence speed, high convergence precision and good stability in BN structure learning. For network structures of different complexity, the standard network structure can be learned efficiently from relatively little sample data, and the learning effect is better than that of the other three algorithms.

Through BN structure learning with the CS-PSO algorithm, an accurate BN structure model can be obtained faster and better. However, the CS-PSO algorithm itself is strongly affected by its parameter settings. The model parameters in this paper are set according to experimental experience, and the parameter selection is not necessarily optimal, so how to optimize the parameter settings is the focus of future work.