TAN Xiangyuan1, GAO Xiaoguang1,*, HE Chuchao2, and WANG Zidong1
1. School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China; 2. School of Electronics and Information Engineering, Xi'an Technological University, Xi'an 710021, China
Abstract: How to improve the efficiency of exact learning of the Bayesian network structure is a challenging issue. In this paper, four different causal constraints algorithms are added into score calculations to prune possible parent sets, improving the efficiency of state-of-the-art learning algorithms. Experimental results indicate that exact learning algorithms can significantly improve their efficiency with only a slight loss of accuracy. Under causal constraints, these exact learning algorithms can prune about 70% of possible parent sets and reduce about 60% of running time while losing no more than 2% accuracy on average. Additionally, with sufficient samples, exact learning algorithms with causal constraints can still obtain the optimal network. In general, adding max-min parents and children constraints gives the best results in terms of efficiency and accuracy among the four causal constraints algorithms.
Keywords: Bayesian network, structure learning, exact learning algorithm, causal constraint.
The Bayesian network (BN) [1] is a probabilistic graphical model that can efficiently express and infer uncertain knowledge, helping people apply probabilistic statistics to complex fields. A BN is a directed acyclic graph (DAG) in which each node represents a random variable in the domain and the directed edges between nodes represent the direct probabilistic dependencies between variables. These relationships are quantified by conditional probability distributions. In brief, a BN represents a joint probability distribution of a set of random variables.
Learning a BN is usually conducted in two parts: structure learning and parameter learning. Structure learning first learns the DAG structure, and then parameter learning estimates the parameters of the conditional distributions given the DAG. Parameter learning in the second phase is considered a well-studied problem, so structure learning is more challenging [2].
The structure learning of BNs is to automatically determine the topological relationships between BN nodes from observed data. This is a very important challenge and has been extensively studied in the past few decades. However, learning the optimal BN structure from data has been proved to be an NP-hard problem [3,4], and the number of possible network structures is super-exponential in the number of nodes. Based on whether the learned structure is guaranteed to be optimal, BN structure learning algorithms can be divided into approximate learning algorithms and exact learning algorithms. The approximate learning algorithms are fast but can only achieve a locally optimal or close-to-optimal network structure; the exact learning algorithms can guarantee in theory that the learned network is optimal, but their efficiency is lower than that of the approximate learning algorithms. In recent years, approximate learning algorithms have been developed towards large BNs [5], and exact learning algorithms have been developed to improve efficiency [6]. This paper mainly studies the exact learning algorithms of the BN, aiming to improve the efficiency of exact learning algorithms.
In the past ten years, more researchers have focused on the exact learning of the BN structure. These algorithms can theoretically guarantee that the obtained BN is optimal. Koivisto et al. [7] proposed an exact learning algorithm for BNs based on dynamic programming (DP). Then, Singh et al. [8] and Silander et al. [9] made small improvements in efficiency. Malone et al. [10] proposed memory-efficient dynamic programming (MEDP), which utilizes the layered structure within the dynamic programming lattice to reduce the memory requirements. The above DP algorithms require a complete search of a state space called the order graph, which grows exponentially with the number of variables, so these DP algorithms are always exponential. Later, Yuan and Malone [11] regarded the structure learning problem of the BN as a shortest path problem in the order graph and proposed the A* algorithm [12], the anytime window A* (AWA*) algorithm [13] and the breadth-first branch and bound search algorithm (BFBnB) [14]. These heuristic algorithms utilize admissible and consistent heuristic functions [15] adapted to BN structure learning, ensuring that they finally find the optimal score. A heuristic function h is said to be admissible if it never overestimates the cost of reaching the goal. A heuristic function h is said to be consistent, or monotone, if its estimate is always less than or equal to the estimated distance from any neighboring state to the goal plus the cost of reaching that neighbor. Compared with the above DP algorithms, these algorithms only need to perform a partial search of the order graph, thereby improving the overall efficiency. Besides, some algorithms are based on mathematical programming, such as constraint programming for the constraint programming for Bayes (CPBayes) [16] algorithm, and integer linear programming for the globally optimal BN learning using integer linear programming (GOBNILP) [17−20] algorithm. Malone et al. [21] performed a comparison experiment of three state-of-the-art exact learning algorithms (A*, CPBayes and GOBNILP). In their experiments, no single algorithm dominated the others in speed. Yang et al. [22] tried to layer the order graph and then used MEDP to learn the structure for each layer. Although theoretically feasible, the experimental results were not optimal. Wang et al. [23] considered incorporating ancestral constraints into exact learning algorithms based on the order graph. Tan et al. [24] proposed a bidirectional heuristic algorithm to search the order graph.
The above algorithms mostly rely on search strategies to improve the efficiency of the exact learning of the BN structure. In fact, for these exact learning algorithms, the scores of all possible parent sets need to be calculated in advance. In theory, a total of n·2^(n−1) scores need to be calculated without any constraints, which is an enormous amount of computation. Earlier, Malone [11] used pruning rules to prune the calculation of minimum description length (MDL) scores, so this number is appropriately reduced. Campos et al. [25] proposed pruning rules for the Bayesian information criterion (BIC) score but did not analyze the accuracy of these rules experimentally. Correia et al. [26] proposed pruning rules for the Bayesian Dirichlet equivalent uniform (BDeu) score. Using causal constraints is another way to improve efficiency. However, the addition of causal constraints is more common in approximate learning algorithms, such as the Grow-Shrink (GS) [27] algorithm and the max-min hill-climbing (MMHC) [28] algorithm. There is almost no literature on the influence of causal constraints on exact learning algorithms. This paper further reduces the score calculations in exact structure learning by using the causal constraints calculated by the incremental association Markov blanket (IAMB), IAMB with the Peter-Clark (PC) algorithm in the interleaved pruning phase (interIAMBnPC), HITON_PC and max-min parents and children (MMPC) algorithms [29−31], and studies the influence of these reductions on A*, AWA*, BFBnB, and GOBNILP. We believe that this paper can provide a reference of experimental results for constraining exact learning algorithms in the future.
The remainder of this paper is organized as follows. Section 2 introduces necessary information about the BN and the exact learning of the BN structure. Section 3 describes how to introduce causal constraints into the score calculations to prune scores. Section 4 presents the experimental results and related discussions. Section 5 concludes this work.
A BN is a probabilistic graphical model, denoted by BN = (G, P), where G is a DAG and P is a joint probability distribution over a set of random variables V = {X1, X2, ···, Xn}. If there is a directed edge Xi → Xj, which represents the direct influence between the two variables, then Xi is identified as a parent of Xj and Xj is a child of Xi. We denote the parent set of Xi by Pa(Xi). The joint probability distribution P over all variables can be factorized as the product of the conditional probability distributions as follows:

P(X1, X2, ···, Xn) = ∏_{i=1}^{n} P(Xi | Pa(Xi)).
A DAG G expresses a set of conditional independence (CI) relations over a set of variables V. These CI relations can be determined by d-separation.
Definition 1 Conditionally independent [32]
Two nodes Xi and Xj are conditionally independent given the set of variables Z, if and only if P(Xi, Xj | Z) = P(Xi | Z)P(Xj | Z), denoted as I_P(Xi, Xj | Z).
Definition 2 d-separation [1]
A path between nodes Xi and Xj in a DAG G is said to be d-separated by a set of variables Z if and only if
(i) there is a non-collider on the path that is in Z, or
(ii) there is a collider on the path that is not in Z and none of its descendants is in Z.
Xi and Xj are said to be d-separated given Z, denoted by Dsep_G(Xi, Xj | Z), if every path between them is d-separated by Z.
Theorem 1 [1] For any Xi ∈ V, Xj ∈ V and Z ⊆ V\{Xi, Xj}, Dsep_G(Xi, Xj | Z) ⇒ I_P(Xi, Xj | Z).
Theorem 1 shows the connection between d-separation and CI. In other words, if Xi and Xj are d-separated by Z, then Xi and Xj are conditionally independent given Z.
Definition 3 Markov blanket [31]
The Markov blanket of X, MB(X), is a minimal set conditioned on which all other nodes are independent of X, i.e., ∀Y ∈ V\(MB(X) ∪ {X}): I_P(X, Y | MB(X)).
For every variable X of V, MB(X) is the set of parents, children, and spouses (parents of common children) of X in a BN. This property will play a crucial role in the constraints on score calculations later.
The problem of learning an optimal BN is to find an optimal DAG that best fits the given data D = {D1, D2, ···, DN} (N is the number of samples). Various scoring functions have been proposed to measure the quality of fit of a network G to data D, such as MDL [33], BIC [34], and BDeu [35]. All of these scoring functions are decomposable. Many studies on exact learning algorithms used the MDL score [10−15]. Thus, in this study, we also use the MDL score, and hence the goal of learning an optimal structure of the BN is to find

G* = arg min_G MDL(G).
The MDL score of the BN structure can be decomposed into

MDL(G) = ∑_{i=1}^{n} MDL(Xi | Pa(Xi))

where

MDL(Xi | Pa(Xi)) = H(Xi | Pa(Xi)) + (log N / 2) K(Xi | Pa(Xi)),
H(Xi | Pa(Xi)) = −∑_{k∈Ω_Xi} ∑_{p∈Ω_Pa(Xi)} N_{k,p} log(N_{k,p} / N_p),
K(Xi | Pa(Xi)) = (|Ω_Xi| − 1) ∏_{Xj∈Pa(Xi)} |Ω_Xj|.

In the above formulas, log denotes the base-2 logarithm, the state space of Xi is denoted by Ω_Xi, and the state space of Pa(Xi) is denoted by Ω_Pa(Xi). H(Xi | Pa(Xi)) denotes the negative log-likelihood of Xi given its parent set. N_{k,p}/N_p is the maximum likelihood estimate of the conditional probability P(Xi = k | Pa(Xi) = p). N_p and N_{k,p} represent respectively the number of times that Pa(Xi) = p and Xi = k ∧ Pa(Xi) = p appear in the data. K(Xi | Pa(Xi)) is the complexity penalization for Xi and its parent set.
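For illustration, a minimal sketch of this local score computed from data counts is given below. The paper's implementation is in C++; the function and variable names here are ours, and the data are assumed to be a list of records mapping variable names to discrete values.

```python
import math
from collections import Counter

def mdl_local_score(data, child, parents, cardinalities):
    """MDL(Xi | Pa(Xi)) = H(Xi | Pa(Xi)) + (log2(N) / 2) * K(Xi | Pa(Xi)).

    data          : list of dicts mapping variable name -> discrete value
    child         : name of Xi
    parents       : tuple of parent variable names Pa(Xi)
    cardinalities : dict mapping variable name -> number of states |Omega|
    """
    N = len(data)
    # N_p and N_{k,p}: counts of parent configurations and joint configurations.
    n_p = Counter(tuple(row[p] for p in parents) for row in data)
    n_kp = Counter((row[child], tuple(row[p] for p in parents)) for row in data)

    # H(Xi | Pa(Xi)) = -sum_{k,p} N_{k,p} * log2(N_{k,p} / N_p)
    h = -sum(c * math.log2(c / n_p[p]) for (k, p), c in n_kp.items())

    # K(Xi | Pa(Xi)) = (|Omega_Xi| - 1) * prod_{Xj in Pa(Xi)} |Omega_Xj|
    k = cardinalities[child] - 1
    for p in parents:
        k *= cardinalities[p]

    return h + (math.log2(N) / 2) * k
```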
As the literature review in the introduction shows, most exact learning algorithms are based on the order graph. Here we briefly introduce the theory of exact learning algorithms based on the order graph.
Koivisto [7] proposed a dynamic programming algorithm using the decomposability of the scoring function. The main idea of dynamic programming is to decompose the problem of learning the optimal BN into optimal sub-networks, down to the empty set, and to find the optimal network recursively. The core recursive formulas are

MDL(V) = min_{X∈V} {MDL(V\{X}) + BestMDL(X, V\{X})},
BestMDL(X, V\{X}) = min_{Pa(X)⊆V\{X}} MDL(X | Pa(X)).
These formulas show that the optimal network structure score for a variable set can be decomposed into two parts [8]: (i) the optimal score of the sub-network on the variable set V\{X}; (ii) the optimal score obtained when X chooses an optimal parent set from V\{X}. Then, the variable set V\{X} can be decomposed again according to these formulas until it is reduced to the empty set. Therefore, dynamic programming works as follows: starting from the empty set, searching for the optimal structures for single variables, and adding variables on top of these basic structures, building optimal sub-networks for increasingly larger variable sets until the optimal network over all variables is found. This process can be visualized by an order graph, and Fig. 1 shows the order graph of a four-variable BN. There are 2^n states in the order graph.
Fig. 1 An order graph of a four-node BN
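To make the recursion concrete, the following is a minimal sketch of this dynamic programming over the order graph, in the style of [9]. Here best_score(X, U) is assumed to return the best precomputed local score of X over parent sets drawn from U, together with the chosen parent set; the names are ours.

```python
from itertools import combinations

def dp_optimal_network(variables, best_score):
    """DP over the order graph: build optimal sub-networks for growing subsets.

    best_score(X, candidates) must return (score, parent_set): the best local
    score of X over all allowed parent sets drawn from 'candidates'.
    Returns the optimal total score and a dict mapping each variable to its
    chosen parent set.
    """
    variables = list(variables)
    n = len(variables)
    # best[U] = (score of optimal sub-network over U, last variable added, its parents)
    best = {frozenset(): (0.0, None, None)}

    for size in range(1, n + 1):
        for subset in combinations(variables, size):
            U = frozenset(subset)
            candidates = []
            for X in U:
                sub_score, _, _ = best[U - {X}]
                local, parents = best_score(X, U - {X})
                candidates.append((sub_score + local, X, parents))
            best[U] = min(candidates)          # pick the cheapest last variable

    # Recover the network by walking back from the full variable set.
    structure, U = {}, frozenset(variables)
    while U:
        _, X, parents = best[U]
        structure[X] = parents
        U = U - {X}
    return best[frozenset(variables)][0], structure
```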
In the order graph, a directed path from the initial state O = {} at the top to the goal state V = {X1, X2, ···, Xn} at the bottom is a sequence in which the variables appear one by one, corresponding to an ordering. In the ordering, each variable selects the optimal parent set only from the variables in front of itself, and we can combine these optimal parent sets to form the optimal BN corresponding to the ordering. Based on this observation, a series of algorithms built on the shortest path idea have been proposed, such as A*, AWA* and BFBnB. Since a path from O to V corresponds to an ordering, we can search for the shortest path in the order graph, and the optimal BN corresponding to this ordering is the optimal BN. In these algorithms, from the perspective of the shortest path, the path cost in the order graph from the current state U to the successor state U → S = U ∪ {X} (X ∈ V\U) is

cost(U, S) = BestMDL(X, U) = min_{Pa(X)⊆U} MDL(X | Pa(X)).
A*, AWA* and BFBnB use heuristic strategies to search for the shortest path in the order graph, and thus only need to examine fewer than 2^n states to build the optimal BN structure.
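The shortest-path view can be sketched as follows, using the simple admissible heuristic in which every remaining variable optimistically picks its parents from all other variables. In this sketch, best_score(X, U) is assumed to return only the best local score of X with parents restricted to U, and the function returns the optimal total score rather than the network itself; all names are illustrative.

```python
import heapq
from itertools import count

def astar_order_graph(variables, best_score):
    """A*-style shortest-path search over the order graph."""
    V = frozenset(variables)
    # Simple admissible heuristic: each remaining variable picks parents freely.
    h = lambda U: sum(best_score(X, V - {X}) for X in V - U)
    tie = count()                              # tie-breaker so the heap never compares sets

    start = frozenset()
    open_list = [(h(start), next(tie), 0.0, start)]
    g_best = {start: 0.0}

    while open_list:
        f, _, g, U = heapq.heappop(open_list)
        if U == V:
            return g                           # optimal MDL score of the whole network
        if g > g_best.get(U, float("inf")):
            continue                           # stale heap entry
        for X in V - U:
            S = U | {X}
            g_new = g + best_score(X, U)       # path cost: X picks parents from U
            if g_new < g_best.get(S, float("inf")):
                g_best[S] = g_new
                heapq.heappush(open_list, (g_new + h(S), next(tie), g_new, S))
```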
The possible parent sets of a variable X are subsets of V\{X}, which means 2^(n−1) scores of possible parent sets must be calculated for each variable and the whole process computes n·2^(n−1) scores of possible parent sets (each possible parent set corresponds to a score).
Yuan et al. [11] used a data structure called the parent graph to store scores. Each variable has its parent graph. The parent graph for a variable X is a Hasse diagram consisting of all subsets of the variables in V\{X}. Fig. 2 shows a simple example parent graph for X1 that contains the scores of all subsets of {X2, X3, X4} in a four-node BN.
Fig. 2 A parent graph for the variable X1 in a four-node BN
However, not all parent sets can possibly be in the optimal BN; certain parent sets can be discarded according to the following theorems [11].
Theorem 2 In an optimal BN based on the MDL scoring function, each variable has at most ⌈log₂(2N/log₂N)⌉ parents.
According to this theorem, the in-degree (the number of parents) of possible parent sets for X is limited to m = ⌈log₂(2N/log₂N)⌉; for example, N = 1 000 samples give m = 8. However, the possible parent sets of X are still subsets of V\{X}. When only Theorem 2 is used, the number of possible parent sets for each variable X decreases from 2^(n−1) to ∑_{i=0}^{m} C(n−1, i).
Theorem 3 Let U and S be two candidate parent sets for X, U ⊂ S, and K(X|S) − MDL(X|U) > 0. Then S and all supersets of S cannot possibly be optimal parent sets for X.
Theorem 4 Let U and S be two candidate parent sets for X such that U ⊂ S and MDL(X|U) ≤ MDL(X|S). Then S is not the optimal parent set of X for any candidate set.
This theorem simply means that a parent set is not optimal when a subset has a better score. Tian [36] proved this theorem.
In summary, according to these theorems, only the necessary MDL scores are calculated. As a result, the scores that we must calculate are only a portion of all 2^(n−1) scores for each variable. These score pruning rules are lossless because they are guaranteed not to remove the optimal network from consideration.
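A sketch of how Theorem 2 and Theorem 4 can prune the candidate parent sets of a single variable is shown below; Theorem 3, which needs the penalty term K separately, is omitted for brevity, and mdl_score and the other names are illustrative.

```python
from itertools import combinations

def pruned_parent_scores(X, others, mdl_score, max_in_degree):
    """Enumerate candidate parent sets of X by increasing size and keep only
    those that could still be optimal.

    mdl_score(X, parents) returns MDL(X | parents).  Theorem 2 limits the
    in-degree; Theorem 4 discards a set as soon as one of its subsets already
    has an equal or better score.
    """
    kept = {frozenset(): mdl_score(X, frozenset())}
    for size in range(1, max_in_degree + 1):
        for parents in combinations(others, size):
            S = frozenset(parents)
            score = mdl_score(X, S)
            # Theorem 4: prune S if some kept subset scores at least as well.
            if any(U < S and kept[U] <= score for U in kept):
                continue
            kept[S] = score
    return kept
```

Checking only the kept subsets is sufficient: if a pruned subset had a score no worse than S, the even smaller set that pruned it also scores no worse than S.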
The above pruning rules are only valid for MDL scores. Different scoring functions have different pruning rules [26]. Subsequent research on the exact learning of the BN structure has focused more on search strategies. From the perspective of constraining score calculations, this paper studies the possibility of pruning the score calculations to improve the efficiency of structure learning algorithms.
We consider introducing causal constraints to reduce the scores further. Many existing algorithms [29−31] can calculate the Markov blankets or parent and children sets of a BN. Such constraints have been introduced into the structure learning of BNs in related studies [28,37]. However, these studies constrain the search space rather than the score calculations.
These algorithms for calculating Markov blankets or parent and children sets generally rely on statistical tests to identify CI relations (also called CI tests). Two variables Xi and Xj can be tested for conditional independence given the set of variables Z by using the G² statistic under the null hypothesis that CI holds [38]. Letting N_abc denote the number of times that Xi = a, Xj = b, and Z = c in the data, G² is defined by

G² = 2 ∑_{a,b,c} N_abc ln((N_abc · N_c) / (N_ac · N_bc))

where N_ac, N_bc, and N_c are the corresponding marginal counts.
The G² statistic is asymptotically distributed as χ² under the null hypothesis. The χ² test returns a p value, which corresponds to the probability of falsely rejecting the null hypothesis given that it is true. Thus, given a significance level α, if p ≤ α the null hypothesis is rejected, that is, Xi and Xj are considered to be conditionally dependent given Z; otherwise, Xi and Xj are considered conditionally independent.
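A sketch of such a CI test is given below. It assumes discrete data stored as a list of records and uses the chi-squared survival function from SciPy for the p value; all names are illustrative.

```python
import math
from collections import Counter
from scipy.stats import chi2

def g2_independence_test(data, xi, xj, z, cardinalities, alpha=0.05):
    """Return True if Xi and Xj are judged conditionally independent given Z.

    G2 = 2 * sum_{a,b,c} N_abc * ln((N_abc * N_c) / (N_ac * N_bc)),
    asymptotically chi-squared with (|Xi|-1)(|Xj|-1)*prod_k(|Z_k|) degrees of freedom.
    """
    n_abc = Counter((row[xi], row[xj], tuple(row[k] for k in z)) for row in data)
    n_ac = Counter((row[xi], tuple(row[k] for k in z)) for row in data)
    n_bc = Counter((row[xj], tuple(row[k] for k in z)) for row in data)
    n_c = Counter(tuple(row[k] for k in z) for row in data)

    g2 = 2.0 * sum(n * math.log((n * n_c[c]) / (n_ac[(a, c)] * n_bc[(b, c)]))
                   for (a, b, c), n in n_abc.items())

    dof = (cardinalities[xi] - 1) * (cardinalities[xj] - 1)
    for k in z:
        dof *= cardinalities[k]

    p_value = chi2.sf(g2, dof)
    return p_value > alpha      # fail to reject the null hypothesis of CI
```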
According to the existing literature [29−31], among the many algorithms for calculating Markov blankets, the IAMB algorithm has high efficiency and the interIAMBnPC algorithm has high accuracy. For calculating parent and children sets, HITON_PC and MMPC are state-of-the-art algorithms. For any variable X, we denote by CC(X) the corresponding causal constraint calculated by these algorithms. Algorithm 1 shows the score calculations pruned by causal constraints and Theorems 2−4.
Algorithm 1 Score calculations with causal constraints
To address these limitations of current score calculations, Algorithm 1 is proposed. We first calculate the causal constraints CC(X) for each variable X based on the input data. Compared with the large amount of time required by exact learning algorithms, the running time of this phase is only a small part. Then, for the candidate sets that satisfy the causal constraints CC(X) and Theorem 2, we calculate the MDL scores of a variable X. In the process of calculating scores, we use Theorem 3 and Theorem 4 to prune unnecessary scores. Finally, the calculated scores are put into the parent graph for later structure learning. Considering the complexity of common BNs, we estimate that the use of causal constraints can reduce the possible parent sets by more than half compared with using only Theorem 2−Theorem 4.
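A sketch of Algorithm 1 is given below, assuming that CC(X) has already been computed for each variable (e.g., by IAMB or MMPC) and reusing the Theorem 4 check from the earlier sketch; the Theorem 2 in-degree bound is written out explicitly, and all names are illustrative.

```python
import math
from itertools import combinations

def constrained_score_calculation(variables, cc, mdl_score, N):
    """Score calculations with causal constraints (sketch of Algorithm 1).

    cc        : dict mapping each variable X to its causal constraint set CC(X)
    mdl_score : function (X, parent_frozenset) -> MDL(X | parents)
    N         : number of samples, used for the Theorem 2 in-degree limit
    """
    max_in_degree = math.ceil(math.log2(2 * N / math.log2(N)))   # Theorem 2
    parent_graphs = {}
    for X in variables:
        candidates = sorted(cc[X])           # only parents allowed by CC(X)
        kept = {frozenset(): mdl_score(X, frozenset())}
        for size in range(1, min(max_in_degree, len(candidates)) + 1):
            for parents in combinations(candidates, size):
                S = frozenset(parents)
                score = mdl_score(X, S)
                # Theorem 4: skip S if a kept subset already scores no worse.
                if any(U < S and kept[U] <= score for U in kept):
                    continue
                kept[S] = score
        parent_graphs[X] = kept              # pruned parent graph for X
    return parent_graphs
```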
With the causal constraints from these different algorithms, we can prune unnecessary score calculations for possible parent sets. Let us use Fig. 2 as an example. Assume that in the four-node BN, after being constrained by Theorem 2−Theorem 4 (or when Theorem 2−Theorem 4 produce no constraint effect), the parent graph for variable X1 remains unchanged. If it is then known that the causal constraint corresponding to the variable X1 is CC(X1) = {X2, X4}, the parent graph becomes Fig. 3. For larger BNs, such a reduction in score calculations is considerable. Even if each node only deletes one element from its possible parent sets with the causal constraints and without the constraints of Theorem 2−Theorem 4, the total number of score calculations is reduced by half, greatly reducing unnecessary score calculations and search.
Fig. 3 A parent graph for the variable X1 with causal constraint CC(X1) = {X2, X4} in a four-node BN
For the subsets that are not in the range of CC(X), we can still query their scores during the search process according to Theorem 4. For example, when querying the score of MDL(X1|{X4}) in the parent graph shown in Fig. 3, because {} ⊂ {X4}, the score of MDL(X1|{}) is returned directly. As another example, when querying the score of MDL(X1|{X2, X4}) in the parent graph shown in Fig. 3, since {}, {X2}, and {X4} are all subsets of {X2, X4}, the scores of all three would originally be queried, whereas now only the scores of {X2} and {} need to be queried. This undoubtedly increases the efficiency of the query, thereby improving the efficiency of the exact learning of BN structure algorithms.
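The query described above amounts to taking the best stored score among the subsets of the requested candidate set, as in the following sketch (names are illustrative):

```python
def query_best_score(kept_scores, candidate_set):
    """Return the best (lowest) stored MDL score over all stored parent sets
    that are subsets of 'candidate_set'.

    kept_scores   : dict frozenset(parents) -> score, as built by Algorithm 1
    candidate_set : frozenset of variables the parents may be drawn from
    """
    return min(score for parents, score in kept_scores.items()
               if parents <= candidate_set)
```

Since the empty parent set is always stored, the minimum is always well defined.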
With causal constraints from these different algorithms, the number of possible parent sets calculated by Algorithm 1 is reduced to different degrees. The results from Algorithm 1 with different causal constraints affect the exact learning of BN structure algorithms differently. We will experimentally study the influence of these different constraint results on exact learning in Section 4.
To test the performance of causal constraints for score calculations, we input the results of different score calculations into classic or state-of-the-art exact learning of BN structure algorithms, including A*, AWA*, BFBnB and GOBNILP (we use the latest stable version 1.6.3). All exact learning algorithms and score calculations are coded in C++. The causal constraints algorithms, including IAMB, interIAMBnPC, HITON_PC and MMPC, are coded in Matlab [39], and we pass the results of these algorithms to the score calculation program in the form of files. IAMB, interIAMBnPC, HITON_PC and MMPC use the default significance level α = 0.05. Empirical evaluation is performed on a computer running Windows 10 with a 4-core Intel (TM) i7-7700H 2.6 GHz processor and 16 GB of RAM.
Since we are interested in how these exact learning algorithms are affected in accuracy by the different possible parent sets produced by score calculations with different causal constraints, we do not report scores directly (because their values change radically across networks) but use a score error ratio relative to the optimal score:

Error = (MDL(Gcc) − MDL(Go)) / MDL(Go) × 100%.
MDL(Go) denotes the MDL score of the DAG obtained by the original exact learning algorithm on possible parent sets without any causal constraints, and MDL(Gcc) denotes the MDL score of the DAG obtained by the exact learning algorithm on possible parent sets with causal constraints. The better the score obtained by an algorithm, the closer its score error ratio is to 0. In our experimental results, we use "Error" to refer to the score error ratio. Also, "|PPS|" denotes the number of remaining possible parent sets after score calculations, and "Exp" denotes the number of expanded states in the order graph.
The conducted experiments consist of a group of benchmark BNs and some datasets from the UCI machine learning repository. Except for the Boerlage92 BN [40], the other benchmark BNs are publicly available in the BN repository. We randomly sample data from these benchmark BNs in different sample sizes.
Table 1 shows the running time of IAMB, interIAMBnPC, HITON_PC and MMPC and the number of remaining possible parent sets after score calculations with Theorem 2−Theorem 4 and the different causal constraints algorithms on benchmark BNs. Table 2−Table 7 show the running time, the number of expanded states and the score error ratio for A*, AWA* and BFBnB on the possible parent sets from the corresponding rows of Table 1. Table 8 shows the running time and the score error ratio for GOBNILP on the possible parent sets from the corresponding rows of Table 1. In Table 2, "O" means the original A* algorithm without any causal constraints. In Table 4, "O" means the original AWA* algorithm without any causal constraints. In Table 6, "O" means the original BFBnB algorithm without any causal constraints, and "NF" means no solution found. In Table 8, "O" means the original GOBNILP algorithm without any causal constraints. The number of expanded states reflects the efficiency of algorithms based on the order graph, so this indicator is only applicable to A*, AWA* and BFBnB. The scores obtained by the original algorithms are always optimal, that is, their score error ratio is always 0%, so we do not record them. We keep three decimal places for all results. Bold indicates the least time, the least number of remaining possible parent sets, the least number of expanded states or the least score error ratio among the different causal constraints algorithms.
Table 1 Running time for IAMB, interIAMBnPC, HITON_PC and MMPC and the number of remaining possible parent sets by score calculations with Theorems 2−4 and different causal constraints algorithms on benchmark BNs
Table 2 Running time and the number of expanded states for A* on possible parent sets without/with causal constraints from IAMB, interIAMBnPC, HITON_PC and MMPC algorithm on benchmark BNs
Table 3 Score error ratio for A* on possible parent sets with causal constraints from IAMB, interIAMBnPC, HITON_PC and MMPC algorithm on benchmark BNs
Table 4 Running time and the number of expanded states for AWA* on possible parent sets without/with causal constraints from IAMB, interIAMBnPC, HITON_PC and MMPC algorithm on benchmark BNs
Table 5 Score error ratio for AWA* on possible parent sets with causal constraints from IAMB, interIAMBnPC, HITON_PC and MMPC algorithm on benchmark BNs
Table 6 Running time and the number of expanded states for BFBnB on possible parent sets without/with causal constraints from IAMB, interIAMBnPC, HITON_PC and MMPC algorithm on benchmark BNs
Table 7 Score error ratio for BFBnB on possible parent sets with causal constraints from IAMB, interIAMBnPC, HITON_PC and MMPC algorithm on benchmark BNs
Table 8 Running time and score error ratio for GOBNILP on possible parent sets without/with causal constraints from IAMB, interIAMBnPC, HITON_PC and MMPC algorithm on benchmark BNs
According to Table 1, adding these causal constraints can effectively reduce the number of possible parent sets in score calculations. Among these algorithms, the constraints of the IAMB algorithm yield the least |PPS| and the least time on most networks. The MMPC algorithm performs better on Boerlage92. In Table 2, Table 4, Table 6, and Table 8, with causal constraints on possible parent sets, these exact learning algorithms are faster and expand fewer states than the original algorithms without causal constraints. However, the score error ratios in Table 3, Table 5, Table 7, and Table 8 indicate that these algorithms with causal constraints on possible parent sets cannot always obtain the optimal MDL score. In Table 6 and Table 7 for the BFBnB algorithm, the reason why "NF" occasionally occurs is that the BFBnB algorithm uses a classic approximate algorithm, the hill-climbing (HC) algorithm, as its upper bound and returns "NF" when the solution exceeds this upper bound. In other words, "NF" indicates that a worse score is obtained than that of the HC algorithm. Except for this particularity of BFBnB, the A*, AWA* and GOBNILP algorithms achieve the same score error ratio for the same BN under the same sample size and the same causal constraint on possible parent sets. This is because these algorithms are exact learning algorithms. For the same possible parent sets, these exact learning algorithms always get the optimal score, that is, the same result. However, due to the causal constraints, there is no guarantee that the possible parent sets themselves are optimal. In general, these exact learning algorithms on possible parent sets with causal constraints incur a slight loss of accuracy on BNs with small sample sizes and achieve satisfactory accuracy on BNs with large sample sizes. This is because IAMB, interIAMBnPC, HITON_PC and MMPC all use CI tests and require a large number of samples. They are sensitive to the accuracy of the CI tests, and if the data are insufficient or noisy, they may work badly [41]. Hence, these causal constraints algorithms cannot offer accurate constraints without sufficient samples, resulting in inaccurate possible parent sets being input to the subsequent exact learning algorithm. Our experimental results coincide with this. Among these constraint algorithms, exact learning algorithms on possible parent sets with MMPC are faster, expand fewer states, and have lower score error ratios than those with the other causal constraints algorithms in most cases.
We conducted similar experiments on UCI datasets. Table 9 shows the running time for IAMB, interIAMBnPC, HITON_PC and MMPC and the number of remaining possible parent sets after score calculations with Theorem 2−Theorem 4 and different causal constraints algorithms on UCI datasets. Table 10−Table 15 show the running time, the number of expanded states and the score error ratio for A*, AWA* and BFBnB on the possible parent sets from the corresponding rows of Table 9. Table 16 shows the running time and the score error ratio for GOBNILP on the possible parent sets from the corresponding rows of Table 9. We obtain similar experimental results in these tables.
According to Table 9, adding these causal constraints can effectively reduce the number of possible parent sets in score calculations. Among these algorithms, the constraints of the IAMB algorithm yield the least |PPS| and the least time on most networks. In Table 10, Table 12, Table 14, and Table 16, with causal constraints on possible parent sets, these exact learning algorithms are faster and expand fewer states than the original algorithms without causal constraints. However, the score error ratios in Table 11, Table 13, Table 15, and Table 16 indicate that these algorithms with causal constraints on possible parent sets cannot always obtain the optimal MDL score. In Table 14 and Table 15 for the BFBnB algorithm, the reason why "NF" occasionally occurs is the same as on the benchmark BNs. In general, these exact learning algorithms on possible parent sets with causal constraints incur a slight loss of accuracy on datasets with small sample sizes and achieve satisfactory accuracy on datasets with large sample sizes. The reason is the same as above, namely, CI tests require a large number of samples. Given insufficient or noisy data, CI tests may return wrong constraints, resulting in inaccurate possible parent sets.
Table 9 Time for IAMB, interIAMBnPC, HITON_PC and MMPC and the number of remaining possible parent sets by score calculations with Theorems 2−4 and different causal constraints algorithms on UCI datasets
Table 10 Time and the number of expanded states for A* on possible parent sets without/with causal constraints from IAMB, interIAMBnPC, HITON_PC and MMPC algorithm on UCI datasets
Table 11 Score error ratio for A* on possible parent sets with causal constraints from IAMB, interIAMBnPC, HITON_PC and MMPC algorithm on UCI datasets
Table 12 Time and the number of expanded states for AWA* on possible parent sets without/with causal constraints from IAMB, interIAMBnPC, HITON_PC and MMPC algorithm on UCI datasets
Table 13 Score error ratio for AWA* on possible parent sets with causal constraints from IAMB, interIAMBnPC, HITON_PC and MMPC algorithm on UCI datasets
Table 14 Time and the number of expanded states for BFBnB on possible parent sets without/with causal constraints from IAMB, interIAMBnPC, HITON_PC and MMPC algorithm on UCI datasets
Table 15 Score error ratio for BFBnB on possible parent sets with causal constraints from IAMB, interIAMBnPC, HITON_PC and MMPC algorithm on UCI datasets
Table 16 Time and score error ratio for GOBNILP on possible parent sets without/with causal constraints from IAMB, interIAMBnPC, HITON_PC and MMPC algorithm on UCI datasets
Due to the small sample size of most of the UCI datasets used and the need for sufficient samples for CI tests, the experimental results on the UCI datasets are slightly worse than those on the benchmark BNs. However, the trends of the two sets of results are the same. Adding IAMB, interIAMBnPC, HITON_PC or MMPC can effectively reduce the number of possible parent sets in score calculations. With these causal constraints on possible parent sets, exact learning algorithms are faster and expand fewer states than the original algorithms without causal constraints.
We quantify the degree of improvement brought by the causal constraints over the benchmark BNs and UCI datasets. Using IAMB can prune 29.10%−98.61% (76.36% on average) of possible parent sets, reduce running time by 2.23%−98.52% (61.88% on average) and expand 5.74%−96.94% (49.66% on average) fewer states, while only losing 0%−25.42% (1.37% on average) accuracy. Using interIAMBnPC can prune 27.15%−98.62% (76.18% on average) of possible parent sets, reduce running time by 0.75%−98.99% (58.89% on average) and expand 4.67%−92.59% (49.01% on average) fewer states, while only losing 0%−25.48% (1.33% on average) accuracy. Using HITON_PC can prune 25.71%−96.54% (71.22% on average) of possible parent sets, reduce running time by 9.37%−98.79% (61.73% on average) and expand 1.52%−90.60% (51.18% on average) fewer states, while only losing 0%−28.17% (0.99% on average) accuracy. Using MMPC can prune 25.71%−96.52% (71.07% on average) of possible parent sets, reduce running time by 3.81%−98.92% (60.52% on average) and expand 1.64%−98.37% (54.41% on average) fewer states, while only losing 0%−28.09% (0.94% on average) accuracy. On average, algorithms using IAMB constraints prune the most possible parent sets and reduce the most running time, while algorithms using MMPC constraints achieve the least loss of accuracy. Under causal constraints, exact learning algorithms can greatly improve efficiency with only a slight loss of accuracy, reducing time costs in practical applications.
By comparing the performance of A*, AWA*, BFBnB and GOBNILP on the benchmark BNs and UCI datasets, we can see the following characteristics. Overall, the three algorithms A*, AWA* and BFBnB show the same trends in the experimental results because they all perform heuristic search on the order graph. In most cases, the running times of these three algorithms follow a consistent ordering.

5. Conclusions

In this work, we add causal constraints algorithms, including IAMB, interIAMBnPC, HITON_PC and MMPC, which can calculate the Markov blankets or parent and children sets of a BN, into the score calculations. The number of possible parent sets can be reduced by adding these causal constraints algorithms, improving the efficiency of exact learning algorithms. We conduct experiments on several state-of-the-art exact learning algorithms, including A*, AWA*, BFBnB and GOBNILP. Experimental results show that exact learning algorithms with causal constraints can prune about 70% of possible parent sets and reduce about 60% of running time, while losing no more than 2% accuracy on average. For exact learning algorithms based on the order graph, adding causal constraints can reduce the number of expanded states by about 50% in general. Generally, algorithms with IAMB constraints have advantages in reducing the number of possible parent sets and the running time, while, considering both accuracy and efficiency, algorithms with MMPC constraints have advantages on most datasets. Compared with the cost of accuracy loss, the improved efficiency of these algorithms is significant. In some areas, such as threat assessment, researchers need to obtain BNs in real time for application. Compared with the requirement of modeling accuracy, modeling efficiency is more important in these fields. Our work shows that rapid modeling of the BN can be completed with a slight loss of accuracy, which is helpful for the rapid development of research in these areas.

This research still has certain limitations. First, all causal constraints use the default significance level α = 0.05; how exact learning algorithms are affected by this parameter has not been studied in this work. Secondly, we only study the influence of causal constraints on four classic exact learning algorithms, and other exact learning algorithms should also be considered. Finally, this research only adds causal constraints into the score calculations, and other addition strategies should also be considered.

In future work, we intend to research the influence of causal constraints on more exact learning algorithms, such as CPBayes, and study how these algorithms are affected by the significance level. We also want to add causal constraints into the search process of exact learning algorithms, not just the score calculation process. We believe that adding these constraints in the search process will increase efficiency even more. However, how to add causality to the search process requires further research in the future.