Dynamic Multi-team Antagonistic Games Model with Incomplete Information and Its Application to Multi-UAV

2015-08-11 11:56:37, Wenzhong Zha, Jie Chen, and Zhihong Peng

IEEE/CAA Journal of Automatica Sinica, 2015, Issue 1

Wenzhong Zha, Jie Chen, and Zhihong Peng

Abstract—At present, studies on multi-team antagonistic games (MTAGs) are still at an early stage, because this complicated problem involves not only incomplete information and conflicts of interest but also the selection of antagonistic targets. Therefore, building on previous research, this paper proposes a new framework: the dynamic multi-team antagonistic games with incomplete information (DMTAGII) model. For this model, the corresponding concept of perfect Bayesian Nash equilibrium (PBNE) is established and the existence of a PBNE is proved. In addition, an interactive iteration algorithm based on the idea of best response is introduced to solve for the equilibrium. Then, the scenario of multiple unmanned aerial vehicles (UAVs) against multiple military targets is studied, and its tactical decision-making problem is solved with the DMTAGII model. In the modeling process, specific expressions for the strategies, statuses and payoff functions of the game are given, and the strategy is encoded to match the structure of a genetic algorithm, so that the PBNE can be solved by combining the genetic algorithm with the interactive iteration algorithm. Finally, simulation results verify the feasibility and effectiveness of the DMTAGII model. Moreover, the calculated equilibrium strategies are found to be realistic and can provide references for improving the autonomy of UAV systems.

Index Terms—Dynamic multi-team antagonistic games (DMTAGs), incomplete information, perfect Bayesian Nash equilibrium (PBNE), multi-UAV cooperation, tactical decision making.

I. INTRODUCTION

THERE are many scenarios of cooperation in reality. For example, in a football match 11 players cooperate with each other, striving to get the ball into the opponent's goal; in a military confrontation, combat units operate jointly to fight against enemy targets. A common characteristic of these scenarios is that people cooperate or compete with others in teams. A team is a loose collection of multiple members or agents that share a certain objective. Generally, the presence of a common objective forges a team and induces cooperative behavior. However, team members might not be entirely altruistic. On the contrary, they might be selfish and have individual objectives, which may encourage them to opt for a weak degree of non-cooperation, mild competition, or adversarial action [1]. When the interactive behaviors between teams are considered, conflicts of interest cause outright competition between members of different teams. So, how should team members make the best decisions to maximize both their common objective and their individual objectives under internal cooperation and external competition? This is a game process, which we call multi-team antagonistic games (MTAGs).

At present, game theory related to teams is mainly represented by cooperative games [2−3] and evolutionary games [4]. In a cooperative game, a coalition can be regarded as a small team that players must decide whether or not to join, and a binding agreement is required to distribute the benefits of cooperation. In an evolutionary game, by contrast, a population can be regarded as a large team containing many small agents capable of making independent decisions. These agents interact strategically and repeatedly during evolution (learning, imitating and mutating) to reach an equilibrium of the population. However, both theories differ greatly from MTAG, owing to the coexistence of non-cooperation and cooperation in MTAG as well as the number of its members.

In 1997, Stengel and Koller first investigated a zero-sum game in which a team of several players confronted a single adversary [5]; this may be regarded as the embryo of MTAG. Then, Liu and Simaan [6] introduced convex static multi-team games and proposed the important concept of the noninferior Nash strategy (NNS), which is Pareto optimal for players belonging to the same team and Nash optimal for players belonging to different teams. Since then, multi-team games have attracted increasing attention from researchers such as Ahmed [7], Elettreby [8] and Asker [9], who generalized multi-team games to the Cournot game to study the dynamics and asymptotic stability of its equilibrium solutions. On the whole, these results mainly concern games of complete information, and their applications involve only enterprise competition in the economic field, since such problems have convex payoffs that conform to the assumption of NNS. However, realistic situations are less ideal because of pervasive incomplete information [10] and nonconvex payoff functions, which limit the applicability of multi-team games. Taking these factors into account, we establish a new game model that involves incomplete information and a dynamic, antagonistic environment. We call it dynamic multi-team antagonistic games with incomplete information (DMTAGII).

As the scenarios in the first paragraph show, MTAGs can be applied to many realistic problems, especially in multi-agent systems. A typical example is tactical decision making for multiple unmanned aerial vehicles (UAVs) against multiple military targets. In the last decade, the use of UAVs for various military missions has received increasing attention. Compared with a single UAV, an air formation composed of multiple UAVs has several advantages: for example, it can accomplish a variety of military missions, attack targets continuously and gather more information about threats and the battlefield situation [11−12]. In such a case, an important question is how the UAVs can make the best decisions autonomously and cooperate with one another to confront the enemy collectively, especially when the enemy also has a certain intelligence. This is a complex problem because of the coupling of missions and the uncertainty of battlefield information. Generally, the decisions of multiple UAVs mainly include tactical decisions [13−14] and maneuver decisions [15]. A tactical decision involves the offensive or defensive behavior of UAVs and is made in discrete time, while a maneuver decision is a continuous process, as it refers to mobile behavior such as pursuit or evasion. For tactical decisions, unlike the general dynamic game model with incomplete information or the static game model with complete information used in [13−14], this paper uses the newly established DMTAGII model.

In DMTAGII, the main feature is the interactive behavior within a team, which comes from information sharing and the coordination of team members' strategies. Generally, the relationship between members within a team is cooperative. However, conflicts of interest arise when members simultaneously pursue the maximization of different payoffs, which is clearly a non-cooperative or competitive scenario. The classical Nash non-cooperative game (NNG) can be considered to resolve the conflict. An important concept of NNG is the Nash equilibrium [16], a profile of strategies such that each player's strategy is an optimal response to the other players' strategies. In an NNG model, a strategy is either pure (one specific strategy adopted by a player from its strategy space) or mixed (a probability distribution over pure strategies). It is well known that not every NNG has a pure-strategy Nash equilibrium, while the strategies in DMTAGII are pure, so we cannot directly use the concept of Nash equilibrium to handle the aforementioned non-cooperative scenario. Instead, we build an integrated model by weighting the objective functions of team members to ensure the interests of important members. Then we introduce the perfect Bayesian Nash equilibrium (PBNE) for DMTAGII and prove its existence.

To solve a game model, the general method is to convert the game into a linear programming problem [17−18]. This is invalid for the DMTAGII model, because the payoffs of team members are uncertain (they can only observe the actions of the members of the other team and choose optimal strategies in response), so the payoff matrix cannot be built. Thus, we propose an interactive iteration algorithm based on the idea of best response. These concepts are organized in Section II. Then, in Section III, we build the tactical decision making model for multiple UAVs against multiple military targets based on the DMTAGII model, mainly discussing the specific expressions of the strategy, status and payoff functions of the team members. Finally, in Section IV, we verify the feasibility and effectiveness of the proposed models with a simulation example, and we draw conclusions in Section V.

II. DMTAGII

A. Concepts of the Game

Without loss of generality, only DMTAGII with two teams is discussed in this paper. The key concepts of DMTAGII are described as follows.

1) Player. The players of the game are the team members, denoted by $X=\{x_1,x_2,\dots,x_n\}$ and $Y=\{y_1,y_2,\dots,y_m\}$.

2) Type. The type sets of $x_i$ and $y_j$ are $\Theta_{x_i}\subseteq\Theta_X$, $i=1,2,\dots,n$, and $\Theta_{y_j}\subseteq\Theta_Y$, $j=1,2,\dots,m$. In the following, we always use $i,n$ and $j,m$ as the subscripts of teams X and Y, respectively. The sets $\Theta_{x_i}$ are known to every member, but at the k-th stage the specific type $\theta^k_{x_i}$ of $x_i$ is private information that cannot be known by the members of the other team.

3) Action. The action sets of $x_i$ and $y_j$ are $A_{x_i}\subseteq A_X$ and $A_{y_j}\subseteq A_Y$. $A_{x_i}$ is also common knowledge, while the specific action $a^k_{x_i}$ of $x_i$ can only be observed by the members $y_j$ at the k-th stage. We take the combination of the specific actions of all members $y_j$, $\mathbf{a}^k_{-X}=[a^k_{y_1},a^k_{y_2},\dots,a^k_{y_m}]$, as the information observed by member $x_i$; it is shared information within team X.

4) Strategy. Unlike the general dynamic game model with incomplete information, the strategies of members (players) in DMTAGII depend on their types, their actions and the antagonistic targets (which can be members, or types of members, on the opposing side). This is the main reason for choosing “antagonistic” rather than “noncooperative” to describe the games. Suppose the strategy sets of $x_i$ and $y_j$ are $S_{x_i}\subseteq S_X$ and $S_{y_j}\subseteq S_Y$; then the specific strategies of $x_i$ and $y_j$ at the k-th stage are $\mathbf{s}^k_{x_i}=[\theta^k_{x_i},a^k_{x_i},T_Y]$ and $\mathbf{s}^k_{y_j}=[\theta^k_{y_j},a^k_{y_j},T_X]$, where $T_Y$ and $T_X$ are the antagonistic targets of teams X and Y, respectively.

5) Status. At the k-th stage, every member can infer the current status of the game (such as the inventories and limits of manpower, resources, energy, etc.) from the strategies of all team members from stage 1 to stage k−1. We use $E^k_{x_i}=E(\mathbf{s}^1_{x_i},\mathbf{s}^2_{x_i},\dots,\mathbf{s}^{k-1}_{x_i})$ and $E^k_{y_j}=E(\mathbf{s}^1_{y_j},\mathbf{s}^2_{y_j},\dots,\mathbf{s}^{k-1}_{y_j})$ to denote the statuses of $x_i$ and $y_j$. Accordingly, $E^k_X$ and $E^k_Y$ are the statuses of teams X and Y, $E^k=[E^k_X,E^k_Y]$ is the status of the game, and $E=\{E^1,E^2,\dots,E^K\}$ is the status of the whole game process, where K is the terminal node of the game.
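When the model is implemented, the strategy triple and the status map can be held in small record types. The following Python sketch is illustrative only: the class names, field names and the toy status function are hypothetical, not the paper's notation. It stores $[\theta^k_{x_i},a^k_{x_i},T_Y]$ as an immutable record and computes a status from the strategies a member played at stages 1 to k−1.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Strategy:
    """One member's stage strategy s^k = [theta^k, a^k, T] (hypothetical field names)."""
    type_: str   # specific type theta^k, private to the member's own team
    action: str  # specific action a^k, observable by the opposing team
    target: str  # antagonistic target on the opposing side


@dataclass
class MemberHistory:
    """Strategies played by one member at stages 1, 2, ..., k-1."""
    played: List[Strategy] = field(default_factory=list)

    def status(self, status_fn: Callable[[List[Strategy]], object]) -> object:
        # E^k = E(s^1, ..., s^{k-1}); status_fn stands in for the model-specific map E
        return status_fn(self.played)


def missiles_fired(history: List[Strategy]) -> int:
    """Toy status function: number of missile attacks launched so far."""
    return sum(1 for s in history if s.action == "missile attack")


h = MemberHistory()
h.played.append(Strategy("Missile1", "missile attack", "y1"))
h.played.append(Strategy("Platform1", "platform defense", "y2"))
fired = h.status(missiles_fired)  # 1
```

Keeping the strategy record immutable makes it safe to store whole histories and to compare strategy pairs across iterations.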

6) Belief. Belief is the knowledge one team has about the other teams; it is revised as the game progresses by Bayes rule. Without loss of generality, we consider the belief of team X about team Y. Suppose the members of team X believe that the prior probability that the members of team Y belong to the type combination $\theta_{-X}=[\theta_{y_1},\theta_{y_2},\dots,\theta_{y_m}]$ is

Furthermore, from the knowledge of the members of team X, given the type combination $\theta_{-X}$ of the members of team Y, the conditional probability combination of their choosing the action combination $\mathbf{a}_{-X}$ is

Then, according to Bayes rule, when the members of team X observe that the action combination of the members of team Y is $\mathbf{a}^k_{-X}$ at the k-th stage, they believe that the posterior probability combination of the members of team Y belonging to $\theta_{-X}$ is

The subscript “−X” is used for team Y in the above descriptions because the belief is team X's subjective understanding of team Y.
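The belief update above can be illustrated with a minimal sketch of Bayes rule over a finite set of type combinations. It assumes that the prior $P(\theta_{-X})$ and the likelihood of the observed action combination $\mathbf{a}^k_{-X}$ under each $\theta_{-X}$ are available as dictionaries; the function name and the numbers are hypothetical. When the observation has zero probability under every type combination, Bayes rule does not determine the belief, and the sketch simply keeps the prior, which is one arbitrary choice among many.

```python
from typing import Dict, Hashable


def posterior(prior: Dict[Hashable, float],
              likelihood: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Bayes rule: P(theta | a) = P(a | theta) P(theta) / sum_theta' P(a | theta') P(theta').

    prior[theta]      -- prior probability of type combination theta
    likelihood[theta] -- probability of the observed action combination given theta
    """
    joint = {th: prior[th] * likelihood.get(th, 0.0) for th in prior}
    evidence = sum(joint.values())
    if evidence == 0.0:
        # Zero-probability observation: Bayes rule is silent, so keep the prior.
        return dict(prior)
    return {th: p / evidence for th, p in joint.items()}


# Hypothetical example: one opposing member with two possible types.
prior = {"Missile": 0.7, "Platform": 0.3}
lik = {"Missile": 0.8, "Platform": 0.2}  # P(observed action | type), made-up values
post = posterior(prior, lik)             # Missile: 0.56/0.62, Platform: 0.06/0.62
```

For several opposing members, the keys can be tuples of types, one entry per member of team Y.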

7) Payoff. In DMTAGII, because team members need to cooperate with the others in the same team and confront the members of other teams, their payoffs are necessarily affected by the members of all teams. These influences are mainly reflected by the strategies and statuses of the members, so we define the payoff functions of $x_i$ and $y_j$ as $u_{x_i}(\mathbf{s}_X,\mathbf{s}_Y,E)$ and $u_{y_j}(\mathbf{s}_X,\mathbf{s}_Y,E)$, where $\mathbf{s}_X\in S_X$ and $\mathbf{s}_Y\in S_Y$ are the strategy combinations of all members $x_i$ and $y_j$, respectively. Accordingly, the payoff functions of teams X and Y can be defined as $u_X(\mathbf{s}_X,\mathbf{s}_Y,E)$ and $u_Y(\mathbf{s}_X,\mathbf{s}_Y,E)$, respectively.

8) Equilibrium. For DMTAGII, following the form of the general dynamic game with incomplete information [10], we temporarily define the equilibrium as follows (the formal definition is given in Section II-C).

Definition 1. A PBNE is a strategy combination $\mathbf{s}^*=[\mathbf{s}^*_X,\mathbf{s}^*_Y]=[\mathbf{s}^*_{x_1},\dots,\mathbf{s}^*_{x_n},\mathbf{s}^*_{y_1},\dots,\mathbf{s}^*_{y_m}]$ together with a posterior probability combination $[P(\theta_{-X}|\mathbf{a}^k_{-X}),P(\theta_{-Y}|\mathbf{a}^k_{-Y})]$ that meet the following conditions.

1) Perfectness condition. For every team member and on every information set at the k-th stage, we have

2) Bayes rule. $[P(\theta_{-X}|\mathbf{a}^k_{-X}),P(\theta_{-Y}|\mathbf{a}^k_{-Y})]$ is obtained by Bayes rule from the prior probability, the observed actions $[\mathbf{a}^k_{-X},\mathbf{a}^k_{-Y}]$ and the optimal strategies $[\mathbf{s}^*_Y,\mathbf{s}^*_X]$.

Remark 1. The information set is the information about all team members at the k-th stage, including certain information (such as actions) and the belief obtained by Bayes rule. In every information set of team X, member $x_i$ must hold a belief about the probability that the game has reached each node.

Remark 2. By the perfectness condition, given the belief, the strategies of team members must be “sequentially rational”; i.e., given the strategies of the members of team Y, $\mathbf{s}_Y=[\mathbf{s}_{y_1},\mathbf{s}_{y_2},\dots,\mathbf{s}_{y_m}]$, and the posterior probability combination $P(\theta_{-X}|\mathbf{a}^k_{-X})$ held by the members of team X, the strategy of team member $x_i$ is optimal in the continuation game from information set k. The “continuation game” is a complete plan for coping with all possible cases after the k-th stage.

The main similarities between the DMTAGII model and the general model of dynamic games with incomplete information are the incomplete information and the dynamic property. When DMTAGII is a finite game with a finite number of selectable strategies, it has at least one PBNE, as the general dynamic game with incomplete information does; the proof can be found in Harsanyi's paper [10].

On the other hand, the main differences between the DMTAGII model and the general model are that the specific strategies of the members depend not only on their types and actions but also on the antagonistic targets on the opposing side. Moreover, because of the interactive relationship within the same team, members need to consider both individual interests and the team interest.

These interactive behaviors lead us to rethink the definition of PBNE: how do team members choose optimal strategies when there are multiple equilibrium solutions? In DMTAGII, one viewpoint is definite: a team interest (or common objective) exists, since without it forming a team would be meaningless. We take formulas (5) and (6) as the objective functions of all team members:

Accordingly, the objective functions of teams X and Y have similar forms:

Thus, it is meaningful to discuss team interaction, because it concerns the coordination between individual interests and the team interest.

B. Interactive Behaviors in the Team

1) Understanding of interactive behaviors: Generally, interactive behaviors in DMTAGII come from two aspects.

The first aspect is information sharing, where the shared information consists of the observed action combination and the belief. This is communication and fusion of information, and we call it “soft-interaction”, because it affects the objective functions of team members only through Bayes rule rather than directly through the payoff functions.

The second aspect is the coordination of team members' strategies to achieve the minimum overall cost and maximum income of the whole team. Correspondingly, we call it “hard-interaction”. In general, there are three degrees of hard-interaction [1].

1) Team coordination. The main distinction of team coordination is that team members do not have individual objective functions. There is only one common objective, which all team members strive to optimize. Thus, the team members may overutilize or underutilize the team resources.

2) Team cooperation. In team cooperation, each team member has a private objective in addition to the common objective. A general method is to weight the private and common objective functions by a coefficient w (0 ≤ w ≤ 1). Decreasing the weight w on the private objective functions increases the level of cooperation among the team members. Conflicts of interest are possible, but owing to the structure of the objective functions used, they are generally not dominant; even when they arise, the team objective can take precedence because of its higher weight relative to the private objective functions.

3) Team collaboration. Team collaboration is a loose form of team interaction, in which the team focuses on task completion and feasibility. On this premise, each team member tries to maximize its local objective function while avoiding conflicts of interest and task redundancy. This requires a negotiation protocol to arbitrate conflicts.

In DMTAGII, the objective functions of members of the same team are different. Thus, when it is impossible or infeasible for a structured joint action to optimize the different objective functions simultaneously, conflicts of interest arise.

2) Integrated model in cooperation: The question now is how to define the objective function of each team member so that extremely selfish behavior is avoided even in the non-cooperative scenario. Among the three modes of interaction above, “team cooperation” is the easiest to implement and the most realistic in DMTAGII. Thus, we redefine the objective functions of team members as follows:

where $0<w_{x_i},w_{y_j}<1$.

Next, we discuss cooperation within the team. With $\mathbf{s}_Y$ and E fixed, a straightforward idea is to find a strategy combination $\mathbf{s}^*_X=[\mathbf{s}^*_{x_1},\mathbf{s}^*_{x_2},\dots,\mathbf{s}^*_{x_n}]$ such that $H_{x_i}(\mathbf{s}^*_X,\mathbf{s}_Y,E)$ is optimal for every member $x_i$. This is the concept of Nash equilibrium, and the question arises again: does a Nash equilibrium exist? If every member has a finite set of pure strategies and mixed strategies are allowed, at least one Nash equilibrium exists. Unfortunately, the strategies discussed in DMTAGII are pure, in which case a Nash equilibrium may not exist. Note that this differs from the preceding PBNE: in PBNE, the existence of an equilibrium largely depends on Bayes rule, whereas within a team Bayes rule has no effect, because the posterior probability (belief) is the same for all members.
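That a pure-strategy Nash equilibrium may fail to exist can be checked on a textbook example, matching pennies, which is not part of the UAV model. The sketch below enumerates every pure profile of a finite two-player bimatrix game and keeps those in which each strategy is a best reply to the other; the function name is hypothetical.

```python
import itertools
from typing import List, Tuple


def pure_nash(A: List[List[float]], B: List[List[float]]) -> List[Tuple[int, int]]:
    """Return all pure-strategy Nash equilibria (i, j) of the bimatrix game (A, B).

    A[i][j] is the row player's payoff and B[i][j] the column player's payoff.
    """
    rows, cols = len(A), len(A[0])
    equilibria = []
    for i, j in itertools.product(range(rows), range(cols)):
        row_ok = all(A[i][j] >= A[k][j] for k in range(rows))  # i is a best reply to j
        col_ok = all(B[i][j] >= B[i][l] for l in range(cols))  # j is a best reply to i
        if row_ok and col_ok:
            equilibria.append((i, j))
    return equilibria


# Matching pennies: the row player wins on a match, the column player on a mismatch.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
print(pure_nash(A, B))  # [] (no pure-strategy equilibrium)
```

A prisoner's-dilemma table, by contrast, returns a single pure equilibrium, so the outcome depends on the payoff structure rather than on the enumeration.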

So our original goal must change. If no set of pure strategies lets the objective functions of all members within the team reach their optima simultaneously, it is always feasible to optimize a weighted sum of these objective functions. This has a basis in reality: we may not be able to consider the interest of every member simultaneously, but we can at least give priority to the interests of important members through weighting. Thus, we have the following definition.

Definition 2. Suppose $\rho_{x_i}$ is the weight of the objective function of member $x_i$ in team X, where $0<\rho_{x_i}<1$ and $\sum_{i=1}^{n}\rho_{x_i}=1$ (this assumption ensures that every member has the right to participate in the distribution of the team interest). Then we define an objective function, called the “integrated objective function” of team X, i.e.,

Similarly, the integrated objective function of team Y can also be defined as
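Since the redefined member objectives and the integrated objective are not reproduced in this text, the following sketch only illustrates the weighting idea under explicit assumptions: a member's cooperative objective is taken as $w\,u_i+(1-w)\,u_{\text{team}}$, in line with the weighting described for team cooperation, and the integrated objective is the $\rho$-weighted sum of member objectives with weights in (0, 1) summing to 1. The function names and numbers are hypothetical.

```python
from typing import Sequence


def member_objective(u_private: float, u_team: float, w: float) -> float:
    """Assumed cooperative objective H_i = w*u_i + (1 - w)*u_team, with 0 < w < 1.

    A smaller w puts less weight on the private payoff, i.e. more cooperation.
    """
    if not 0.0 < w < 1.0:
        raise ValueError("w must lie strictly between 0 and 1")
    return w * u_private + (1.0 - w) * u_team


def integrated_objective(member_values: Sequence[float], rho: Sequence[float]) -> float:
    """Assumed integrated team objective H_X = sum_i rho_i * H_i with sum(rho) == 1."""
    if len(member_values) != len(rho):
        raise ValueError("one weight per member is required")
    if any(not 0.0 < r < 1.0 for r in rho) or abs(sum(rho) - 1.0) > 1e-9:
        raise ValueError("weights must lie in (0, 1) and sum to 1")
    return sum(r * h for r, h in zip(rho, member_values))


# Hypothetical numbers: three members sharing a team payoff of 10.
H = [member_objective(u, 10.0, w=0.3) for u in (4.0, 6.0, 8.0)]  # [8.2, 8.8, 9.4]
H_X = integrated_objective(H, rho=(0.5, 0.3, 0.2))               # about 8.62
```

Optimizing the single scalar `H_X` over the team's pure strategies always has a solution on a finite strategy set, which is the point of replacing the per-member Nash condition.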

C. PBNE

Let us return to the equilibrium problem. The perfectness condition in the preceding temporary definition of PBNE has now changed, and we have the following theorem.

Theorem 1. Suppose every team member has a finite number of strategies in DMTAGII. Then the game has at least one PBNE.

Proof. The DMTAGII model differs from the general model of dynamic games with incomplete information only in the descriptions of strategies and payoff functions, while the structure of the game is unchanged. So we can treat each team as a virtual player and integrate it into the general structure, i.e.,

Then, the formal description of the PBNE of DMTAGII is as follows.

A PBNE of DMTAGII is a strategy combination $\mathbf{s}^*=[\mathbf{s}^*_X,\mathbf{s}^*_Y]$ together with a posterior probability combination $[P(\theta_{-X}|\mathbf{a}^k_{-X}),P(\theta_{-Y}|\mathbf{a}^k_{-Y})]$ that meet the following conditions.

1) Perfectness condition. For every team and on every information set at the k-th stage, we have

2) Bayes rule. The same as described in Definition 1.

Clearly, this is a PBNE of a general two-player dynamic game with incomplete information, and by [10] such a game has a PBNE. Hence there is at least one PBNE in DMTAGII. □

Equivalently, the integrated objective functions of teams X and Y can also be expressed as

The conclusion of Theorem 1 is not entirely satisfying, however, because it does not provide a simple method for computing the equilibrium. We can introduce the concept of best response to simplify the solution process.

Definition 3. In DMTAGII, for any strategy $\mathbf{s}_Y\in S_Y$ put forward by team Y, team X can always choose a corresponding strategy $\mathbf{s}_X\in S_X$ to antagonize team Y. Thus, there exists a mapping (usually multi-valued) $\phi:S_Y\to S_X$ such that for all $\mathbf{s}_X$ we have

At this point, we call the set

the best response of team X to a specific strategy chosen by team Y. Similarly, there exists a mapping $\varphi:S_X\to S_Y$ such that the best response of team Y to a specific strategy chosen by team X is

The best response means that when a team's action is known or can be predicted, the other teams take the strategies that optimize their profits given that action; these strategies are the best response to the former team. The concept of best response also makes it convenient to find the PBNE of DMTAGII, as the following theorem shows.

Theorem 2. Suppose Z is the set of PBNEs of DMTAGII. Then $Z=Z_X\cap Z_Y$.

Proof. Suppose $(\mathbf{s}^*_X,\mathbf{s}^*_Y)\in Z$. Then the following inequalities

hold for each $\mathbf{s}_X\in S_X$ and $\mathbf{s}_Y\in S_Y$.

According to Definition 3, $(\mathbf{s}^*_X,\mathbf{s}^*_Y)\in Z_X$ and $(\mathbf{s}^*_X,\mathbf{s}^*_Y)\in Z_Y$, i.e., $(\mathbf{s}^*_X,\mathbf{s}^*_Y)\in Z_X\cap Z_Y$; thus $Z\subseteq Z_X\cap Z_Y$.

Since every step of the above inference is reversible, $Z_X\cap Z_Y\subseteq Z$ also holds. Therefore, $Z=Z_X\cap Z_Y$.

By Theorem 2, the PBNE can be found directly, because we only need to compute the intersection of the two best-response sets.
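Theorem 2 suggests a direct computation when each team, treated as a virtual player, has finitely many joint strategies: build the best-response sets $Z_X$ and $Z_Y$ and intersect them. The sketch below assumes hypothetical integrated payoff tables `HX[i][j]` and `HY[i][j]` (row i indexes $\mathbf{s}_X$, column j indexes $\mathbf{s}_Y$) and considers pure strategies only; ties are kept, since the best-response mapping may be multi-valued.

```python
from typing import List, Set, Tuple

Pair = Tuple[int, int]  # (index of s_X, index of s_Y)


def best_response_set_X(HX: List[List[float]]) -> Set[Pair]:
    """Z_X: pairs (i, j) such that s_X = i maximizes H_X against s_Y = j."""
    n_x, n_y = len(HX), len(HX[0])
    z: Set[Pair] = set()
    for j in range(n_y):
        best = max(HX[i][j] for i in range(n_x))
        z |= {(i, j) for i in range(n_x) if HX[i][j] == best}
    return z


def best_response_set_Y(HY: List[List[float]]) -> Set[Pair]:
    """Z_Y: pairs (i, j) such that s_Y = j maximizes H_Y against s_X = i."""
    n_x, n_y = len(HY), len(HY[0])
    z: Set[Pair] = set()
    for i in range(n_x):
        best = max(HY[i][j] for j in range(n_y))
        z |= {(i, j) for j in range(n_y) if HY[i][j] == best}
    return z


# Hypothetical 2x3 integrated payoff tables (rows: s_X, columns: s_Y).
HX = [[2, 0, 1],
      [1, 3, 0]]
HY = [[2, 1, 0],
      [0, 1, 3]]
Z = best_response_set_X(HX) & best_response_set_Y(HY)  # {(0, 0)}
```

Enumeration is only practical for small strategy sets; for the UAV model the paper instead searches the strategy space with a genetic algorithm.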

The game tree formed by DMTAGII is shown in Fig. 1.

At the beginning of the game, Nature chooses the types of all members of one team. For example, in the left half of Fig. 1, the members of team X choose strategies first. Then, the members of team Y infer the types of team X from the observed action combination and choose optimal strategies in response. After that, the members of team X perform the same process to choose their optimal strategies. This is repeated until the end of the game.

Fig. 1. The game tree of DMTAGII.

D. Solution Algorithm

From the above dynamic process of DMTAGII, the payoffs of team members are uncertain at each stage, and building the payoff matrix for all members is clearly unrealistic, so we cannot use the general method of converting the game into a linear programming problem. Considering the characteristics of the PBNE of DMTAGII, we use an interactive iteration algorithm based on the idea of best response to solve the whole problem. The algorithm flow is as follows (assuming the game starts from the $k_0$-th stage).

Step 1. Nature first chooses the types of the members of one team (assumed to be team X). Set the terminal node K and a numerical precision $\varepsilon\ge 0$.

Step 2. The members of team X choose an initial strategy $\mathbf{s}^{k,t}_X$ arbitrarily, corresponding to the specified types; let $k=k_0$, $t=1$.

Step 3. First, for the given strategy $\mathbf{s}^{k,t}_X$, use an optimization algorithm to solve optimization problem (19) and obtain the optimal solution $\mathbf{s}^{k,t}_Y$. Then, for this given strategy $\mathbf{s}^{k,t}_Y$, use the optimization algorithm to solve optimization problem (18) and obtain the corresponding optimal solution of team X.

Step 4. If $k<K$, let $k=k+1$ and go to Step 3; otherwise, a set of strategy pairs is obtained. Go to Step 5.

Step 5. For the given $\mathbf{s}^{k,t}_Y$, fix k and solve problem (18) to obtain the optimal solution of team X corresponding to the specified types. Then repeat Steps 3 and 4 to obtain another set of strategy pairs.

Step 6. If the infinity matrix norm satisfies

for every $k=k_0,\dots,K$, then the obtained set of strategy pairs is the PBNE of DMTAGII on the $k_0$-th information set; otherwise, let $t=t+1$ and go to Step 5.

The algorithm flow is illustrated in Fig. 2.

Fig.2.The solution algorithm flow of DMTAGII.
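The interactive iteration can be condensed into a simplified sketch of alternating best responses. It is not the authors' implementation: it assumes hypothetical per-stage payoff tables `HX[k][i][j]` and `HY[k][i][j]`, treats the stages as decoupled (in the full model the status couples them), and replaces optimization problems (18) and (19) by exhaustive maximization over a finite strategy index. Each sweep lets team Y best-respond to team X and team X best-respond to that reply at every stage, and the loop stops once the profile changes by at most `eps` in the infinity norm between sweeps, in the spirit of Step 6.

```python
from typing import List, Sequence, Tuple

Table = List[List[float]]  # rows: team X strategy index, columns: team Y strategy index


def interactive_iteration(HX: Sequence[Table], HY: Sequence[Table], x_init: Sequence[int],
                          eps: float = 0.0, max_sweeps: int = 100) -> List[Tuple[int, int]]:
    """Alternate stage-wise best responses until the strategy profile stops changing.

    HX[k][i][j], HY[k][i][j]: hypothetical integrated payoffs of X and Y at stage k
    when X plays i and Y plays j. Returns the strategy pair (i, j) for every stage.
    """
    n_stages = len(HX)
    xs = list(x_init)  # arbitrary initial strategies of team X (cf. Step 2)
    ys = [0] * n_stages
    for _ in range(max_sweeps):
        prev = xs + ys  # profile from the previous sweep (a new list)
        for k in range(n_stages):
            # Team Y best-responds to X's stage-k strategy (role of problem (19)).
            row = HY[k][xs[k]]
            ys[k] = row.index(max(row))
            # Team X best-responds to Y's reply (role of problem (18)).
            col = [HX[k][i][ys[k]] for i in range(len(HX[k]))]
            xs[k] = col.index(max(col))
        # Step 6 analogue: infinity norm of the change between two sweeps.
        change = max(abs(a - b) for a, b in zip(xs + ys, prev))
        if change <= eps:
            return list(zip(xs, ys))
    raise RuntimeError("best responses did not settle within max_sweeps")


# Hypothetical two-stage example.
HX = [[[2, 0, 1], [1, 3, 0]],  # stage 1, team X payoffs
      [[1, 1], [0, 0]]]        # stage 2, team X payoffs
HY = [[[2, 1, 0], [0, 1, 3]],  # stage 1, team Y payoffs
      [[0, 5], [0, 5]]]        # stage 2, team Y payoffs
pairs = interactive_iteration(HX, HY, x_init=[1, 1])
print(pairs)  # [(0, 0), (0, 1)]
```

If the best responses cycle, as in matching pennies, the profile never settles and the sketch raises `RuntimeError` after `max_sweeps` sweeps, which reflects the possible non-existence of pure-strategy equilibria discussed in Section II-B.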

III. THE TACTICAL DECISION MAKING MODEL BASED ON DMTAGII FOR MULTI-UAV AGAINST MULTI-TARGET

Tactical decision making for multiple UAVs against multiple targets is a typical MTAG process. As described in the introduction, realistic problems in a battlefield environment are very complicated. Our current focus is to explore the feasibility of using DMTAGII to solve such problems, so a simple antagonistic scenario is considered in this paper.

A. Analyzing and Modeling

We take an air formation (Blue team) of multiple heterogeneous UAVs against a small ground military base (Red team) as the practical background. The scenario is described as follows.

1) Blue team consists of three unmanned combat aerial vehicles (UCAVs) and one unmanned reconnaissance aerial vehicle (URAV). Each UCAV carries missiles to attack the Red military targets or to intercept incoming missiles, especially those attacking the URAV. The URAV is responsible for battlefield surveillance, target tracking and reporting real-time battlefield information to the UCAVs. The goal of Blue team is to ensure the safety of the URAV and maximize the damage to Red team.

2) Red team consists of three ground missile launching positions (MLPs) and one operational command vehicle (OCV). Similarly, each MLP has missiles that can attack the UCAVs or the URAV of Blue team or intercept incoming missiles. The OCV is responsible for providing battlefield information and issuing operational commands. The goal of Red team is to ensure the safety of the OCV and maximize the damage to Blue team.

In this paper, we assume that communication between members of the same team is perfect (if one member is killed, the other members can still communicate) and that the fusion of multi-source information is effective. The members, types and executable actions of the two teams are shown in Table I.

Assume that each member can choose only one type and one action against one target in each game stage. When a UCAV attacks an MLP, the MLP can either intercept the missile or attack the platform of the UCAV. Corresponding to the DMTAGII model, the mathematical description of some key concepts for multiple UAVs against multiple targets is as follows.

TABLE I FEATURES OF Blue AND Red TEAMS

1) Type set. First, the type set of each team must be defined. Suppose the type set of Blue team X is $\theta_X=[\theta_{x,1},\dots,\theta_{x,b},\dots,\theta_{x,7}]^{\mathrm T}$, where $\theta_{x,b}$ is a combination of a team member and a type, for example, $\theta_{x,1}=(x_1,\mathrm{Missile1})$, $\theta_{x,2}=(x_1,\mathrm{Platform1})$, $\dots$, $\theta_{x,7}=(x_4,\mathrm{Platform2})$. Similarly, the type set of Red team Y is $\theta_Y=[\theta_{y,1},\dots,\theta_{y,r},\dots,\theta_{y,7}]^{\mathrm T}$.

2) Strategy set. First, let the type-action vector of Blue team be

Then, at the k-th stage, the specific strategy of member $x_i$ can be defined as a matrix, i.e.,

3) Status. The game status is mainly reflected in the missile inventories and the damage degrees of the platforms. We again take Blue team as an example; Red team is similar. Suppose that at the k-th stage the missile inventory of member $x_i$ is $M^k_{x_i}$ and the probability of its missile type being hit by a single missile in the firing condition is $P^{m}_{x_i}$. Meanwhile, suppose the damage degree of each member's platform is $D^{p,k}_{x_i}$. Without defense, the probability of the platform being hit by a single missile is $P^{p}_{x_i}$, and the damage degree after being hit is $D^{d}_{x_i}$. With defense, the damage effect on the platform is reduced to $\tau_{x_i}$ times that without defense.

Next, we consider the case in which the types of team member $x_i$ are attacked by multiple missiles in one game stage. Given the number of these missiles, the probabilities of the types of member $x_i$ being hit by them are

Similarly, $D^{d}_{x_i}$ and $\tau_{x_i}$ also change, i.e.,

Thus, at the (k+1)-th stage, the status transition of member $x_i$ is
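The status expressions themselves are not reproduced here. As a hedged illustration only, the sketch below assumes that the incoming missiles hit independently, so that the probability of at least one hit among N missiles with single-shot probability p is $1-(1-p)^N$, and that the expected platform damage grows by that probability times $D^{d}_{x_i}$, scaled by $\tau_{x_i}$ when the platform defends and capped at 1. This update rule and the function names are hypothetical, not the paper's equations.

```python
def hit_probability(p_single: float, n_missiles: int) -> float:
    """P(hit at least once) for n independent missiles, each hitting with p_single."""
    if not 0.0 <= p_single <= 1.0 or n_missiles < 0:
        raise ValueError("invalid probability or missile count")
    return 1.0 - (1.0 - p_single) ** n_missiles


def next_damage(damage: float, p_single: float, n_missiles: int,
                damage_if_hit: float, defending: bool, tau: float) -> float:
    """Hypothetical expected-damage update for a platform, capped at 1 (destroyed).

    tau (0 < tau <= 1) scales the damage effect when the platform defends.
    """
    effect = damage_if_hit * (tau if defending else 1.0)
    return min(1.0, damage + hit_probability(p_single, n_missiles) * effect)


# Example: two missiles, each with single-shot hit probability 0.5.
p2 = hit_probability(0.5, 2)                                   # 0.75
d1 = next_damage(0.1, 0.5, 2, 0.4, defending=True, tau=0.5)    # about 0.25
```

The independence assumption is the simplest choice; correlated salvos or interception attempts would need a different hit model.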

4) Payoff. Assume that the self payoff values of the missile and the platform of team member $x_i$ are $V^{m}_{x_i}$ and $V^{p}_{x_i}$. Then, at the k-th stage, the self payoff value of team member $x_i$ is

Whatever strategies $x_i$ or $y_j$ adopt, the income of $x_i$ can be expressed as

When $N^k_{\theta_{y_j},x}=0$, let

The term to the left of “+” in (38) is the income of $x_i$ from the platforms on the opposing side, where $y_j$ corresponds to the subscript r in matrix $F^k_{x_i}$, while the term to the right of “+” is the income of $x_i$ from the missiles on the opposing side, where the subscript b in matrix $F^k_{y_j}$ corresponds to $x_i$.

Meanwhile, the cost of $x_i$ is

Thus, the total payoff of $x_i$ at the k-th stage can be expressed as

For the payoff of the whole Blue team, the focus is on the operational effect, so it can be defined as

Similar definitions hold for Red team and are not repeated here.

5) Belief. The expression of belief in the problem of multiple UAVs against multiple targets is very similar to that in the preceding DMTAGII model, so it is not repeated. The specific numerical distributions are given in the next section.

6) Constraint. A realistic setting has many constraints, but here we discuss only some simple ones. First, there are some explicit constraints for team member $x_i$:

Because the missile and the platform of member $x_i$ coexist, the platform can be considered to have lost combat capability if its damage degree is greater than or equal to a threshold $\sigma_{x_i}$ at the k-th stage. In this case, whether or not $x_i$'s platform has remaining missiles, it must exit the game, and its strategy at the next stage is changed to $F^{k+1}_{x_i}=0$.

In addition, because of the importance of the URAV, a mandatory constraint $D^{p,k}_{x_4}\le\sigma_{x_4}$ must be added for Blue team: at every stage of the game, Blue team members must ensure the safety of the URAV. Similarly, Red team has the mandatory constraint $D^{p,k}_{y_4}\le\sigma_{y_4}$. The mandatory constraints appear in the objective functions in the form of penalty functions.
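The exit rule and the penalty treatment of the mandatory constraint can be sketched as follows. The threshold test and the zero strategy matrix follow the text; the quadratic exterior penalty and its coefficient are assumptions (a common choice in penalty methods), since the paper's penalty function is not given here, and the function names are hypothetical. Because payoffs are maximized, the penalty is subtracted.

```python
from typing import List


def must_exit(damage: float, sigma: float) -> bool:
    """A member whose platform damage reaches the threshold sigma leaves the game."""
    return damage >= sigma


def next_strategy_matrix(F: List[List[int]], damage: float, sigma: float) -> List[List[int]]:
    """After exit, the member's strategy matrix becomes all zeros (F^{k+1} = 0)."""
    if must_exit(damage, sigma):
        return [[0] * len(row) for row in F]
    return F


def penalized_objective(objective: float, damage_key: float, sigma_key: float,
                        coef: float = 1e3) -> float:
    """Exterior penalty for the mandatory constraint D_p <= sigma on the key member.

    Feasible points are unchanged; violations are penalized quadratically (assumed form).
    """
    violation = max(0.0, damage_key - sigma_key)
    return objective - coef * violation ** 2


zeroed = next_strategy_matrix([[0, 1], [0, 0]], damage=0.9, sigma=0.8)  # [[0, 0], [0, 0]]
```

A large `coef` makes violating the URAV (or OCV) constraint unattractive to the optimizer without turning the problem into a hard-constrained one.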

7) Weight. Generally, the weight of each team member changes with the antagonistic situation. For simplicity, we assume that the weight of each member is reflected by its original self payoff value,
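One simple way to turn this assumption into weights that lie in (0, 1) and sum to 1 is to normalize the members' original self payoff values. This proportional rule, the function name and the sample values are assumptions for illustration; with a single member the resulting weight would be exactly 1.

```python
from typing import Dict


def weights_from_values(values: Dict[str, float]) -> Dict[str, float]:
    """Assumed rule: rho_i = V_i / sum_j V_j, using each member's original self payoff value."""
    if not values or any(v <= 0 for v in values.values()):
        raise ValueError("self payoff values must be positive")
    total = sum(values.values())
    return {name: v / total for name, v in values.items()}


# Hypothetical original self payoff values for x1..x4 (the URAV x4 valued highest).
rho = weights_from_values({"x1": 2.0, "x2": 2.0, "x3": 2.0, "x4": 4.0})
# {'x1': 0.2, 'x2': 0.2, 'x3': 0.2, 'x4': 0.4}
```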

8) Objective function. To sum up, over the control variables of the game model, the solution of the game model reduces to making the following optimization models hold simultaneously from the k-th stage to the terminal node K of the game.

Problem 1.

Problem 2.

9) Algorithm. For each stage, the above optimization problems are nonlinear integer programming problems, which can be solved by a genetic algorithm. The most critical part of applying a genetic algorithm is the chromosome coding. Because matrix $F^k_{x_i}$ has exactly one element equal to 1 and all others equal to 0, we can set up a one-to-one mapping π from the subscripts (l, r) of the unique nonzero element to a natural number: $\pi(l,r)=7(l-1)+r$, where l is counted within the rows belonging to the member. This coding implicitly enforces the constraint that the elements of $F^k_{x_i}$ sum to 1 and reduces the dimension of the strategy search space. In $F^k_{x_i}$, members $x_1$, $x_2$ and $x_3$ correspond to the first four rows, while member $x_4$ corresponds to the last two rows. Thus, the strategies of $x_1$, $x_2$ and $x_3$ are mapped to the natural numbers 1 to 28, while those of $x_4$ are mapped to 1 to 14. The genetic algorithm then becomes easy to apply and can be combined with the interactive iteration algorithm.
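A minimal sketch of this kind of one-hot chromosome coding follows. It assumes that each member's matrix has 7 columns (one per element of Red's type set) and 4 rows for $x_1$ to $x_3$ or 2 rows for $x_4$, and it uses row-major indexing, gene = 7(l − 1) + r, which is one-to-one onto 1..28 (or 1..14). The function names are hypothetical; a genetic algorithm would evolve the integer genes and call `to_matrix` when evaluating fitness.

```python
from typing import List, Tuple

N_COLS = 7  # columns of F: one per Red type theta_{y,r}, r = 1..7 (assumed)


def encode(l: int, r: int, n_rows: int) -> int:
    """Map the 1-based position (l, r) of the single 1 in F to a gene in 1..n_rows*N_COLS."""
    if not (1 <= l <= n_rows and 1 <= r <= N_COLS):
        raise ValueError("position outside the strategy matrix")
    return N_COLS * (l - 1) + r


def decode(gene: int, n_rows: int) -> Tuple[int, int]:
    """Inverse of encode: recover (l, r) from a gene value."""
    if not 1 <= gene <= n_rows * N_COLS:
        raise ValueError("gene outside the search space")
    l0, r0 = divmod(gene - 1, N_COLS)
    return l0 + 1, r0 + 1


def to_matrix(gene: int, n_rows: int) -> List[List[int]]:
    """Rebuild the one-hot strategy matrix F from a gene."""
    l, r = decode(gene, n_rows)
    F = [[0] * N_COLS for _ in range(n_rows)]
    F[l - 1][r - 1] = 1
    return F


genes_ucav = sorted(encode(l, r, 4) for l in range(1, 5) for r in range(1, 8))  # 1..28
```

Because every integer gene decodes to a valid one-hot matrix, standard integer crossover and mutation never produce infeasible strategies.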

IV.SIMULATION AND ANALYSIS

A.Setting of Parameters

We set up the initial military force distribution of the Blue and Red teams as shown in Table II.

TABLE II INITIAL MILITARY FORCE DISTRIBUTION

For the Blue team, we set the prior probability distribution of member $x_i$ ($i = 1, 2, 3$) choosing missile or platform to {0.6, 0.4}. As member $x_4$ has only one type (platform), its prior probability distribution is {1}. Similarly, for the Red team, we set the prior probability distributions to {0.7, 0.3} for $y_1$ to $y_3$ and {1} for $y_4$.

The conditional probabilities of team members choosing actions given their types are listed in Table III.

TABLE III CONDITIONAL PROBABILITIES OF CHOOSING ACTIONS
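In a PBNE, beliefs about a member's type are updated by Bayes' rule wherever the observed action has positive probability. As a hedged illustration (for example, the Red team's belief about the type of a Blue member $x_i$), the Python sketch below combines the Blue prior {0.6, 0.4} with conditional probabilities of two abstract observed actions `a1` and `a2`. These likelihoods are placeholders, not the entries of Table III, and the function name `posterior` is illustrative.

```python
from fractions import Fraction


def posterior(prior: dict, likelihood: dict, action: str) -> dict:
    """Bayes' rule: P(type | action) = P(type) P(action | type) / sum over types."""
    joint = {t: prior[t] * likelihood[t][action] for t in prior}
    total = sum(joint.values())
    if total == 0:
        # PBNE leaves off-equilibrium-path beliefs unrestricted by Bayes' rule.
        raise ValueError("action has zero probability under the current belief")
    return {t: p / total for t, p in joint.items()}


prior_x = {"missile": Fraction(6, 10), "platform": Fraction(4, 10)}
# Placeholder likelihoods of observed action labels (NOT the values of Table III).
likelihood_x = {
    "missile": {"a1": Fraction(8, 10), "a2": Fraction(2, 10)},
    "platform": {"a1": Fraction(3, 10), "a2": Fraction(7, 10)},
}
belief = posterior(prior_x, likelihood_x, "a1")  # missile 4/5, platform 1/5
```

Exact fractions are used so that the updated beliefs sum to exactly 1; with floating-point values the same function works but the sum is only approximately 1.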

In addition, we assume that Nature chooses the Blue team as the first actor and specifies the types of the Blue team members as $x_1$: Missile1, $x_2$: Platform1, $x_3$: Missile1 and $x_4$: Platform2. Following the proposed algorithm, we use MATLAB to solve the constrained optimization Problems 1 and 2.

B.Results and Analysis

The simulation from $k_0 = 1$ to $K = 13$ yields a set of optimal strategy pairs, shown in Table IV, where “MA” denotes missile attack, “MK” missile keep, “PD” platform defense, “PK” platform keep, “M” missile and “P” platform.

Note that the optimal strategy pairs obtained above by the interactive iteration algorithm are based only on the $k_0$-th information set, where the objective function value at each step reflects the expected payoff of a team member. Only the first strategy pair $(F^{1,t+1}_{x}, F^{1,t+1}_{y})$ actually determines the behavior of the Blue and Red teams. When the game actually reaches the $(k_0+1)$-th information set, the teams must recalculate the payoff based on the current status to choose a new equilibrium strategy pair. Therefore, to verify the feasibility and effectiveness of the proposed model, we set up a specific status for each stage of the game, as shown in Table IV, which indicates whether or not each team member performs its strategy successfully.
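The re-planning scheme described above resembles a receding-horizon loop: at each information set the game is re-solved from the current stage to $K$, only the first equilibrium pair is executed, and the new status is observed before the next solve. The Python sketch below illustrates that loop under stated assumptions; the callbacks `plan` and `step` are hypothetical placeholders for the GA-based interactive iteration and the battlefield status update, respectively.

```python
from typing import Any, Callable, List, Tuple

Pair = Tuple[Any, Any]  # (Blue strategy, Red strategy) for one stage


def receding_horizon(state: Any, k0: int, K: int,
                     plan: Callable[[Any, int, int], List[Pair]],
                     step: Callable[[Any, Pair], Any]) -> List[Pair]:
    """Re-solve at every information set from stage k to K; execute only the first pair."""
    executed: List[Pair] = []
    for k in range(k0, K + 1):
        pairs = plan(state, k, K)   # equilibrium path for stages k..K (hypothetical solver)
        if len(pairs) != K - k + 1:
            raise ValueError("plan must return one pair per remaining stage")
        first = pairs[0]            # only this pair is actually carried out
        executed.append(first)
        state = step(state, first)  # new status observed at the next information set
    return executed


# Toy usage: the "state" just counts executed stages.
def toy_plan(state, k, K):
    return [(f"blue@{k}", f"red@{k}")] + [("idle", "idle")] * (K - k)


def toy_step(state, pair):
    return state + 1


history = receding_horizon(0, 1, 13, toy_plan, toy_step)
```

The later pairs returned by `plan` are discarded on purpose: they are predictions made with stale information, and the next call recomputes them from the status actually reached.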

Under the optimal strategies, the payoff of each member is shown in Figs. 3 and 4.

Fig.3.The payoff of Blue members.

TABLE IV THE OPTIMAL STRATEGY PAIRS FOR Blue AND Red TEAMS

Fig.4.The payoff of Red members.

From Table IV and Figs. 3 and 4, it is clear that the Blue team maintained its superiority: the UCAVs continuously attacked the OCV and the MLPs, while the URAV defended in a timely manner. In contrast, the Red team was always in a passive position, and its operational strategies consisted mainly of defending against or intercepting incoming missiles. The main reason is that the Blue team, as the first actor, held the initiative in combat, whereas the Red team could only defend passively. Interestingly, before the 5th stage the three UCAVs launched missiles at the OCV or the MLPs every time, but from the 6th stage they began to take turns choosing the defense strategy, while the other UCAVs attacked only the OCV of the Red team to contain the MLPs; in this way the Blue team preserved its strength. By the 11th stage, the Red team had no missiles left, whereas the Blue team still had 6. At that point, the Blue team began to attack the OCV with full force, so that the damage degree of the OCV rose sharply, as can be seen in Fig. 4.

In addition, the optimal strategies exhibit strong cooperation among team members. For example, in the strategy series of the Blue team, the 3 UAVs first concentrated their fire, leaving the Red members (especially the aggressive members) struggling to cope, and then began to take turns resting, preserving their strength for the final assault. Clearly, this kind of strategy could not be carried out without cooperation. Cooperation within the Red team was equally evident. When the Blue team attacked the OCV, the MLPs did not attack the UCAVs or the URAV but chose to intercept the missiles, because the OCV was very important to the Red team. Of course, the Red members also balanced their individual objectives against the team objective. For example, at the 9th stage they turned to attack the platforms of the Blue team members, but it was too late: they could attack only twice because of their limited number of missiles. The cooperation is also reflected in Figs. 3 and 4, where the payoffs of team members with the same characteristics change very consistently, especially for the Red team members.

Furthermore, the payoff of each whole team is shown in Fig. 5. The payoff of a team is the difference between the total value of all its members and that of the members of the opposing team. Fig. 5 shows that the payoff of the Blue team increased as the game progressed, while that of the Red team decreased. Comparing this with the strategies of the Red team members, we find that once the payoff of the Blue team exceeded that of the Red team (after the 8th stage), the Red team members began to counterattack instead of defending all the time. This agrees well with actual combat scenarios.
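Read literally, the team payoff defined above is a zero-sum difference of summed member values. The small Python sketch below assumes exactly that reading (each team's payoff is its own total minus the opponent's total); the function name `team_payoffs` is hypothetical and the member values in the usage line are made up, not data from Table V or Fig. 5.

```python
def team_payoffs(blue_values, red_values):
    """Return (Blue payoff, Red payoff): each team's total member value minus the opponent's.

    Assumption: this zero-sum reading of the team payoff, so the two results always sum to 0.
    """
    blue_total = sum(blue_values)
    red_total = sum(red_values)
    return blue_total - red_total, red_total - blue_total


# Made-up remaining member values at some stage.
blue, red = team_payoffs([900, 900, 900, 1000], [800, 600, 600, 2000])
```

Under this reading, the Blue payoff exceeds the Red payoff exactly when the Blue payoff is positive, that is, when the Blue team's total member value exceeds the Red team's.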

Finally, the statistics of the antagonistic results of the Blue and Red teams are shown in Table V.

Table V shows that the dominance of the Blue team is evident. The greatest platform cost in the Blue team is only 100 (10%), whereas that in the Red team is 3 477 (nearly 70%). Moreover, the equilibrium strategy implies that, whatever strategy the Red team adopts, the Blue team can always generate a corresponding strategy that yields net earnings of no less than 4 177. The main reason is that the Blue team is the first actor; if the Red team acted first, this equilibrium would change.

In summary, the proposed DMTAGII model is effective in solving the tactical decision-making problems of multiple UAVs against multiple targets, and the optimal equilibrium strategy pair of the Blue and Red teams is consistent with realistic battlefield scenarios. In addition, if the UAVs learn the equilibrium, they can take deceptive actions to entice the opposing team into generating an inferior equilibrium strategy, thereby improving their own payoff.

Fig.5.The payoff of Blue and Red teams.

TABLE V STATISTICS OF ANTAGONISTIC RESULTS

V.CONCLUSION

In reality, antagonism between multiple teams is a very common scenario, a basic characteristic of which is that the members of a team cooperate with each other to jointly antagonize other teams; it is therefore also a game process. At present, studies on MTAGs are still at an early stage, because this complicated problem involves not only incomplete information and conflicts of interest (including both the internal and the external interests of a team), but also the selection of antagonistic targets (a multi-objective assignment problem). Therefore, building on previous research, this paper proposes a new framework, the DMTAGII model. For this model, the corresponding concept of PBNE is established and the existence of PBNE is proved. In addition, an interactive iteration algorithm based on the idea of best response is introduced to solve for this equilibrium.

At the same time, MTAGs have extensive applications, especially in multi-agent systems. Accordingly, a simulation of multiple UAVs against multiple targets is studied to verify the feasibility and effectiveness of using the DMTAGII model to solve tactical decision-making problems. In the application, the specific expressions of the strategy, status and payoff of the game are considered, and the strategy is coded to match the structure of the genetic algorithm, so that the PBNE can be solved by combining the genetic algorithm with the interactive iteration algorithm.

Finally, the simulation shows that the DMTAGII model is well suited to solving tactical decision-making problems for multiple UAVs against multiple targets. The calculated equilibrium strategies are also feasible and realistic, and can provide a reference for improving the autonomy of UAV systems. The current work is only a beginning. Considering the complexity of dynamic multi-team games, much work remains to be done in the future, such as on multi-team cooperative games and multi-team differential games.

REFERENCES

[1] Rasmussen S J, Shima T. UAV Cooperative Decision and Control: Challenges and Practical Approaches. Society for Industrial and Applied Mathematics, 2009. 15−19

[2] Bardhan R, Ghose D. Resource allocation and coalition formation for UAVs: a cooperative game approach. In: Proceedings of the 22nd International Conference on Control Applications. Hyderabad, India: IEEE, 2013. 1200−1205

[3] Semsar-Kazerooni E, Khorasani K. Multi-agent team cooperation: a game theory approach. Automatica, 2009, 45(10): 2205−2213

[4] Sandholm W H. Population Games and Evolutionary Dynamics. Massachusetts: MIT Press, 2011. 1−15

[5] von Stengel B, Koller D. Team-maxmin equilibria. Games and Economic Behavior, 1997, 21(1−2): 309−321

[6] Liu Y, Simaan M A. Noninferior Nash strategies for multi-team systems. Journal of Optimization Theory and Applications, 2004, 12(1): 29−51

[7] Ahmed E, Hegazi A S, Elettreby M F, Asker S S. On multi-team games. Physica A, 2006, 369(2): 809−816

[8] Elettreby M F, Hassan S Z. Dynamical multi-team Cournot game. Chaos, Solitons and Fractals, 2006, 27(3): 666−672

[9] Asker S S. On dynamical multi-team Cournot game in exploitation of a renewable resource. Chaos, Solitons and Fractals, 2007, 32(1): 264−268

[10] Harsanyi J C. Games with incomplete information played by Bayesian players, part III: the basic probability distribution of the game. Management Science, 1968, 14(7): 486−502

[11] Chen J, Zha W Z, Peng Z H, Zhang J. Cooperative area reconnaissance for multi-UAV in dynamic environment. In: Proceedings of the 9th Asian Control Conference. Istanbul, Turkey: IEEE, 2013. 1−6

[12] Zhao Ming, Su Xiao-Hong, Ma Pei-Jun, Zhao Ling-Ling. A unified modeling method of UAVs cooperative target assignment by complex multi-constraint conditions. Acta Automatica Sinica, 2012, 38(12): 2038−2048 (in Chinese)

[13] Hui Yi-Nan, Zhu Hua-Yong, Shen Lin-Cheng. Study on dynamic game method with incomplete information in UAV attack-defends campaign. Ordnance Industry Automation, 2009, 28(1): 4−7 (in Chinese)

[14] Chen Xia, Liu Min, Hu Yong-Xin. Study on UAV offensive defensive game strategy based on uncertain information. Acta Armamentarii, 2012, 33(12): 1510−1514 (in Chinese)

[15] Emre K, Gokhan I. Exploiting delayed and imperfect information for generating approximate UAV target interception strategy. Journal of Intelligent and Robotic Systems, 2013, 69(1−4): 313−329

[16] Bhattacharya S, Basar T. Differential game-theoretic approach to a spatial jamming problem. Advances in Dynamic Games, 2013, 12: 245−268

[17] Gintis H. Game Theory Evolving: A Problem-Centered Introduction to Modeling Strategic Interaction. New Jersey: Princeton University Press, 2008. 41−45

[18] Mei S W, Zhu J Q. Mathematical and control scientific issues of smart grid and its prospects. Acta Automatica Sinica, 2013, 39(2): 119−131

Wenzhong Zha Ph.D. candidate at the School of Automation, Beijing Institute of Technology. He received his bachelor degree from Beijing Institute of Technology in 2008. His research interests include dynamic game theory, multi-objective optimization and decision making, and incomplete information processing.

Jie Chen Professor at the School of Automation, Beijing Institute of Technology. His research interests include multi-objective optimization and decision of complex systems, constrained nonlinear control, and optimization methods. Corresponding author of this paper.

Zhihong Peng Professor at the School of Automation, Beijing Institute of Technology. She received her Ph.D. degree from Central South University in 2000. From December 2000 to February 2003, she was a postdoctoral research associate at Beijing Institute of Technology. Her current research interests include intelligent information processing, multi-agent cooperation, optimization and decision.

Manuscript received October 10, 2013; accepted May 15, 2014. This work was supported by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (NSFC) (61321002), the National Science Fund for Distinguished Young Scholars (60925011), the Projects of Major International (Regional) Joint Research Program NSFC (61120106010), the Beijing Education Committee Cooperation Building Foundation Project, the Program for Changjiang Scholars and Innovative Research Team in University (IRT1208), the Chang Jiang Scholars Program and the National Natural Science Foundation of China (61203078). Recommended by Associate Editor Changyin Sun.

Citation: Wenzhong Zha, Jie Chen, Zhihong Peng. Dynamic multi-team antagonistic games model with incomplete information and its application to multi-UAV. IEEE/CAA Journal of Automatica Sinica, 2015, 2(1): 74−84

Wenzhong Zha, Jie Chen, and Zhihong Peng are with the School of Automation, Beijing Institute of Technology, Beijing 100081, China and the State Key Laboratory of Intelligent Control and Decision of Complex Systems, Beijing 100081, China (e-mail: wenzhong@bit.edu.cn; chenjie@bit.edu.cn; peng@bit.edu.cn).
