Tao Fang,Dan Wu,*,Meng Wang,Jiaxin Chen
1 College of Communications Engineering,Army Engineering University of PLA,Nanjing 210007,China
2 College of Electronic and Information Engineering,Nanjing University of Aeronautics and Astronautics,Nanjing 211106,China
Abstract:UAV-assisted D2D networks can provide auxiliary communication for areas with poor communication facilities by using the characteristics of easy deployment of unmanned aerial vehicle (UAV),then it becomes a promising technology.However,the coexistence of UAV and D2D aggravates the conflict of spectrum resources.In addition,when the UAV performs the communication service,it will inevitably cause the location change,which will make the original channel allocation no longer applicable.Inspired by the influence of frequent channel switching on channel allocation,we define the communication utility as a tradeoff between the throughput and channel switching cost.In the considered model,we investigate the multi-stage hierarchical spectrum access problem with maximizing aggregate communication utilities in UAV-assisted D2D networks.In particular,due to the hierarchical feature of the considered network,we adopt Stackelberg game to formulate this spectrum access problem where both the throughput and channel switching cost are considered.We prove that the proposed game has a stable Stackelberg equilibrium (SE),and the heterogeneous network based channel allocation(HN-CA)algorithm is proposed to achieve the desired solution.Simulation results verify the validity of the proposed game and show the effectiveness of the HN-CA algorithm.
Keywords:UAV;D2D;multi-stage heterogeneous spectrum access;stackelberg game
D2D technology is a promising approach to decrease delay and improve transmission rate [1–4].However,some practical factors usually limit data sharing among adjacent users through D2D communications:i) the signal fading is strongly affected by some special obstacles,e.g.,urban buildings.ii)The documents of interest among neighbors are different.Thus,there may exist some users who cannot form D2D pairs,which are called potential users for brevity.In this case,UAV can be applied to guarantee the necessary communication of potential users,especially in some disaster areas and poor suburbs without an available base station(BS)[5,6].
In such heterogeneous network,the challenges cannot be ignored.On the one hand,interference environment becomes more complex due to the limited available channel resources[7].When multiple D2D pairs in the lower layer choose the same channel to transmit,the co-layer interference happens[8].Moreover,when UAV in the upper layer occupies the same channel with a D2D pair in the lower layer,then cross-layer interference occurs.On the other hand,task-driven essence of UAV causes high mobility and dynamic topology[9–11],which breaks the applicability of the original channel allocation scheme.Due to the limited number of antennas,UAV can only serve one user at one time [5].And the task of UAV is usually set up to serve multiple users according to an optimized trajectory.Therefore,after serving a user,the UAV will change its current position to enter the next service stage.That is,UAV serves users one by one and the whole service process is a multi-stage process.Meanwhile,it is noted that the current channel allocation may not be suitable for the next moment because of the dynamic topology.Accordingly,frequent channel resource reset will lead to resource waste and delay,which is often ignored in most studies.Thus,to mitigate interference and avoid resource waste,it is desirable to design an efficient multi-stage channel allocation scheme,which involves not only the competition among multiple D2D pairs,but also the interaction among multiple layers.Specifically,when the number of D2D pairs increases,the strategy space grows exponentially.Facing high complexity and great calculations,how to design an efficient multi-stage channel allocation scheme is a challenging but meaningful research topic.
In recent years,UAV has attracted wide attention and achieved great success in many fields and applications,e.g.,weather forecast [12–14],epidemic prevention[15]and road traffic monitoring[16–18].Specifically,in the aspect of wireless communications,UAV mainly plays the role of air base station[19].The applications are mainly divided into two categories:i)reducing the traffic pressure of cellular network [20],e.g.,ensuring the communication needs of hot spots such as concerts.ii)providing necessary communication services,i.e.,UAV-assisted emergency networks,in disasters or rural areas when there is no available BS[21].
When UAV provides auxiliary communication for users in areas with imperfect communication facilities or network edge,UAV will share the same spectrum resources with terrestrial D2D pairs[22].The coupling of co-layer and cross-layer interference makes the communication environment complex.Consequently,the hierarchical network with the coexistence of UAV and D2Ds faces serious interference problems due to limited spectrum resources.
There are a few researches involving the interference management in heterogeneous networks from different perspectives.The research ideas can be divided into two categories:power control and channel allocation.
• Power control:In [23],the authors studied the power control optimization problem in D2D communications underlaying UAV-assisted access networks to achieve the throughput maximization.In [24],the authors combined the physical interference with social relationship and investigated the social group utility problem.Meanwhile,an optimal power allocation strategy was proposed based on a deep Q-learning in[25].In the considered model,UAV learned its optimal power allocation to resist the attacks from smart devices by reinforcement learning,and finally decreased the attack level from smart attackers.In[26],the authors innovatively applied the Stackelberg game in the physical layer security with one source and multiple relays.Specifically,the source and relays were modeled as the leader and followers,respectively.The leader coordinated the relays,i.e.,adopted the power of relays,to achieve the optimization of secrecy capacity.Simulation results verified the good performance of the proposed scheme with other existing schemes.
• Channel allocation:In [22],the channel assignment problem was studied in the combined UAV and D2D-based network.However,it ignored the influence of geographical and other environmental factors on D2D pairing.In [27],although the cha- nnel selection in dynamic UAV network topology was considered,it could not accurately characterize the impact of interference on throughput and ignored the hierarchical feature.In [28],the authors considered a similar hierarchical game framework in UAV-assisted IoT communication scenario.The complicated interplay between UA- Vs and BSs was analyzed by the Stackelberg game.Finally,the problem of the access selection of UAVs and bandwidth allocation was successfully solved.
However,the authors in[28]ignored the multi-stage spectrum access problem when UAV provided service.And most existing work is to study the resource allocation in this heterogeneous network from a static point of view,which did not consider the influence of frequent topology change.When the characteristics of UAV with high dynamics are considered [23],the location change of UAV will lead to the multi-stage process in channel allocation.
Therefore,the traditional interference management method in static scenario is no longer suitable for this heterogeneous network when considering topology state caused by the location change of UAV.On the one hand,the extra overheads,e.g.,energy consumption and transmission delay[29],are caused due to channel reconfiguration.On the other hand,the throughput may be decreased because of interrupted communication links[30].Even,it will cause the unreliability of service due to the packet loss.It is necessary to study multi-stage channel allocation in heterogeneous networks.Thus,the greatest challenge is how to allocate channel resource and avoid frequent channel switching in a hierarchical multi-stage service process.
At present,there are mainly two solutions to solve the interference problem.One is to transform the optimization problem into convex problem [23,24].It can find a suboptimal solution but cause high complexity.Especially when the number of network users increases,the complexity will increase exponentially,which will be a disaster for the central controller.More importantly,when considering the mobility of UAV in the mission process,the solution process will be more difficult for the centralized framework.The other one is to introduce game theory and construct a distributed framework[22].Each player decides its action and then the computation complexity will be greatly reduced.Finally,no one wants to deviate from its current decision.Therefore,in this paper,we focus on game theory to solve the tractable problem.
In this paper,we study the channel allocation in the multi-stage service of UAV by combining the dynamic characteristics of UAV and the interference management of heterogeneous networks.The change of UAV service location will change the communication environment,so the channel allocation in the previous stage will not be applicable.In order to measure the impact of switching channel,we define the communication utility,which is the tradeoff between throughput and switching cost.The contributions are presented as follows:
• We investigate multi-stage spectrum access problem in the heterogeneous network with the coexistence of UAV and D2D,which considers the negative influence of channel switching and coupling relationship of channel selection among stages.The goal is to optimize aggregate communication utilities by optimizing channel allocation in the multiple stages when UAV supports service for potential users.
• In order to analyze the interaction between UAV and terrestrial D2D,the problem is designed as a Stackelberg game,and it is proved that there is a stable Stackelberg equilibrium (SE).Meanwhile,we model the lower layer as a follower subgame,which is proved to be an exact potential game (EPG) and has at least one Nash equilibrium(NE).
• The heterogeneous network based channel allocation(HN-CA)algorithm is proposed to achieve the SE point where no players want to change their decision-making.Meanwhile,simulation results show the effectiveness of the formulated game and the proposed algorithm.
The rest of this paper is organized as follows.In Section II,we present the system model,game formulation and analysis of SE.The HN-CA algorithm is proposed in Section III.Simulation results reflecting the convergence and effectiveness of the proposed game is demonstrated in Section IV.Finally,we draw a conclusion in Section V.
As depicted in Figure1,we consider a UAV-assisted D2D network.Terrestrial users are distributed in areas with poor communication facilities,which is unable to provide normal wireless network services[22].To share the popular files such as hot movies without requiring from BS,D2D pairs can be formed among some users through similar preferences which is denoted byN={1,2,...,N}[31].However,there also exist some users which cannot form D2D pairs due to restrictions on geographical location and local files.For simplicity,letS={1,2,...,S}denote the remaining users,i.e.,potential users.In order to meet the demands of potential users,the rotary-wing UAVwis applied to provide service for potential users[32].Meanwhile,UAV equipped with a single antenna can only serve one user at a time[5].In the considered heterogeneous network,the UAV and multiple D2D pairs share the same spectrum resources.There areMavailable channels in the network,denoted byM={1,2,...,M},which satisfiesM Figure1.An example of considered heterogeneous network. Note that our considered model has good scalability.When studying the multi-UAV service scenario,potential users will be divided according to the region or task requirements,and UAV only serves users within a certain region.At this point,the problem will arise to task-based multi-region spectrum resource allocation.Meanwhile,there may exist the inter-UAV interference in a multi-UAV service scenario.Specifically,the mutual interference of UAVs will bring a great challenge base on our existing work.The new problem will become a hierarchical optimization problem with the multi-leader and multi-follower,which is difficult but also meaningful.However,it is beyond the focus in this paper.This interesting topic will be investigated in our future work. We assume that UAV can complete the service of a potential user in a time slot.Thus,the task execution are divided intoTslots orTstages(equal to the number of potential users),i.e.,T={t1,t2,...,T}.Letan,tandaw,tbe the channel selection of D2D pairnand UAVwin slott.On the one hand,when multiple D2D pairs occupy the same channel,co-layer interference will happen.On the other hand,the cross-layer interference may occur when the D2D and UAV share the same channel. Given slott,we suppose that the potential userstis served by UAVw.The throughput of userstis defined as: whereBis the system channel bandwidth,pwandpnare the power of UAVwand D2D pairn,N0is the background noise.Note thatgn,stis the channel power gain between userstand the transmitter of D2D pairnat slottwhich can be expressed aswhereα1is the path loss exponent of D2D links.δ(aw,t,an,t) is the Kronecker delta function,which is defined as: When considering UAV-to-ground link,the probabilistic path loss model is widely adopted.In such model,line-of-sight(LOS)and non-line-of-sight(NLOS) connections along with their probabilities of occurrence should be considered.Thus,the channel gain from UAVwto userst,i.e.,gw,st,can be expressed as[34]: wheredw,stis the distance between UAVwand userstat slott,α2is the path loss exponent of UAV-to-ground links,andηdenotes an additional attenuation factor when NLOS link exists.Moreover,the probability of LOS link is defined as: whereaandbare constant values depending on environment,andθindicates the elevation angle. Moreover,the throughput of D2D pairnin slottis defined as: wherepnis the transmit power of D2D pairn,andgndenotes the channel gain from its transmitter to receiver,i.e.,wherednt,nris the distance of transceivern.Note thatα1is larger thanα2due to the fact that the terrestrial environment is more complex than that in the air. In,tandIw,tindicate the co-layer interference and cross-layer interference respectively.Specifically,these two kinds of interference are defined as: whereJnis the neighbor set of D2D pairn,which can be defined as: wheredthrepresents the communication threshold.That is,when the D2D pairs occupy the same channel with its neighbor D2D pairs,the interference occurs. Specifically,given D2D pairn,when D2D pairjchooses the same channel to transmit the corresponding data,interference in the D2D networks happens,i.e.,the co-layer interference.The interference from D2D pairjcan be expressed aspjgj,nδ(an,t,aj,t),wherepnis the transmit power of D2D pairn,gj,nis the channel power gain from the transmitter of D2D pairjto the receiver of D2D pairn,δ(an,t,aj,t) is the Kronecker delta function like Eq.(2) to indicate whether the channel selection of D2D pairjis the same with D2D pairn.Due to the communication threshold [35],only when close neighbor D2D pairs occupy the same channel,the co-layer interference occurs.Thus,the aggregate co-layer interference of D2D pairnis expressed as Eq.(6). Similarly,given D2D pairn,when UAVwchooses the same channel to serve the potential users,interference from UAV happens,i.e.,the cross-layer interference.The interference from UAVwcan be expressed aspwgw,nδ(aw,t,an,t),wherepwis the transmit power of UAVw,gw,nis the channel power gain from UAVwto the receiver of D2D pairn.Considering the Kronecker delta function,i.e.,δ(aw,t,an,t),the cross-layer interference of D2D pairnis expressed as Eq.(7). Considering that UAVwserves different potential users in different locations,the channel allocation at the previous slot may be no longer applicable in the current slot.However,frequent channel switching will bring a certain burden to the system,e.g.,energy consumption and delay due to radio reconfiguration.To achieve the balance between throughput and channel switching cost,we apply communication efficiency to measure the utility.For both UAVwand D2D pairs,their communication utilities in slottcan be presented as follows: whereCis a constant value to measure the delay or energy consumption caused by channel switching.We call it channel switching cost.The cost can be used to characterize the service interruption,energy consumption or packet loss,etc.In other words,we can regard the channel switching cost as a generalized form of cost. The functionf(·) indicates whether the current channel selection is consistent with that in the previous slot,as: When UAV and D2D pairs choose a channel to transmit data in the first stage,their communication utilities areRstandRn,twithout channel switching,respectively.In the second and subsequent stages,their communication utilities decrease because of frequent channel change,i.e.,Rst −C · f(aw,t,aw,t−1),t ≥2 andRn,t −C ·f(an,t,an,t−1),t ≥2.For example,given the second staget=2,if D2D pairnchooses a different channel compared with the first stage,thus,D2D pairnwill experience a channel switching costC ·f(an,t,an,t−1)due to channel switching. We can find that the communication utility is no longer determined by a certain time slot,but is coupled with multiple time slots.In addition,the communication utility is related to the decision of other participants.It makes the combinatorial problem more difficult to solve.Taking these challenges into consideration,we combine game theory with the optimization problem. Stackelberg game is a kind of hierarchical noncooperative game where the leaders in the game have privileges over followers and move first.It is an excellent mathematical tool to characterize the mutual relationship,e.g.,the competition and cooperation,between the upper and lower levels.The upper and lower users will reach an equilibrium state in the constant interaction,that is,the stable solution of the optimization problem is obtained. In the two-layer communication network,both the potential users and D2D pairs need to choose the appropriate channels to avoid interference.The decisions of upper and lower level users will affect each other.Thus,Stackelberg game has an intuitive advantage in analyzing this problem.Meanwhile,UAVs usually store some information of users[24],e.g.,device model,location and content request,to provide users with services on demand.According to the fact,we assume that UAV has a higher priority. In this paper,UAV and D2D pairs are modeled as the leader and followers.Formally,the channel access hierarchical game can be denoted as follows: wherewandNrepresent UAV(leader)and D2D pairs(followers) which are the players in the game,AwandAnare the strategy space of UAV and D2Ds,respectively.The strategy set of D2DsAnis equal toAw={c1,...,cM} ⊗...⊗{c1,...,cM},where“⊗” represents the Cartesian product,andcmdenotes them-th channel from the available channel set.Specifically,aw=(aw,t1,aw,t2,...,aw,tS)and an=(an,t1,an,t2,...,an,tS)denote the action of UAV and D2D pairs in the game model,i.e.,the joint channel selection in all throughout the task execution stages.UwandUndenote the utility function of UAV and D2D pairs,respectively. From the perspective of UAV,it aims to maximize potential users’ throughput while avoiding frequent channel switching throughout the task execution process.When denoting the channel selection for UAVwin different time slots as aw=(aw,t1,aw,t2,...,aw,tS),the utility function of UAV is expressed as where a−wdenotes the channel selection profile of other players except UAVw,that is,all D2D pairs.Rstdenotes the throughput of potential userstin slottandC·f(aw,t,aw,t−1)represents the channel switching cost from the previous slott−1 to current slott.In other words,Eq.(13)is the aggregate communication utilities of UAV for all slots,i.e.,the accumulation of Eq.(9)for all slots. From the D2D pairs’ standpoint,the utility is defined as a tradeoff between its throughput and channel switching.Thus,the communication utility of each D2D pair for all slots is expressed as: where anis the channel selection of D2D pairnand a−ndenotes the channel selections of remaining players. Inspired by[35,36],we introduce the local altruistic game to characterize this utility function.In the local altruistic game,each D2D pair should consider the utility of itself and corresponding neighbors in search of greater benefits.Therefore,the utility function of an arbitrary D2D pair is defined as: UAVwaims to choose the most appropriate channel for serving users in different slots.Thus,the multistage optimization problem of UAV can be formulated as: Similarly,the optimization problem of D2D pairs can be denoted as: SE is a stable point in Stackelberg game in which no player is willing to change its actions unilaterally.In order to prove the existence of SE,we first give the following theorem to guarantee the existence of NE in the follower sub-game. Definition 1(Nash equilibrium(NE)[37]).The joint strategy profile,i.e.,a∗=(a∗1,a∗2,...,a∗N),of all D2D pairs in the lower layer is NE if and only if no player can improve its utility by deviating unilaterally: wherea−n is the action profile of game players except n,An denotes the strategy space of game player n. Definition 2(Exact Potential Game(EPG)[37]).The game is an exact potential game,if and only if there exists a potential functionΦfor ∀n ∈N such that: where A−n denotes the joint strategy profile of game players except n.It means that the utility change of an arbitrary player is equal to the change of the potential function. Theorem 1.Given a channel selection of UAV,the follower sub-game is an EPG with at least one pure strategy NE.The optimal solution constitutes the NE of the formulated game. Proof.Inspired by[38],we design a potential function as follows: Note that Eq.(20)has the practical physical meaning,which represents the aggregate communication utility of all D2D pairs throughout the task execution process. When an arbitrary D2D pair changes its action from anto a′n,the change in the potential function is expressed as: wherecn(a′n,a−n) denotes the changed communication utility of D2D pairn. We denote the set of remaining D2D pairs exceptnand its neighbors asLn=N(n ∪Jn) for a more concise expression.Therefore,Eq.(21)can be rewritten according to three parts as follows: The first term of Eq.(22)denotes the utility change of D2D pairn.The second term represents the utility change of its neighbors.Finally,the last term indicates the utility change of other D2D pairs except D2D pairnand its neighbors. Note that when D2D pairnchanges its channel selection,it only affects its neighbors because of interference distance threshold.Others inLnwill not be influenced by the unilateral deviation of D2D pairn.Thus,we have Next,we have Finally,we can draw this conclusion:the follower sub-game is an EPG.According to the properties of potential game[37],Theorem 1 can be proved. Then,we prove our constructed game is an EPG,which associates the individual optimization with global optimization with the aggregate communication utility of all D2D pairs throughout the task execution process.Thus,each time a certain player improves its value in utility function,then the global value will also increase.Finally,the optimal solution of global optimization is obtained. In order to analyze the existence of SE accurately in the proposed hierarchical game,the definition of SE is presented as follows: Definition 3(Stackelberg equilibrium [39]).The strategy profile(a∗w,a∗N)is called SE ifa∗w maximizes the leader’s utility anda∗N=(a∗1,...,a∗N)is the best response of followers’sub-game.Mathematically,the following conditions hold: whererepresents that all followers take the bestresponse strategies except for follower n. Theorem 2.There exists a SE in the proposed hierarchical game. Proof.When given an action of the leader aw,the follower sub-game is an EPG,which is proved in Theorem 1.Then the follower sub-game has at least one pure strategy NE,i.e.,(aw,a∗N). Regard all players of the follower sub-game as an independent individual,then the whole hierarchical game is a one-to-one non-cooperative game.Note that there is a mixed strategy solution for any noncooperative game[37].Finally,every player will reach a stable point.That is,the hierarchical game has a SE. As the hierarchical game model is formulated,the existence of the stable solution can be guaranteed.We will propose the HN-CA algorithm to achieve the SE point in this section. The HN-CA algorithm consists of Q-learning and stochastic learning automata (SLA).Q-learning,the representative of reinforcement learning,is applied to obtain the solution of the leader sub-game.When UAV takes action each time,Q-learning will update its Qtable according to the utility.The Q-table indicates the benefit value of each action in the current state,and has a certain guiding principle.Thus,UAV will adjust its actions towards the best direction due to experience in past historical decision-making. Moreover,SLA is a probability selection algorithm,which is usually used for channel assignment[40].It can dynamically change the selection probability of all actions according to the utility of selecting one action.For example,when one D2D pairnimprove its utility with its actionan,then SLA will increase the selection probability of this action and decrease the selection probability of other actions by D2D pairn. Note that the leader and followers update on different time dimensions [38].To begin with,we extend the actions from pure strategy to mixed strategy.For ease of description,we ignore the change of time slots in Algorithm 1.When time slots change,the players repeatedly execute this algorithm.The channel selection probability of UAV is defined asρw(e)={ρw,1(e),ρw,2(e),...,ρw,M(e)},whereρw,i(e) denotes the probability that UAV ? Initialization: Sete=0,k=0,and initialize the actions of UAV and D2D pairs,i.e.,ρw(0)andρn(0). Repeat Iterations: Step 1:In epoche,UAV chooses its channel according to its selection probabilityρw(e). Step 2:The decision-making process for D2D pairs is as follows: 1)In thek-th mini epoch,each D2D pair selects its channelan(k)according toρn(k). 2) Each D2D pair senses its utilityun(k) where related information of other players can be obtained through a common control channel. 3)D2D pairs update their channel selection probabilities by: whereβdenotes the learning step,umaxis utility of D2D pairnwithout interference. 4) If the state of D2D pairs is stable,go to Step 3.Otherwise,go back to Step 2. Step 3:The UAV decision-making process is as follows: 1)UAV senses its utilityuw(e). 2)UAV updatesQvalue: whereαdenotes the learning rate. 3) Then UAV updates its channel selection probability due to Boltzmann distribution: whereτ0is a temperature factor that balances exploration and exploitation. End Jump out of the loop if the utilities of both UAV and D2D pairs remain unchanged or maximum number of iterations is achieved. Figure2.Cross-layer interaction between the leader and followers. Figure3.The corresponding schematic of the SLA algorithm in the low layer. wchoosesi-th channel at epoche.Moreover,each D2D pair chooses the channel fromρn(k)={ρn,1(k),ρn,2(k),...,ρn,M(k)}at mini epochk.More details are presented in Algorithm 1. As shown in Figure2,in our proposed algorithm,UAV as a leader takes its action first,then multiple D2D pairs as followers make their decisions given the decision of the leader.When followers in the lower layer achieve the NE point,UAV should update its action to keep the optimal utility in the current state.The same iteration repeats until a stable SE is obtained.Also,it means that the problem P1 and P2 is coupled mutually. Note that there exists competition in the lower layer with multiple D2D pairs.To achieve a NE point in the follower sub-game,SLA algorithm is applied in the low layer.The corresponding schematic is presented in Figure3.First,each D2D pair selects its channelan(k) according toρn(k) at the beginning of thekth mini epoch.Then D2D pairs use the corresponding channel to transmit data,and sense their utilitiesun(k).Finally,D2D pairs update their channel selection probabilities according to their utilities. Here we discuss the operation of the proposed algorithm in a real communication system.Specifically,the key point in Algorithm 1 is how players sense their utilities,i.e.,2) in Step 2 and 1) in Step 3.Note that the interference or throughput of each player can be sensed by itself.The interference or throughput of others including in the utility can be obtained through exchanged information.For example,users work in the 802.11 DCF-like contention mechanism can exchange their control messages and obtain related information of others by a common control channel [30].After obtaining information on both itself and others,each player can sense its utility successfully. Before analyzing the asymptotic convergence of the algorithm,we first introduce two important lemmas. Lemma 1.When the learning step β →0,SLA applied in the follower converges to a pure NE point. Proof.It has been proved that when the following formula is satisfied,the multi-user SLA algorithm converges to the NE point[38,41].The formula is given by whereρ={ρ1,ρ2,...,ρN}denotes the mixed strategy profile of all D2D pairs,ρ−nis the mixed strategy profile of the followers except D2D pairn,H(m,ρ−n)denotes the function value whenm-th element ofρnis 1 and other elements ofρnis 0. hnm(ρ)represents the average utility function when them-th channel is selected by D2D pairn,while other D2D pairs access the channel using a mixed strategy,which is expressed as whereaidenotes a channel selection of D2D pairn. That is,if we can find a functionH(m,ρ−n) that satisfies Eq.(29),then we can prove the convergence of the SLA algorithm in the follower sub-game. We designH(m,ρ−n)=E[Φ(ρ)],i.e., Combined Eq.(19)and Eq.(31),we can deduce that Eq.(29)holds.That is,we find a functionH(m,ρ−n)satisfying Eq.(29).Therefore,Lemma 1 is proved. Lemma 2.Given the NE point achieved by SLA in the follower sub-game,the Q-learning algorithm converges to the optimal solution in the leader layer. Proof.For the detailed proof process,please refer to[38]. On the basis of Lemma 1-2,we prove the convergence of our proposed algorithm. Theorem 3.The HN-CA algorithm converges to a SE of the channel access hierarchical game. Proof.In Lemma 1,we prove that the SLA algorithm in lower layer with multiple D2D pairs converges to a pure NE point.Given the joint decisions of D2D pairs,UAV can decide its optimal decision according to the Q-learning algorithm,which is proved by Lemma 2.As shown in Figure2,when UAV updates its action,the lower layer with multiple D2D pairs as game players will form a new NE state.The same iteration repeats until a stable SE is obtained by the HN-CA algorithm.We assume that the final state is not a SE when our proposed algorithm terminates.However,if the final state is not a SE,it means that there exists at least one player who wants to deviate from its current decision.Thus,the proposed algorithm will continue to run until a stable point for all players is obtained.Therefore,the conclusion contradicts previous assumptions,and Theorem 3 is proved. Similar to [42],the computational complexity of the proposed algorithm is presented asNepoch[NminiO(F1)+[O(F2)+O(F3)]],whereF1,F2andF3denote a constant corresponding to Eq.(26),Eq.(27)and Eq.(28),respectively. Meanwhile,NepochandNminirepresent the number of the epoch and mini epoch when the proposed algorithm converges.Simulation results in Section IV reflect thatNepochandNminiis small,which indicates that the proposed algorithm has relatively low complexity. Table1.Simulation parameters. Figure4.The considered heterogeneous network with N=8 D2D pairs and S=4 potential users. Figure5.The convergence behavior of D2D pairs. Figure6.The convergence behavior of UAV. In this section,the performance of HN-CA algorithm is evaluated.As shown in Figure4,D2D pairs and potential users,i.e.,N=8 andS=4,are randomly distributed in a square area of 500×500m2.The black arrow in Figure4 represents the flight trajectory of UAV.The communication threshold is set todth=200m.There areM=3 available channels.UAV provides service for potential users in turn and the flight altitude of UAV is set toH=100m.The related environment parameters area=11.95 andb=0.136,respectively.The transmit power of UAV and D2D pairs arepw=5 W andpn=0.2 W,respectively.The detailed simulation parameters are presented in Table1. First,the convergence behaviors of D2D pairs and UAV are presented in Figure5 and Figure6,respectively.As shown in Figure5,we take D2D pair 2,3 and 4 as examples.It is noted that the SLA algorithm converges in about 230 iterations.At the end of iteration,each D2D pair choose the channel with a probability of one.For UAV,the same decision process is reflected in Figure6. Figure7.The impact of channel switching cost. Figure8.The aggregate utility of D2D versus the number of the channel. Figure9.The aggregate utility of D2D versus the power of D2D. Second,we show the results in the number of channel switching and throughput of D2D pairs when varying the channel switching cost in Figure7.The valueCrepresents the importance of channel switching,which is dependent on the application.For example,the system will only consider the interference whenC=0.The trend is that both the throughput and number of channel switching decrease with the increasing of channel switching cost.We can find that when the cost is 108,the number of channel switching decreases dramatically while the throughput is still maintained at a high level.Thus,we chooseC=108in the simulation. Next,we evaluate the performance of the proposed algorithm when comparing the random selection and D2D networks without UAV in Figure8.It is shown that the proposed algorithm is near the condition without the cross-layer interference but far greater than random selection. Finally,the comparison result of aggregate utility of D2D when varying the power of D2D from 0.2 W to 0.6 W is shown in Figure9.The figure reflects that the aggregate utility of D2D increases as the power of D2D rises.Meanwhile,our proposed algorithm has a better performance than the random selection.Besides,the performance of the random selection decreases when the power of D2Ds is from 0.5 W to 0.6 W.The interesting phenomenon is because that the bigger power brings a higher co-layer interference level due to the lack of reasonable channel allocation. This paper investigated multi-stage heterogeneous spectrum access problem in UAV-assisted D2D networks.To mitigate spectrum collision and avoid frequent channel switching,the problem was formulated as a channel access hierarchical game.Then we proved that the formulated game has a stable SE.Accordingly,the HN-CA algorithm was proposed to obtain the stable point.Finally,the effectiveness of the proposed game and HN-CA algorithm was verified by simulation results. ACKNOWLEDGEMENT This work is supported by the Jiangsu Provincial Natural Science Fund for Outstanding Young Scholars(No.BK20180028),the Natural Science Foundations of China(No.61671474),the Jiangsu Provincial Natural Science Fund for Excellent Young Scholars(No.BK20170089),and in part by Postgraduate Research and Practice Innovation Program of Jiangsu Province under No.KYCX190188.2.2 Stackelberg Game Formulation
2.3 Analysis of SE
III.HETEROGENEOUS NETWORK BASED CHANNEL ALLOCATION(HNCA)ALGORITHM
3.1 The Introduction of the Proposed Algorithm
3.2 Convergence Analysis
IV.SIMULATION RESULTS AND DISCUSSION
4.1 Parameter Setting
4.2 Convergence Behaviors
4.3 Performance Evaluation
V.CONCLUSION