Xiaobo Zhang,Hai Wang,Yifan Xu,Zhibin Feng,Yunpeng Zhang
College of Communications Engineering,Army Engineering University of PLA,Nanjing 210007,China
Abstract:This paper mainly investigates the coordinated anti-jamming channel access problems in multiuser scenarios where there exists a tracking jammer who senses the spectrum and traces the channel with maximal receiving power.To cope with the challenges brought by the tracking jammer,a multi-leader onefollower anti-jamming Stackelberg(MOAS)game is formulated,which is able to model the complex interactions between users and the tracking jammer.In the proposed game,users act as leaders,chose their channel access strategies and transmit firstly.The tracking jammer acts as the follower,whose objective is to find the optimal jamming strategy at each time slot.Besides,the existence of Stackelberg equilibriums(SEs)is proved,which means users reach Nash Equilibriums(NEs)for each jamming strategy while the jammer finds its best response jamming strategy for the current network access case.An active attraction based anti-jamming channel access(3ACA)algorithm is designed to reach SEs,where jammed users keep their channel access strategies unchanged to create access chances for other users.To enhance the fairness of the system,users will adjust their strategies and relearn after certain time slots to provide access chances for those users who sacrifice themselves to attract the tracking jammer.
Keywords:coordinated anti-jamming;channel access;Stackelberg game;tracking jammer;active attraction
The rapid development of wireless communications leads to a serious shortage of spectrum resources,and brings great challenges in frequency selection and coordination for communication devices(users).In addition,with the application of software radio technology,the production cost of malicious jamming equipment is getting lower and lower,and the flexibility of these types of equipment becomes higher and higher,which makes the communication equipment face more serious threaten brought by malicious jamming attacks.Therefore,how to realize intelligent anti-jamming channel access decision-making in the complex spectrum environment is one of the key problems that remain to be solved in wireless communication networks[1–5].
Focusing on the anti-jamming decision-making problem,the combination of game theory[6–15]and distributed learning[16–22]is an effective way to formulate the complex interactions and obtain the optimal channel access strategies.Furthermore,considering the obvious sequence of the actions,Stackelberg game[23–33]is especially suitable to model and analyze the interactions between the user-side and the jammer-side.For example,authors in[23]considered the influence of incomplete information,and formulated the power control anti-jamming problem as a Bayesian Stackelberg game.In[24],the authors extended the anti-jamming power control problem from the continuous decision domain to the discrete decision domain,and proposed a hierarchical Q-learning algorithm to obtain the mixed SE of the game.Authors in[25]proposed an anti-jamming Stackelberg game that contains one user and one jammer,and investigated the influence of observation error.While authors in[26]proposed a jamming countermeasure Stackelberg game to model the jamming power control problem,where a separate jammer was deployed to ensure the communication security of the legal user.To help the legal user,the jammer acted as a leader to sense and interfere the illegal user,while the illegal user acted as a follower of the game.
While considered the multi-user anti-jamming scenario,authors in[27]formulated the multi-user anti-jamming problem as a one-leader multi-follower Bayesian Stackelberg game,where the observation error and the mobility of unmanned aerial vehicles(UAVs)were taken into account.In addition,in[28],a cooperative anti-jamming Stackelberg game was proposed,where the competitive interactions were extended to three tiers,the user,the relay and the jammer.Here,the user acted as the leader,the relay was the sub-leader,while the jammer acted as the follower of the game.
However,there are some challenges to extend exiting game-theoretic anti-jamming approaches to the multi-user coordinated anti-jamming channel access problems against the tracking jammer.(i)Traditional game-based multi-user anti-jamming assumed that information exchange was necessary to coordinate the conflicts among users,however,the information exchange was limited under the malicious jamming condition.(ii)The equilibrium solutions obtained in most existing game-based approaches were static,thus these equilibriums were unfair for network users.(iii)The characteristic of the tracking jamming was rarely considered,hence a specific confrontation mechanism is needed to fight against the tracking jammer.
To cope with the challenges mentioned above,a multi-leader one-follower anti-jamming Stackelberg game is formulated to model the complex interactions between users and the tracking jammer.In addition,an active attraction based anti-jamming channel access algorithm is designed to confront the tracking jammer.The main contributions of this paper are summarized as follows:
·A multi-leader one-follower anti-jamming Stackelberg game is formulated,where users act as leaders,chose their channel access strategies and transmit firstly.The tracking jammer acts as the follower,whose objective is to find the optimal jamming strategy at each time slot.
·The existence of the Stackelberg equilibrium is proved,which is consisted of the Nash equilibrium of the leader sub-game and the best response of the follower.
·To reach the SE of the game,an active attraction based anti-jamming channel access(3ACA)algorithm is designed.Where users being jammed keep their channel access strategies unchanged to attract the jammer,and create access chances for other users.
·To enhance the fairness of the system,users will adjust their strategies and relearn after certain time slots to provide access chances for those users who sacrifice themselves to attract the tracking jammer.
·Simulations show that the proposed approach coordinates the internal spectrum conflicts among users and avoids the external tracking jamming without information exchange.Besides,it also guarantees the fairness of the multi-user IoT networks via dynamic rotations.
Note that several existing works have studied the anti-jamming scenarios[16,27,29],the main differences between this paper and the comparative works are:In[16],a coordination-learning based dynamic spectrum access method was presented,which is suitable for the ruleless dynamic jamming attacks.However,work[16]did not adapt to the tracking jammer as the observation of jamming signals were not accurate.In addition,work[27]proposed a multiuser one-jammer Bayesian Stackelberg game as mentioned above,but this work was also not applicable for the tracking jammer scenario as the tracking jammer would attack users’transmissions after observing the activities of them.In[29],a learning-based spectrum access Stackelberg game was formulated from the perspective of the offensive side,while this paper investigates the anti-jamming scenario from the defensive side where there exists a tracking jammer,and an active attraction mechanism is designed to cope with the tracking characteristic.
The rest of this paper is organized as follows.The coordinated anti-jamming system model and problem formulation are presented in Section II.In Section III,the multi-leader one-follower anti-jamming Stackelberg game is formulated.While in Section IV,an ac-tive attraction based anti-jamming channel access algorithm is designed.In Section V,simulation results are presented.Finally,the conclusions are made in Section VI.
In this paper,a multi-user coordinated transmission scenario is considered,where there exist K users and C available channels.Each user is denoted as a transmission-receiver pair,and all users are transmitting all the time.As shown in Figure.1,there also exists a malicious tracking jammer,who will send jamming signals to one of the available channels once sensing the transmission activity of users.Besides,each user possesses the capability of switching between wide-band spectrum sensing and transmission.Denote the user set as K={1,...,K},and the available channel set as C={1,...,C},the condition that K≫C is considered as the channel resources are scarce in general.
It is assumed that the time is slotted,and one transmission process is implemented within one time slot.Denote the slot set as T={1,2,...,T},and each user k chooses one channel from channel set C to transmit,where k∈K.In addition,denote the channel access strategy of user k as sk(t),thus sk(t)∈C∪{0},where 0 represents that user k keeps silence and does not transmit at slot t.When users transmit their data,the jammer will choose one of the channel from set C,which can be denoted as sj(t).
As there exist multiple transmissions simultaneously,it will lead to serious co-channel interference among users who transmit in the same channel.Thus,it is assumed that if multiple users use the same channel to send data,none of them can transmit successfully.
Furthermore,if the jammer choose one channel to send jamming signal,then users choosing the same channel can not complete their transmission either.Here,the conflict indicator function ℓ(sk(t))is used to reflect the jamming or interference condition of channel sk(t)∈C,which is expressed as:
where s-k(t)denotes the other users’channel selection strategies except user k.Hence,ℓ(sk(t))=1 means that the transmission of user k is not influenced by other users or the jammer.On the contrary,ℓ(sk(t))=0 depicts that there exists co-channel interference or malicious jamming which cause the failure of the transmission of user k.
Assume that Pkis the transmission power of user k,Bsk(t)is the bandwidth of channel sk(t),and dkis the link distance for the transmitter-receiver pair of user n.In addition,the pass-loss factor is denoted as β,and the noise power is denoted as Nsk(t).Hence,the transmission rate of user k is expressed as:
Thus,the total network transmission rate which denotes the sum of all users’transmission rate,is shown as follows:
For users,the optimization objective is to maximize the total network transmission rate,and obtain the optimal channel access strategies to avoid both malicious jamming and co-channel interference,that is:
While for the tracking jammer,the purpose is to find its optimal jamming channel at each time slot,that is:
The above equation means that the utility of jammer is the cumulative received signal power on the jamming channel c.Thus,the optimal jamming strategy is expressed as:
where εcis the users’set that chooses channel c,dmjis the distance between transmitter m and the jammer.
Note that there is no centralized controller in the multi-user network,thus the optimization problem P is hard to be solved by centralized optimization approaches.Furthermore,the tracking characteristic of the jammer aggravates the difficulty in finding optimal channel access strategies for users.
Considering the competitive relationships between multiple users and the tracking jammer,the Stackelberg game is a feasible tool to analyze and formulate the anti-jamming problem.specifically,when there exist internal conflicts among users,the Stackelberg game can be extended to a multi-leader one-follower anti-jamming Stackelberg(MOAS)game.
Here,the proposed MOAS game is expressed as:
where K and J represent the users’set and the jammer respectively,and C is the available channel set as mentioned in Section II.Skand Sjare the strategy sets of the users and the jammer,where sk∈Skis one of the channel access strategies of user k,and sj∈Sjis one of the jamming strategies.Moreover,Gkis the utility function of user k,which represents the transmission rate of user k,while Gjis the utility function of the jammer,which is the cumulative received signal power on the current jamming channel.
Figure 2.The structure of the proposed MOAS game.
The structure of the proposed MOAS game is shown in Figure.2.Users are assumed to be leaders of the game,and each user chooses one channel to transmit at the start of each time slot.After the completion of the transmission,each user updates its channel access strategy at the end of the slot.The purpose of leaders is to reach the Nash equilibrium to enhance the total network transmission rate.On the contrary,the jammer acts as the follower of the game,and try to find the best-response jamming strategy that obtains the maximal utility.Besides,if there is no transmission activity,the jammer will keep silent to save its energy.
Before solving the coordinated anti-jamming channel access problem,the definition of the Stackelberg equilibrium(SE)is given first.Then,the existence of the SE is analyzed.Finally,a coordinated antijamming mechanism is designed to obtain the SE under the tracking jamming environment.Generally,the definition of the SE is given as follows:
According to the sequence decision-making characteristic of the proposed MOAS game,the optimization objective of the network is rewritten as:
The adjusted optimization objective is able to illustrate the operation process of the MOAS game.First,users take actions and try to find the optimal transmission strategies to maximize their utilities distributedly.Then,after observing the actions of users,the jammer chooses its best response jamming strategy according to its jamming rule.
Note that there exist different activities for users and the jammer,the proposed MOAS game can be decomposed into two sub-games,i.e.,leaders sub-game and the follower sub-game[34,35].
In the MOAS game,users act as leaders and take channel access strategies first.Besides,leaders in the MOAS game possesses first mover advantage.Generally,we denote the leaders sub-game as Gl=?K,C{Sk}k∈K,{Gk}k∈K?.In this sub-game,there exist at least one NE no matter which channel the jammer attacks.Following are the details of the theorem and proof.
Theorem 1.For a specific jamming response strategy sj,there exists at least one NE((s*1,...,s*K))in the leaders sub-game,which satisfies the following condition:
In detail,the first line of(11)means that each user can select one channel to access or keep silent(when strategy “0”is selected).The second line means that when the system reaches NE,users who are not silent choose different channels to access.The third line indicates that all channels are fully occupied,and there will be no idle channel anymore for the NE state.While the last line shows that users can avoid jamming attacks when reaching NE.In a word,the above equations mean that some of the users can occupy a channel exclusively and avoid the malicious jamming signal simultaneously.Besides,the other users who do not occupy any channel will keep silent.
According to Theorem 1,no matter which channel the jammer is attacking, there exists at least one NE for the leaders sub-game[36].In addition,the throughput of the network will maintain a high-level state as the available channel resource blocks are fully utilized.In the later section,the operation process of obtaining the NE of the leaders sub-game will be shown.
Known by the tracking characteristic of the jammer,the follower is able to find the best response jamming strategy correspondingly no matter what actions users are taking.
Theorem 2.There exists at least one SE in the proposed MOAS game.
The operation process of reaching the SE of the proposed MOAS game is shown in Figure.3.First,users in the network adopt their channel access strategies and transmit data on corresponding channels.Then,the jammer chooses its best-response jamming strategy according to the tracking rule after sensing the activities of users.After that,users adjust their strategies at the start of the next time slot until reaching NE.While for the follower,after obtaining the optimal NE of the game,it continues to adjust its best response jamming strategy in response to the optimal NE of users.After constant adjustments,the SE of the system will be reached finally.
Figure 3.The operation process of reaching the SE of the proposed MOAS game.
In this section,the active attraction based antijamming channel access(3ACA)algorithm is proposed.Note that the operation process of converging to the SE is shown in Figure.3.However,it is hard to reach the SE as users are hard to predict the jammer’s activities in advance.To ensure the first-mover advantage of leaders,an active attraction mechanism is designed,where users being jammed keep their channel access strategies unchanged to attract the jammer,and create access chances for other users.By attracting the jammer to attack one certain channel,users are able to observe the jamming activity in the previous slot,and then carry out distributed learning to coordinate the internal and external conflicts to reach the SE of the game.
The slot structure of the proposed 3ACA algorithm is shown in Figure.4.At each time slot,users firstly access the channel according to their strategies.Then,they transmit in their selected channels.After the data transmissions,each user acquires its feedback according to the transmission outcome,i.e.,success or failure.At last,each user updates its transmission strategy according to the updating mechanism proposed in the 3ACA algorithm.Once the first transmission slot is finished,users will implement the active attraction mechanism to trick the jammer.While for the jammer,it turns to sense when users are not transmitting,or jams the channel with maximal received power strength.
Figure 4.The slot structure of the active attraction based anti-jamming channel access(3ACA)algorithm.
Under the guidance of the operation slot structure,each user updates its channel access strategy using distributed learning mechanism[38,39].The detail of the proposed 3ACA algorithm is shown in Algorithm 1.
When user k is in the sensing state,it randomly selects one channel c′to sense.If the selected channel is idle,it changes the strategy to sk=c′.Otherwise,it keeps the strategy sk=0.Besides,when user n start transmitting,it needs to judge whether being jammed.If the access channel skis being jammed,then users who are located in skwill keep transmitting to attract the jammer staying on sk.This mechanism can effectively ensure the availability of other channels.Furthermore,if one user is in conflict with another one,then it sets sk=0 with probability Pband keeps sensing at the next time slot.Otherwise,it keeps the same channel access strategy with probability 1-Pbto reduce the conflict degree.Note that a re-learning mechanism is also introduced,which attempts to provide fair channel access opportunities for different users.
1Usually,it is assumed that the jammer’s transmission power is much greater than legitimate users’,thus users can judge whether being jammed by comparing the received signal energy value
To show the convergence speed of the proposed 3ACA algorithm,the complexity analysis is conducted in this subsection.In detail,the complexity is consisted of storage complexity,sensing complexity and computing complexity.First,each user needs to store the channel access strategy at each time slot,and the complexity is assumed to be O(W1),where W1is a constant.As the number of user is|K|,thus the storage complexity is O(|K|W1).Secondly,each user needs to sense the spectrum via classical energy detection,and the sensing complexity is assumed to be O(W2),where W2is a small constant.Thus,the total sensing complexity for all users is O(|K|W2).Thirdly,strategy updating is necessary at each time slot,and the updating complexity is O(W3),where W3is also a small constant according to the updating process shown in(14).Thus,the complexity at each time slot is O(|K|W1+|K|W2+|K|W3).
Denote the maximal iteration as Tmax,hence the to-tal complexity is:
Note that users are able to converge to the stable solution within few time slots,thus it is concluded that the complexity is relatively low,and the proposed 3ACA algorithm is effective.
In this section,the simulation results and discussions are illustrated.First,we give the parameter setting,as shown in Table 1.The duration of each time slot is set to be Tslot=10 ms,where the duration of sensing(Tsensing),ACK receiving(TACK)and strategy updating(Tupdate)is 10 ms.Besides,the time duration of data transmission is Ttrans=7 ms.The available channels are set to be 5,10 and 15 respectively,and the number of users varies from 10 to 100.To transmit data,the transmission power Pkis set to be 0.1 W,and the transmission distance is 1000 m for each transmitter-receiver pair.The pass-loss factor is 3,the channel bandwidth is set to be 2MHz,and the background noise is-100 dBm.The jammer possesses the capability of tracking and senses the available channels at each time slot.After that,it selects one channel with maximal received power strength to attack.When there are no transmission activities on channels,the jammer keeps silent to save energy.
Table 1.Parameters settings in simulations.
Secondly,the convergence performance is shown in this subsection.As is shown in Figure.5,the convergence time with different network scales is compared.Here,three network scales are considered,the 10 users case,20 users case and 30 users case.It is assumed that there are 5 available channels,and the strategy adjustment factor τ is set to be 40.It can be seen that the proposed algorithm converges to the maximal transmission rate rapidly(Approximately 10 slots,100 ms).Besides,with the extension of the network scale,the slots needed for converging increases,which is due to the aggravation of the network conflicts.Note that in the 41st time slot,the network throughput declines as users break the convergence state actively,and create access opportunities for those who do not transmit at previous time slots.
Figure 5.The convergence time with different network scales.
To illustrate the convergence more intuitively,the time-frequency diagram under tracking jamming attacks is shown in Figure.6.It depicts that users adjust their channel access actions according to their updating strategies,and the jammer chooses its best jamming strategy at each time slot.It can be seen that users break the convergence state and re-learn at the 41st time slot as the conflict degree increases.Here,it can be seen that user 6,7 and 9 choose channel 4,thus the jammer also chooses channel 4 to attack as there exist 3 users using the same channel.After that,user 6,7 and 9 keep their channel access strategy unchanged to actively attract the jammer,while other users update their strategies to coordinate the internal conflicts.After updating for several time slots,user 2,3,9 and 10 monopolize one channel to transmit,while user 4,5 and 8 attract the jammer constantly.Thus,it can be concluded that the network converges rapidly,and users except the bait ones are able to avoid the malicious jamming attacks as well as to coordinate the internal interference.
Figure 6.The time-frequency diagram under tracking jamming attacks.
To verify the effectiveness of the proposed 3ACA algorithm,the performance is analyzed with different number of channels,different number of users and different algorithms.Here,3 different comparative algorithms are considered.(i)Multi-channel slotted ALOHA.In this algorithm,each user randomly selects one channel to access[40].(ii)Advanced slotted ALOHA.In this algorithm,each user randomly selects one channel,and accesses the channel with probability paccess,or keeps silent with probability 1-paccess.(iii)Learning algorithm based on jamming utilization.In this algorithm,each user utilizes the jamming signal sensed in previous slots as the coordination signal,and adopts distributed learning mechanism to update the channel access strategies[16].
Figure.7 depicts the throughput comparison with different algorithms.To reduce the random error,the cumulative network throughput is counted every 50 time slots,and we take the mean value of 100 times Monte-Carlo experiments.As can be seen from Figure.7,the proposed 3ACA algorithm outperforms other comparative algorithms.The reasons are shown as follows:The learning algorithm proposed in[16]is not able to adapt to the characteristic of tracking jammer as the observation of the jamming signal is not fully accurate.Besides,in the advanced slotted ALOHA,different values of paccessare selected(From 0.1 to 0.9),and it is concluded that the advanced slotted ALOHA is able to achieve the highest throughput with paccess=0.6.While in typical slotted ALOHA,users are not able to avoid the jamming and coordinate the interference effectively,thus it achieves the worst performance.To sum up,Figure.7 shows that the proposed algorithm can effectively enhance the cumulative network throughput.Via combining the MOAS game and the 3ACA algorithm,the influence of the malicious jamming and co-channel interference can be significantly reduced.
Figure 7.The comparison of the cumulative network throughput with different algorithms.
Figure.8 shows the cumulative network throughput with different number of users and channels.Here,3 network scales are considered,the 10 users case,50 users case and 100 users case.Besides,3 different cases of channels are compared,the 5 channels case,10 channels case and 15 channels case.Thus,9 cases which are the combination of different number of users and channels are shown in Figure.8.It can be seen that the cumulative network throughput decreases with the increase of the number of users,as the internal conflicts have been aggravated when there are more users contending for these channel resources.In addition,with the increase of the channel numbers,there exists a rising tendency of the cumulative network throughput as there are more available timefrequency resource blocks for users to access.Furthermore,it depicts that when the number of users is less than the number of channels,the cumulative network is not high as the channel resources are not sufficiently utilized.
Figure 8.The comparison of the cumulative network throughput with different number of users and channels.
In this subsection,the fairness of the proposed 3ACA algorithm is analyzed.As mentioned above,the strategy adjustment factor τ is introduced,and users will break the convergence state after τ times iterations.Then,re-learning is carried out to deliver more chances for those users who do not transmit in previous time slots.
Figure 9.The comparison of the reachable performance ratio with different strategy adjustment factor τ.
Figure 10.The fairness comparison with different strategy adjustment factor τ.
Figure.9 shows the comparison of the reachable performance ratio with different strategy adjustment factor τ.Here,the throughput of 1000 slots is counted,5 different cases of τ are compared,that are 20,40,60,80,1000(1000 means no active adjustments are made by users).Besides,two different cases of users are considered,that are 20 users case and 50 users case.It can be seen that with the increase of the strategy adjustment factor,the reachable performance ratio pacincreases as the network maintains a relatively steady state,and fewer re-learning slots are taken for the strategy adjustment.In addition,the reachable performance ratio is great for the 20 users case than that of the 50 users case,as the increase of the network scale will increase the time taken to reach convergence,and cause serious performance loss.
Secondly,to calculate the fairness of the network with different strategy adjustment factor,the Jain Fairness Index(JFI)is introduced[41],which is expressed as follows:
where E[N]denotes the expectation of the successful transmission times for all users,and N is the random variable of the number of successful transmission slots of the network.Thus,the above equation can be rewritten as:
where ukis the statistics of the successful transmissions of user k.Generally speaking,the network fairness is higher with a greater JFI value.Besides,JFI(N)=1 means an absolutely fair network.
The fairness comparison with different strategy adjustment factor τ is shown in Figure.10.The simulation parameters are the same with Figure.9.It can be seen that with the increase of the strategy adjustment factor,the JFI of the network decreases,as there are fewer access opportunities for users once converging.In other words,users are less inclined to break the convergence.Note that when τ=20,the JFI is 0.925 for the 20 users scenario,which means users obtain relatively fair access opportunities.In addition,with the increase of the number of users,the JFI will decrease when keeping τ unchanged.The reason is that there are fewer opportunities for users to break the convergence when the conflicts are more serious.
Comparing Figure.9 and Figure.10,it is interesting that with the increase of the strategy adjustment factor,the fairness of the network decreases,while the network performance increases.Thus,finding a suitable τ that balance the fairness and the performance is meaningful.For the 20 users scenario,users are able to obtain relative high fairness and performance when τ=60(JFI is 0.861,and the reachable performance ratio is 95.94%).While for the 50 users case,τ=40is an appropriate adjustment cycle.Note that some preliminary discussions are made from the perspective of fairness and performance,appropriate tradeoff index design is meaningful for future investigation.
In this paper,we investigated the coordinated antijamming channel access problems against the tracking jammer.A multi-leader one-follower anti-jamming Stackelberg(MOAS)game was formulated to model the complex interactions between users and the tracking jammer.Besides,the existence of Stackelberg equilibriums(SEs)was proved,which means users are able to reach Nash Equilibriums(NEs)for each jamming strategy while the jammer can find its best response jamming strategy for the current network access case.An active attraction based antijamming channel access(3ACA)algorithm was designed,where jammed users kept their channel access strategies unchanged to create access chances for other users.In the end,simulation results depicted that the proposed approach was able to guarantee both fairness and performance when confronting the tracking jammer.