Delay-Optimal Random Access in Large-Scale Energy Harvesting IoT Networks Based on Mean Field Game

2022-04-20 05:57DezhiWangWeiWangZhaoyangZhangAipingHuang
China Communications 2022年4期

Dezhi Wang,Wei Wang,Zhaoyang Zhang,Aiping Huang

College of Information Science and Electronic Engineering,Zhejiang University,Hangzhou 310027,China,and Zhejiang Provincial Key Laboratory of Information Processing Communication and Networking,Zhejiang University,Hangzhou 310027,China

Abstract:With energy harvesting capability,the Internet of things(IoT)devices transmit data depending on their available energy,which leads to a more complicated coupling and brings new technical challenges to delay optimization.In this paper,we study the delay-optimal random access(RA)in large-scale energy harvesting IoT networks.We model a two-dimensional Markov decision process(MDP)to address the coupling between the data and energy queues,and adopt the mean feld game(MFG)theory to reveal the coupling among the devices by utilizing the large-scale property.Specifcally,to obtain the optimal access strategy for each device,we derive the Hamilton-Jacobi-Bellman(HJB)equation which requires the statistical information of other devices.Moreover,to model the evolution of the states distribution in the system,we derive the Fokker-Planck-Kolmogorov(FPK)equation based on the access strategy of devices.By solving the two coupled equations,we obtain the delay-optimal random access solution in an iterative manner with Lax-Friedrichs method.Finally,the simulation results show that the proposed scheme achieves signifcant performance gain compared with the conventional schemes.

Keywords:wireless communications;energy harvesting;random access;mean feld game

I.INTRODUCTION

The energy harvesting technology enables the Internet of things(IoT)devices to collect energy from natural or man-made phenomena[1].The energy harvesting capability brings the self-sustainability property and provides nearly permanent node lifetime,which signifcantly enlarges the IoT application scenarios.For IoT using 5G communication system,referred to as massive machine-type communications(mMTC),a large number of devices may cause severe conficts during the random access(RA)process[2].Thus,the overload control scheme becomes a critical issue for improving the RA performance[3,4].

For overload control,it is widely adopted in many schemes to control the RA attempt of devices for reducing the confict.In[5],the access class barring(ACB)method was proposed to attempt to access with a given probability distributively.In many real-life applications,the data transmission is usually delay sensitive since the out-of-date information is meaningless.The above RA schemes can handle the large number of devices.However,it may cause long RA delay in large-scale networks,because the urgency of the data and the available energy of the devices are not taken into consideration.

In the previous research[6,7],some preliminary results are provided on the random access considering the infuence of data and energy.Compared with[6],the situation of multi-channel is considered in this paper,which brings the diffculty to calculate the infuence from others.Different from[7],the Lax-Friedrichs method is adopted to solve the delayoptimal large-scale RA problem.Furthermore,the existence and the uniqueness are proved for the equilibrium solution.

There are two technical challenges involved in the delay-optimal RA problem as follows:

·Challenges due to the coupling induced by RA conflict:The RA confict occurs when multiple devices try to access to the same preamble simultaneously.Due to the RA confict,the large number of devices are coupled together.It is nontrivial to design a distributive RA scheme considering not only the local state of device but also the infuence from other devices,especially in large-scale networks.

·Challenges due to coupling between the energy and data queues:The data and energy queues are coupled due to the energy consumption for data transmission.Considering the practical hardware limitations,i.e.,the limited battery storage and the data buffer space,the excessive harvested energy is wasted and the extravagant data is dropped,which lead to a more complicated dependency between the data and energy queues in the stochastic optimization problem.

To overcome the above technical challenges,we model a two-dimensional Markov decision process(MDP)to consider both the energy queue state information(EQSI)and the data queue state information(DQSI),and to address the issue of the limited battery storage and the data buffer space.The main contributions of this paper are as follows:

·We adopt the mean feld game(MFG)theory to exploit the large-scale property,where the infuence from other devices are described by the statistical information.Specifcally,we derive the Hamilton-Jacobi-Bellman(HJB)equation for solving the MDP at each device and the Fokker-Planck-Kolmogorov(FPK)equation for modelling the evolution of the mean feld,i.e.,the statistical information.

·We propose a RA scheme by solving the two coupled partial differential equations iteratively using the Lax-Friedrichs and Largrange multiplier schemes.The computational complexity of the proposed scheme isO(n),wherenis the space of state,which is acceptable in real application.Finally,the existence and the uniqueness of Nash equilibrium from the perspective of MFG theory is analyzed in the paper.

The rest of this paper is organized as follows.Section II discusses the related works.Section III presents the system model.Section IV formulates the delayoptimal RA stochastic optimization problem.We discuss the optimality conditions and solve the RA problem in Sections V and VI respectively.Section VII provides the simulation results.Finally,Section VIII concludes this paper.

II.RELATED WORKS

This paper studies the delay-optimal for RA problem in large-scale energy harvesting networks.In this section,we briefy review the existing works on the RA overload control and the delay optimization approaches respectively.

2.1 RA Overload Control

There have been a number of existing works which focus on handling the RA overload.The overload control algorithms can be divided into two categories: the physical layer and the MAC layer.In the physical layer,some physical-layer overload control techniques are provided in[8]for improving the RA capacity,including the compressed sensing based on multi-user detection,sparse code multiple access and continuous phase modulation.In the MAC layer,the access class barring(ACB)is a simple and effective way to solve the overload of RACH[9].Based on the classic ACB algorithms,a cooperative ACB mechanism was proposed for jointly optimizing the ACB parameters of multiple base stations in[10].In[11],the barring factor and the mean barring time are dynamically tuned using the dueling deep Q-network.A novel spatial group based ACB scheme is proposed in[12]for random access with timing assisted IoT device identifcation and preamble reuse.In[13],a novel recursive access class barring(R-ACB)technique was proposed to optimally utilize the available resources during the random access procedure in cellular IoT networks.

The above existing works focus to improve the performance in the process of RA.However,the infuence from other devices on the RA process is not carefully considered,especially in large-scale networks.

2.2 RA Delay Optimization

For minimizing the delay of RA,a few schemes are designed for improving the probability of successful access.In[13],the R-ACB technique was proposed to reduce the average access delay,which considers the multiple steps in cellular IoT networks.In[14],an iterative process was proposed to acquire the dynamic process of multichannel slotted ALOHA.In[15],an enhanced random access scheme that avoids random access retrials was proposed and the random access latency was analyzed.In addition,a distributed queueing algorithm into the medium access control(MAC)over the LoRa physical layer was proposed in[16],which reduces the latency effciently.The above works are applicable for the small-scale networks,however,the severe overload will happen in large-scale networks.

III.SYSTEM MODEL

Consider a system withNdevices which send the data to a wireless access point(AP)overMcpreambles,and each device has fnite energy and data storage space.The system model is shown in fgure 1.

3.1 Random Access Model

The contention-based RA process contains a fourhandshake between the device and the AP,including preamble transmission,random access response(RAR),connection request and contention resolution[17].In this paper,when multiple devices choose to access the AP,they will select a preamble from the available preamble set for RA.Then,the AP sends the RAR to the devices that access successfully according to the preamble detection.When two or more devices choose the same preamble,the collision happens.Otherwise,the preamble can be detected by the AP.In the third step,the device that detected successfully via the preamble will send the connection request message to the AP.In the last step,the AP will transmit a contention resolution message as a response to the contention request message.After these,the RA process is completed,and the device can transmit the data.Notice that each access trail consumes energy regardless of whether the device accesses successfully or not,and a device can transmit data only if it accesses successfully.

The success of RA for deviceidepends on not only the action of this device but also the actions of other devices since they share common resource.Defneai(t)∈[0,1]as the access probability of deviceiat time slott.At each time slot,deviceigenerates an indicatorσi ∈{0,1}randomly to decide whether to access the network according toai(t).DefneNa(t)as the expected number of devices that attempt to access to the AP at time slott,i.e.,

For the RA process withMcpreambles with the setMandNa(t)accessible devices,we can derive the expected number of successful accessNs[5],which is expressed as

3.2 Queue Dynamics for Energy and Data

IV.MULTI-DIMENSIONALDELAYOPTIMAL STOCHASTIC OPTIMIZATION PROBLEM

Defneξk ∈{0,1},k ∈Mas the indicator of access successfully or not for each preamble.As introduced in Subsection 3.1,the confict happens when two and more devices choose a same preamble,which restricts the access of devices.Thus,the optimization problem can be summarized as follows

where the frst constraint is the causality of energy,and the second constraint indicates at most one device chooses a preamble to access successfully at each time slot.

V.OPTIMALITY CONDITIONS FOR RANDOM ACCESS

In this section,we transform the problem to a continuous time system and exploits calculus techniques to obtain the optimal access strategy.

5.1 Discrete to Continuous Time System

5.2 Optimality Analysis

Similar to[20],the discrete form transforms to the continuous form with an approximation erroro(τ).Thus,the solution of the optimization problem can be obtained via solving(25).

Since the devices are individually incentive driven and non-cooperative,each device should make access decision to minimize its cost.When a device makes the access decision,it should estimate the effect from others and try to fnd decentralized strategies to optimize the individual object.The result of the game will lead to a Nash equilibrium,and the Nash equilibrium can be obtained by solving the HJB equation.Now we will show the existence of the Nash equilibrium in the following theorem.

Theorem 2(Existence of Nash equilibrium).There exists at least a Nash equilibrium for the above optimization problem.

Proof.Please refer to Appendix.

Note that the estimation of the effect from other devices should be considered to solve the HJB equation.In the following section,we adopt the mean feld game theory to solve the estimation problem.

VI.DELAY-OPTIMAL CONTROL BASED ON MEAN FIELD GAME

In this section,the MFG theory is adopted to estimate the infuence from other devices,and the iterative scheme is proposed to solve the mean feld equilibrium.

6.1 Mean Field Game

Since the cost function of one device is infuenced by the actions of the remaining devices.When the total number of devices increases,we should consider the dynamics of the state of the system instead of that of each device.Since the devices are homogeneous,there is no difference between the HJB equations,which indicates that the strategies of all the devices are the same.

The optimal strategies for the devices in the system can be determined by the coupled FPK and HJB equations[21].Before deriving the FPK equation,the mean feld is defned frstly,which describes the statistical distribution of the system states.

Definition 1(Mean Field).Given the state s(t)=[e(t),q(t)],the mean field m(t,s)is defined as

The partial differentials(24)and(29)are coupled together,which can lead to a Nash equilibrium.In addition,the uniqueness of the Nash equilibrium is expressed in the following theorem.

Theorem 3(Uniqueness of the Nash equilibrium).

The Nash equilibrium is unique for the above game if the following conditions are satisfied.

6.2 Iterative Scheme for Mean Field Equilibrium

According to[23],we adopt the Lax-Friedrichs method and Lagrange relaxation to solve the coupled HJB and FPK equations.In the Lax-Friedrichs method,the time interval[0,T],the energy space[0,Emax],and the data space[0,Qmax]should be discretized intoX×Y ×Zspaces.In the discrete spaces,there are several control paths from time 0 toT,and there must exists an optimal path.In order to fnd the optimal path,we denote the steps of the three dimensions as

Thus,we haveX+1 points in the time space,Y+1 points in the energy space,and theZ+ 1 points in data space.The FPK equation will be updated in these three dimensional spaces,and the optimal path will be solved by the HJB equation.The equilibrium will be achieved via the iterations of the two coupled equations.

6.2.1 Solution to the FPK equation

The FPK equation can be solved by the Lax-Friedrichs method,which can guarantee the positivity of the mean feld[24].By applying the Lax-Friedrichs method to(29),we have

whereMt,j,kandat,j,krepresent the value of the mean feld and the action at time slott,the energy queue lengthjand the data queue lengthkin the discrete space,respectively,andhtis the channel state at time slott.

6.2.2 Solution to the HJB Equation

There is no existing approach suitable for solving the HJB equation directly[25].However,since the optimal strategy can be obtained by solving the HJB equation,we can reformulate the problem and derive the optimal strategy,which is equivalent to the solution of the HJB equation.The subscript ofqiis omitted due to the homogeneous properties of devices.The reformulated problem is expressed as

Sinceq(T)is zero whenTreaches to infnity according to Theorem 1.According to(31),the Lagrange equation is expressed as

whereλ(s,t)is the corresponding Lagrange multiplier.The discrete form of the Lagrange equation can be written as

where the functions Υ,Φ,Ψ are

Algorithm 1.Delay-optimal random access algorithm.1: Initialization:M0,:,:,λX+1,:,:,a0,:,:,iteration=1 2: repeat 3:for t = 1 : X +1,j = 1 : Y +1,and k = 1 :Z+1 4:Update Mt+1,j,k according to(30)5:Normalize M for each time slot t 6: end for 7:for t = X +1 :-1 : 1,j = 1 : Y +1,and k =1:Z+1 8:Update λt-1,j,k according to(40)9: end for 10:for t = 1 : X +1,j = 1 : Y +1,and k = 1 :Z+1 11:Update at,j,k according to(39)12: end for 13: iteration=iteration+1 14: until iteration ≥Itermax

The details of the iterations are shown in Algorithm 1,which shows that(30),(39)and(40)can be solved iteratively until the Nash equilibrium is obtained.

Next,we analyze the computational complexity of the proposed scheme.Defnenas the space size withn=(X+1)×(Y+1)×(Z+1).In each iteration,all the spaces are traversed with three times,and there are fnite computing times for each traversal according to the scheme.The scheme hasItermaxiterations,which is also fnite.Thus the computational complexity of each iteration isO(n),which is acceptable in the practical application.

VII.NUMERICAL RESULTS

In this section,the numerical results are presented for the proposed scheme.First,we present the access probability corresponding to a certain state and the distribution of the system states based on the access probability.Then,we compare the performance of the proposed scheme compared with existing access schemes.

In the simulation,we consider a single cell with radius 500m,and the devices are distributed in the cell randomly.The path gain for devicekisPLk=15.3 + 37.6 log10dkwith the fading coeffcient distributed asCN(0,1),wheredkis the distance between devicekand the AP[27],and the noise power spectrum density is-174dBm/Hz[28].The system bandwidth is set to 1.4MHz[29],and the total number of preambles available for devices is set to 54[30].

7.1 Access Behavior at the Equilibrium

To analyze the behavior of each device based on the MFG scheme,we setX= 10000,Y= 20,Z= 20,N= 2000,the initial state distributionM0,:,:is set as uniform.

In fgure 2,we show the relationship between the access probability and the state.It can be observed that the access probability is extremely different when the devices are in different states.When the data and energy queues are short,the access probability is small.In contrast,when a device has long data and energy queue,the access probability increases.A device has high access probability with long data and energy queues,and then it will enter the state of short data and energy queues.When the initial state is low data and energy queues,the devices attempt to access to the AP with a low access probability,and wait for the arrivals of data and energy.Figure 3.The cross-sections of access probability for different data queues.

Figure 1.System model of the RA problem.

Figure 2.The optimal access probability for different states.

Figure 4.The cross-sections of access probability for different energy queues.

In order to give a more intuitive understanding of the above scenarios,we plot several cross-sections of fgure 2 from the perspectives of the data and energy queues in fgures 3 and 4 respectively.We can easily see that the access probability increases with the data and energy queues respectively.Compared with the energy queue,the access probability grows faster with the data queue,especially for long data queue,which indicates that the data queue is more urgent than the energy queue.Figure 5.The probability density function of the system states at the equilibrium.Figure 6.The cross-sections of probability density function for different data queues.

Figure 7.The cross-sections of probability density function for different energy queues.

Figure 8.The average delay for different data arrival rates(β).

Figure 9.The average delay for different number of devices.

In fgure 5,we plot the distribution of system states at the equilibrium.When all the devices take the proposed strategy,most of the devices in the system are in relative shorter of data and energy queues.It proves that the proposed scheme has a good performance on reducing the delay.We also plot a few cross-sections of fgure 5 from the perspectives of different data and energy queues,respectively.It is seen that most of energy queues are distributed at relative long state in fgure 6,which means that the proposed scheme has high effciency.In fgure 7,the data queues are centered on the short state.An unexpected scenario is that a small increase of the probability distribution occurs in the position of long data queue.This is because when a device is in the state of longest data queue,it attempts to access with the maximum probability,which results in the maximum reduction of the data queue.

7.2 Performance Comparison

In order to prove the superiority of the proposed scheme,we compare it with the following three baselines:

·Baseline 1,Greedy strategy(GS)[31]: The devices try to access to the AP with the maximum probability once they have energy and data.

·Baseline 2,Access class barring(ACB)[5]: The initial access probabilities of devices are controlled the assigned ACB parameter to reduce the congestion.

·Baseline 3,Queue aware random access(QARA)[16]: The devices adapt their transmission probabilities based on the queue states of other devices.

We consider the changes of different variables,including the number of devices and the data arrival rate.fgure 8 illustrates the average delay for different access strategies when the date arrival rate changes.The proposed scheme achieves good delay performance compared with other baselines.This is because that the access probability of the proposed scheme increases with the data arrival rate according to fgure 2,which will reduce the delay compared with other schemes.

Figure 9 compares the delay performance for different number of devices.The average delay increases with the increased number of devices and our proposed scheme outperforms other schemes,especially in the case with massive devices due to the consideration of the infuence from other devices in the system.

VIII.CONCLUSION

In this paper,we propose a delay-optimal RA scheme to minimize the delay in large-scale energy harvesting IoT networks.To obtain the access strategy for each device,the HJB equation is derived,and the infuence from other devices should be estimated in the HJB equation.To solve the estimation problem,we adopt the MFG theory to estimate the infuence from other devices,and further derive the FPK equation,which provides the evolution of the system states.To solve the coupled HJB and FPK equations,we adopt the Lax-Friedrichs method and Lagrange relaxation to derive the optimal access decision for each device.The numerical results validate that the proposed scheme outperforms the baselines.In the future work,we will extend the proposed scheme to more practical scenarios with heterogeneous devices by estimating the infuence from other devices via the information exchange among neighbor devices.

ACKNOWLEDGEMENT

Part of this work has been presented in IEEE ICC 2018,Kansas City,U.S.A.[6].This work is supported in part by Key R&D Program of Zhejiang(No.2022C03078),National Natural Science Foundation of China(No.U20A20158),National Key R&D Program of China(No.2018YFB1801104),Ningbo S&T Major Project(No.2019B10079).

APPENDIX

Proof of Theorem 2

Proof of Theorem 3