Tong Minglei ,Li Song ,Han Wanjiang ,Wang Xiaoxiang,*
1 School of Information and Communication Engineering,Beijing University of Posts and Telecommunications,Beijing 100876,China
2 Key Laboratory of Universal Wireless Communications,Ministry of Education,Beijing University of Posts and Telecommunications,Beijing 100876,China
3 School of Information and Control Engineering,China University of Mining and Technology,Xuzhou 221116,China
4 School of Computer Science(National Pilot Software Engineering School),Beijing University of Posts and Telecommunications,Beijing 100876,China
Abstract: Mobile edge computing (MEC)-enabled satellite-terrestrial networks (STNs) can provide Internet of Things(IoT)devices with global computing services.Sometimes,the network state information is uncertain or unknown.To deal with this situation,we investigate online learning-based offloading decision and resource allocation in MEC-enabled STNs in this paper.The problem of minimizing the average sum task completion delay of all IoT devices over all time periods is formulated.We decompose this optimization problem into a task offloading decision problem and a computing resource allocation problem.A joint optimization scheme of offloading decision and resource allocation is then proposed,which consists of a task offloading decision algorithm based on the devices cooperation aided upper confidence bound(UCB)algorithm and a computing resource allocation algorithm based on the Lagrange multiplier method.Simulation results validate that the proposed scheme performs better than other baseline schemes.
Keywords: computing resource allocation;mobile edge computing;satellite-terrestrial networks;task offloading decision
Internet of Things(IoT)promotes various applications in many domains[1].The existing terrestrial network can only cover 20%of the land,accounting for 6%of the earth’s surface [2].Terrestrial networks are generally deployed in densely populated areas,such as cities.For some sparsely populated areas,such as rural areas,deserts,and forests,terrestrial networks are not deployed due to difficulties in construction or exorbitant cost.As a result,these areas cannot achieve IoT services through terrestrial networks.
Satellite networks support broader coverage than terrestrial networks and can even achieve seamless global coverage.Satellites can extend terrestrial networks and provide services for IoT devices[3].Moreover,terrestrial IoT devices can connect to low earth orbit(LEO)satellites directly without an intermediate gateway [4].The satellite-terrestrial network (STN),i.e.,the integration of satellite networks and terrestrial networks attracts the attention of academia and industry.STNs have many advantages,e.g.,wide coverage,high reliability,and high connection security[5].Obviously,STNs are suitable for providing wide-area IoT services in rural areas,deserts,forests,etc.
IoT applications generate different types of tasks when providing services,e.g.,delay-sensitive tasks and computation-intensive tasks.Due to their limited size,IoT devices have limited computing and energy resources,which may not be able to meet the requirements of these tasks [6].Mobile edge computing (MEC) is supposed to solve this problem.MEC makes the cloud computing capability and the information technology(IT)service environment sink from the cloud to the network edge[7].
The integration of edge computing into LEO satellite networks can support delay-sensitive and resourcescarce wide area IoT applications [8].MEC servers can be deployed at LEO satellites,thus providing satellite edge computing services for IoT devices[9].LEO satellites are much closer to IoT devices than terrestrial network entities in nearby areas.Moreover,if tasks can be computed by LEO satellites,there is no need to forward them to terrestrial network entities for computing through satellite-terrestrial links.Therefore,IoT devices can offload tasks to LEO satellites for computing,which can reduce the corresponding transmission delay and energy consumption and prolong the battery life of IoT devices.
Benefiting from the development of space technology,many LEO satellite constellations,e.g.,Starlink,Kuiper,Iridium NEXT,and Globalstar have been built in space [10].The number of satellites in space has increased rapidly.At each moment,an IoT device is likely to be located within the coverage of multiple satellites,which can provide MEC services to it.To improve the quality of service(QoS),IoT devices need to make appropriate task offloading decisions.
In most practical cases,IoT devices are not aware of network state information involving LEO satellites,channels,etc.IoT devices need to perform online learning,which is the gradual learning of uncertain or unknown network state information during the process of trying to make task offloading decisions,thus finding and making the optimal task offloading decision gradually to achieve a long-term goal.This is quite similar to the multi-armed bandit(MAB)problem and its solution procedure.In addition,due to the relatively limited size and weight of LEO satellites,the computing resources of LEO satellites’ MEC servers are limited and less than those of terrestrial base stations(BSs)’MEC servers.Optimal computing resource allocation is essential when all IoT devices offload tasks to the same LEO satellite.
In this paper,we design a system model of MECenabled STNs,which consists of multiple identical IoT devices and LEO satellites with MEC servers as computing nodes.These IoT devices belong to the same enterprise and are deployed in an area without terrestrial network coverage to accomplish a common goal over multiple consecutive time periods.Some data needs to be collected,then combined and inputted into an application for processing.Each IoT device is responsible for collecting a portion of the data,which is seen as the task generation in each time period.All tasks need to be processed centrally.Therefore,these IoT devices need to select the same LEO satellite for computation offloading,i.e.,to make a unified task offloading decision.We formulate the optimization problem to minimize the average sum task completion delay of all IoT devices over all time periods,which is an mixed-integer nonlinear programming(MINLP)problem.To solve this problem,we decompose it into a task offloading decision problem for IoT devices and a computing resource allocation problem for LEO satellites.Then,we propose a joint optimization scheme of offloading decision and resource allocation.
The main contributions of this paper are summarized as follows.Firstly,we propose a joint optimization scheme of offloading decision and resource allocation,which consists of a task offloading decision algorithm based on the devices cooperation aided upper confidence bound (UCB) algorithm and a computing resource allocation algorithm based on the Lagrange multiplier method.Specifically,the task offloading decision algorithm enables IoT devices to make task offloading decisions without knowing the network state information involving LEO satellites,channels,etc.The computing resource allocation problem is proven to be a convex optimization problem.The proposed computing resource allocation algorithm helps the selected LEO satellite achieve the optimal computing resource allocation for the tasks.Secondly,we evaluate the performance of the proposed scheme through simulations.Simulation results validate that the proposed scheme has lower average sum task completion delay,smaller regret,and larger optimal decision ratio than other baseline schemes.
The remainder of this paper is organized as follows.Section II introduces the related work.Section III describes the system model.In Section IV,the optimization problem of average sum task completion delay is formulated.The scheme to solve the optimization problem is proposed in Section V.Section VI shows the simulation results and discusses them.The conclusions are drawn in Section VII.
There are some works that introduce MEC into STNs.Reference [11] surveyed the integration of satellite networks and terrestrial networks.It explained that MEC was one of the technologies that can improve network performance,and that task offloading and computing were worth studying.Reference [12] designed a satellite-terrestrial integrated edge computing network,where satellites with MEC platforms can compute tasks from user equipment directly.Reference[5]designed a double-edge intelligent integrated STN architecture,where terrestrial MEC servers and satellite MEC servers can compute tasks offloaded by users.A greedy algorithm-based task migration method was proposed to reduce the average delay of the system.Reference [13] designed an LEOassisted STN,where a task can be offloaded to a traditional small cell(TSC)for computing or can be offloaded to a TSC and forwarded to the remote cloud for computing.The task can also be offloaded to an LEO-backhauled small cell (LSC) and forwarded to an LEO satellite,then it was forwarded to the cloud server for computing.Cloud-edge collaborative offloading minimized system energy consumption while meeting the delay requirements.Reference [14] designed a terrestrial-satellite IoT network model,where terrestrial-satellite terminals assisted remote IoT mobile devices in offloading tasks to LEO satellites,thus using the satellites’ MEC servers to compute tasks.This work proposed an energy-efficient computation offloading and resource allocation algorithm to minimize the weighted sum of the energy consumption of IoT mobile devices when the task tolerance delay was satisfied.Reference[15]designed a satellite-terrestrial bilateral computing system and proposed a joint optimization method of BS’s task execution sequence and satellite’s resource allocation,thus maximizing benefits of MEC resource providers while ensuring the quality of experience (QoE) of computing tasks generated by IoT nodes.
Most of the above works are done when all the information is known.The user device is aware of all the information clearly,thus making an offloading decision based on a criterion.However,in most practical cases,much information,such as network state information involving satellites and channels,is uncertain or unknown.
There are some works that research the task offloading problem in the framework of MAB theory.In [16],task vehicles offloaded tasks to BSs,and BSs offloaded tasks to service vehicles for computing.This work proposed a fluctuation-aware learningbased computation offloading algorithm based on MAB theory to minimize the average offloading delay in a vehicular edge computing system.Reference [17] researched the vehicle fog computing scenario.It proposed a vehicle computing resource management mechanism based on contract theory and a task offloading mechanism based on matching learning,thus improving resource utilization efficiency and reducing task offloading delay.Reference [18] designed a space-air-ground-edge integrated maritime network architecture,where unmanned aerial vehicles were deployed with MEC servers.It proposed an edge server selection algorithm based on the MAB model to reduce the offloading delay and weighted total cost.In [19],Industrial Internet of Things (IIoT)devices can utilize MEC offloading and device-todevice (D2D) offloading to offload tasks collaboratively in D2D-assisted MEC networks.This work proposed a learning-based task co-offloading algorithm based on MAB theory to minimize the weighted sum of task delay and migration cost.
Most of the above works have been researched in the context of vehicular networks,maritime networks,or IIoT.As far as we know,there are few works that have been researched in the context of STNs or satellite networks.
In this section,we introduce the network model,the channel model,and the computation model sequentially and in detail.
As depicted in Figure 1,we consider an area without terrestrial network coverage due to special geographical location,difficult construction,and excessive cost.An enterprise deploys multiple identical IoT devices in the area to achieve a goal such as environmental monitoring.Each IoT device has a very small aperture terminal (VSAT) with a directional antenna that allows direct communication with LEO satellites in space.The IoT devices make up the setN.Each IoT device generates an indivisible computing task in each time period.In thetth time period,IoT deviceigenerates a task,denoted by taski(t)={q,ai(t)},i ∈N,whereqis the task computing intensity,i.e.,the computing resources required for computing 1 bit of the task,andai(t)is the task data size.These tasks need to be clustered together and then fed into an application for computing.IoT devices have limited computing capabilities,battery life,and energy,so they are not suitable for computing tasks that require a large amount of computing resources.Hence,the computing capability of IoT devices is ignored.These IoT devices are located within the coverage ofMLEO satellites,which make up the setM.These LEO satellites are deployed with MEC servers.In each time period,the IoT devices need to select one of the LEO satellites to offload tasks,thus utilizing the satellite’s MEC server to compute tasks.Since the MEC servers of LEO satellites have limited computing resources,the selected LEO satellite needs to allocate appropriate computing resources to the received tasks.The main symbols used in this paper are summarized in Table 1.
Table 1.Symbol definition.
Figure 1.System model.
Figure 2.Illustration of di,j(t).
Satellite-terrestrial links between IoT devices and LEO satellites use Ka-band.Since the line-of-sight(LOS) transmission dominates the links in suburban and rural areas[20],the channels are modeled as Rice channels.We assume that both the wireless channel states and the positions of the LEO satellites are static during each time period.In thetth time period,the achievable data rate of the link between IoT deviceiand LEO satellitejis
We assume that the task generated by each IoT device in each time period will be completed in that time period.In thetth time period,all IoT devices offload tasks to LEO satellitejfor computing.The transmission delay from IoT deviceito LEO satellitejis
Due to the quite long distance between IoT deviceiand LEO satellitej,the propagation delay cannot be ignored.The propagation delay from IoT deviceito LEO satellitejis
The delay that the MEC server of LEO satellitejcomputes taski(t)is
wherefi,j(t) are the computing resources that the MEC server of LEO satellitejallocates to taski(t)in thetth time period.
Thus,the task completion delay of IoT deviceiis
In many cases,the size of computed output data is often much smaller than that of input data.Hence,the delay of sending result data from LEO satellites to IoT devices is ignored[14,22].But the propagation delay from LEO satellites to IoT devices cannot be ignored.Because the calculation method,or the value of this propagation delay is the same as that of(t),we add a coefficient of 2 to(t)in(5).
The optimization objective of this paper is to minimize the average sum task completion delay of all IoT devices over all time periods.The optimization variables include the task offloading decision of each IoT device and the computing resource allocation of the selected LEO satellite.The set of time periods is denoted byT={1,2,...,T}.We set the indicator variableli,j(t)∈{0,1},i ∈N,j ∈M,t ∈Tto represent the task offloading decision.IoT deviceioffloads the task to LEO satellitejfor computing in thetth time period,li,j(t)=1,otherwise,li,j(t)=0.The average sum task completion delay of all IoT devices over all time periods isU=li,j(t)Di,j(t)]/T.The optimization problem is formulated as
where(6a)is the objective function.The tasks generated by all IoT devices in each time period need to be offloaded to the same LEO satellite to utilize its MEC server for computing.In other words,only one satellite is selected by all IoT devices.So (6b) and (6c)exist.(6b)means that the task generated by each IoT device in each time period can only be offloaded to one LEO satellite.(6c)means that each satellite either receives tasks from all IoT devices or none at all.(6d)indicates that in thetth time period,the computing resources allocated by LEO satellitejto the tasks cannot exceed its computing resources,i.e.,Fj.(6e)indicates that the value ofli,j(t)is 0 or 1.(6f)means thatfi,j(t)is nonnegative and has an upper limit.
Sinceli,j(t)is a discrete variable andfi,j(t)is a continuous variable,Pis an MINLP problem.Pis NPhard.The proof is described in the APPENDIX.It is difficult to solve by traditional optimization methods,such as convex optimization.In each time period,all IoT devices need to select the same LEO satellite to offload tasks,and then the selected LEO satellite needs to allocate computing resources to the received tasks.It is noted thatai(t),di,j(t),andri,j(t)are different in each time period in most practical cases.Furthermore,the selected LEO satellite knows the attribute information of tasks offloaded by IoT devices,but IoT devices do not know the network state information involving LEO satellites,channels,etc.Therefore,the variational and unknown content undoubtedly increases the difficulty of solvingP.
In this section,we decomposePand then propose a joint optimization scheme of offloading decision and resource allocation to solve the problem.
There is a coupling betweenli,j(t) andfi,j(t) inP,so we need to decouple it.Solvingli,j(t) is similar to solving a sequential decision problem of IoT devices,andfi,j(t)is the computing resource allocation of an LEO satellite,so we can solveli,j(t)andfi,j(t)from two perspectives: IoT devices and LEO satellites.From IoT devices’ perspective,fi,j(t) is given by LEO satellites,so it is unknown.Therefore,Pis transformed into
P1 hasTtime periods.All IoT devices need to choose the same LEO satellite for task offloading in each time period.Moreover,network state information involving LEO satellites,channels,etc.is unknown to IoT devices.IoT devices can observe the correspondingDi,j(t)only after the task offloading decision is made and the tasks are completed.
SinceP1 has the characteristics described above,it is suitable to introduceP1 into the MAB framework,which is an online reinforcement learning(RL)framework and can find the optimal strategy without knowing the reward distribution corresponding to the action[23].P1 is similar to the MAB problem.InP1,an IoT device can be viewed as a gambler,and LEO satellitejis equivalent to thejth arm.The solution toli,j(t)is thetth arm selection,i.e.,the action.We notice thatP1 differs from the MAB problem in that the MAB problem typically has only one gambler,whereasP1 has multiple gamblers.Since we intend to minimize the average sum task completion delay,which is the opposite of revenue maximization,the negative of the average sum task completion delay is taken as revenue.In that way,P1 can be rewritten as
where(8)is the objective function.
In the MAB problem,the gambler confronts two choices each time,whether to perform an action with the known highest revenue or to perform an action with an unknown but possibly higher revenue.InP1.1,IoT devices confront two similar choices each time.If IoT devices always choose the LEO satellite with the highest historical revenue to offload tasks,they may miss the opportunity to offload tasks to the LEO satellite with potentially higher revenue.Conversely,they may not be able to achieve high revenue.Hence,an appropriate balance between exploration and exploitation is indispensable.
In order to solveP1.1,we propose a task offloading decision algorithm based on online learning.The algorithm is based on UCB1 algorithm[24]and is added devices cooperation to make the unified task offloading decision,which is denoted byx(t)∈M,t ∈T.x(t)=jis equivalent toli,j(t)=1,li,k(t)=0,t ∈T,i ∈N,k ∈M{j}.
The network state information and the revenue are uncertain or unknown,so IoT devices need to learn their probability distribution gradually from historical decisions and feedback in the process of continuous interaction with LEO satellites,and then make the task offloading decision based on the learned content,i.e.,learning while offloading.In order to estimate the revenue of the task offloading decision that IoT devices offload tasks to LEO satellitejin thetth time period,we define a utility function:
whereϕ[x(s),j] is an indicator function.Whenx(s)=j,ϕ[x(s),j]=1,otherwise,ϕ[x(s),j]=0.kj(t-1)indicates the number of times that LEO satellitejhas been selected by IoT devices to offload tasks in the pastt-1 time periods.{2 lnt/[kj(t-1)]}1/2denotes the one-side confidence interval width of the empirical average revenue,which reflects the uncertainty degree of the average revenue,i.e.,the degree of exploration.If an IoT device offloads tasks to an LEO satellite more times,the uncertainty degree of average revenue is smaller,conversely,the uncertainty degree of average revenue is larger.is the upper bound of the confidence interval,which is also the estimate of revenue.Since less selected LEO satellites and LEO satellites with higher average revenue are more likely to be selected,can make a compromise between exploration and exploitation.Furthermore,as the time period increases,the confidence interval gradually narrows and the estimated value becomes closer and closer to the true value.
Since all IoT devices need to make a unified task offload decision,we add devices cooperation into the UCB algorithm.Firstly,for each LEO satellitej,IoT deviceican obtain(t) respectively.Next,IoT deviceicompares all(t),thus finding the LEO satellite corresponding to the maximum value,i.e.,wi(t)=argmaxj(t),j ∈M.IoT deviceiregardswi(t) as the LEO satellite that it wishes to select.Then,all IoT devices’wi(t)are aggregated,thus counting the number of eachwi(t).Finally,the mostwi(t)is taken asx(t).The above is the devices cooperation in thetth time period.
The detailed procedure of the task offloading decision algorithm is shown in Algorithm 1.At the beginning,the initial values,i.e.,(0) andkj(0) are set to 0.Next,in a time period,if at least one LEO satellite has not been selected by IoT devices,all IoT devices offload tasks to one of these LEO satellites for computing.The corresponding task completion delay can be observed.Then,the historical average revenue of the LEO satellite and the number of times that IoT devices select the LEO satellite to offload tasks are updated based on the task offloading decision of the current time period.The next time period begins.After each LEO satellite has been selected at least once,each IoT device calculates the utility function of each LEO satellite to estimate the revenue of each task offloading decision.Each IoT device selects the LEO satellite with maximum utility function.The selections of all IoT devices are summarized and the number of each LEO satellite selected by IoT devices is counted.The LEO satellite selected by most IoT devices is the optimal choice.The optimal task offloading decision for the current time period is obtained.After the task offloading decision is made,the task completion delay can be observed.Then,the historical average revenue of the LEO satellite and the number of times that IoT devices select the LEO satellite to offload tasks are updated based on the task offloading decision of the current time period,as follows:
The next time period begins.For LEO satellites that are not selected in each time period,the historical average revenue and the number of times that they are selected by IoT devices to offload tasks are the same as those in the previous time period.After a certain number of cycles,IoT devices can make the optimal task offloading decisionx(t)in each time period,i.e.,(t).It should be noted that in order for each LEO satellite to be selected by IoT devices at least once,T >Mmust be satisfied.
In each time period in Algorithm 1,the complexity of updating(t) andkx(t)(t) in Line 6 isO(N).The complexity of calculating(t) of each LEO satellitej,j ∈M,in Line 10 isO(NM).The complexity of updating(t) andkx(t)(t) in Line 16 isO(N).There areTtime periods.Therefore,the complexity of the algorithm isO(TNM).
The serial number of the LEO satellite selected by all IoT devices is denoted byj*.From the perspective of LEO satellites,if we assume that(t) is given,i.e.,li,j*(t)=1,li,k(t)=0,t ∈T,i ∈N,k ∈M{j*},Pcan be transformed into
where(13a)is the objective function.(13b)means that the computing resources allocated by LEO satellitej*to the tasks in thetth time period cannot exceed its computing resources.(13c)means thatfi,j*(t)is nonnegative and has an upper limit.
Since the computing resource allocation of LEO satellitej*in different time periods is independent of each other,minimizing the average sum task completion delay of all IoT devices over all time periods is equivalent to minimizing the sum task completion delay of all IoT devices in each time period.P2 can be decomposed intoTindependent subproblems,where thetth subproblem,i.e.,the computing resource allocation of LEO satellitej*in thetth time period can be expressed as
The sum in (14) is denoted byuS(t).∂uS(t)/∂fi,j*(t)=-qai(t)/[fi,j*(t)]2,the first derivative is less than 0.∂2uS(t)/∂[fi,j*(t)]2=2qai(t)/[fi,j*(t)]3,the second derivative is greater than 0.∂2uS(t)/[∂fi,j*(t)∂fl,j*(t)]=0,i,l ∈N,,the second mixed partial derivative is equal to 0.Hence,the Hessian matrix ofuS(t)is positive definite anduS(t) is a convex function.P2.1 can be solved by the Lagrange multiplier method.The Lagrange function is constructed as
whereλ(t) is the nonnegative Lagrange multiplier.The Karush-Kuhn-Tucker (KKT) conditions are listed as
By solving the KKT conditions,we can obtain the closed-form solution ofP2.1,which is also the closedform solution ofP2.The computing resource allocation algorithm enables LEO satellitej*to calculate the corresponding(t)for each task according to(20)in thetth time period.The satellite computes these tasks and then sends the result data to IoT devices.
The above mentioned task offloading decision algorithm based on the devices cooperation aided UCB algorithm and the computing resource allocation algorithm based on the Lagrange multiplier method form our proposed joint optimization scheme of offloading decision and resource allocation,which can solvePand is denoted by DCUCB-LM.The detailed procedure of DCUCB-LM is shown in Algorithm 2.
The scheme can be divided into two phases.In the first phase,there is at least one LEO satellite that has not been selected by IoT devices.IoT devices select one of these LEO satellites to offload tasks.The selected LEO satellite allocates the optimal computing resources to tasks for computing and then sends the result data to IoT devices.IoT devices can observe the corresponding task completion delay.The historical average revenue of the LEO satellite and the number of times that IoT devices select the LEO satellite to offload tasks are updated based on the task offloading decision of the current time period.The next time period begins.
In the second phase,each LEO satellite has been selected at least once.Each IoT device calculates the utility function of each LEO satellite to estimate the revenue of each task offloading decision and selects the LEO satellite with the maximum utility function.The selections of all IoT devices are summarized,thus counting the number of times each LEO satellite is selected.The LEO satellite corresponding to the maximum number of times is the optimal choice.IoT devices offload tasks to the LEO satellite.The selected LEO satellite allocates the optimal computing resources to tasks,thus computing the tasks.The selected LEO satellite sends the result data to IoT devices.IoT devices can observe the corresponding task completion delay.The historical average revenue of the LEO satellite and the number of times that IoT devices select the LEO satellite to offload tasks are updated based on the task offloading decision of the current time period.The next time period begins.
For LEO satellites that are not selected in each time period,their historical average revenue and the number of times that they are selected by IoT devices to offload tasks are the same as those in the previous time period.After a certain number of cycles,DCUCB-LM can help IoT devices make the optimal task offloading decision in each time period through online learning.Moreover,DCUCB-LM can help the selected LEO satellite make the optimal computing resource allocation.The task completion delay can be observed.SoUcan be calculated.
Due to the uncertain or unknown network state information,the task offloading decision made by IoT devices in each time period is not necessarily statistically optimal.In addition to the average sum task completion delay,we also select regret and optimal decision ratio as metrics to evaluate the performance of the scheme.
In thetth time period,after the tasks are completed,the regret is expressed as
In thetth time period,the optimal decision ratio is represented as
The numerator is the number of time periods that IoT devices make the statistically optimal task offloading decision.The denominator is the total number of time periods.The larger the optimal decision ratio,the more time periods the IoT devices offload tasks to the satellite corresponding to the statistically optimal solution for computing in all the time periods,that is to say,the better the performance of the scheme.
In this section,we list the simulation parameters and compare DCUCB-LM with other baseline schemes to evaluate the performance.We also show and discuss the simulation results.
Simulation parameters are shown in Table 2 and Table 3 [25] respectively.ai(t),Fj,andθi,j(t) obey the random distribution.The Rice factorKobeys the normal distribution,µKis the mean andσKis the standard deviation,and they are related toθi,j(t).The main references for the parameter setting are[5,16,21,25,26].
Table 2.Simulation parameters.
Table 3.Simulation parameters.
The distance between an IoT device and an LEO satellite changes over time.This can be reflected in the changes ofθi,j(t).We assume that the elevation angle between IoT deviceiand LEO satellitejhas the same change value in each time period,which is denoted by Δi,j.There isθi,j(t+1)=θi,j(t)+Δi,j.In order to achieveθi,j(t)∈[10°,90°],so as to conform to the value range of Table 3.We formulate the following rules.In thetth time period,whenθi,j(t)=90°andθi,j(t+1)>90°,the sign of Δi,jis changed to a minus sign.Whenθi,j(t)=10°andθi,j(t+1)<10°,the sign of Δi,jis changed to a plus sign.
θi,j(t) is continuous,whereas only theµKandσKof discrete angle values are given in Table 3.To get theµKandσKof continuous angle values,we look up Table 3 according to the result thatθi,j(t) is rounded to the tens place.That is,theµKandσKofθi,j(t)∈[10°,15°) correspond to those ofθi,j=10°,theµKandσKofθi,j(t)∈[15°,25°)correspond to those ofθi,j=20°,theµKandσKofθi,j(t)∈[25°,35°)correspond to those ofθi,j=30°,...,theµKandσKofθi,j(t)∈[85°,90°]correspond to those ofθi,j=90°.In this way,the value ofKcorresponding toθi,j(t)can be calculated according to Table 3.
To validate DCUCB-LM,we compare it with other baseline schemes.The first scheme offloads tasks randomly and allocates computing resources by using the Lagrange multiplier method,i.e.,IoT devices select an LEO satellite for task offloading randomly,and the satellite allocates optimal computing resources to each task(Rand-LM).The second scheme offloads tasks by using the devices cooperation aided UCB algorithm and makes an average allocation for computing resources,i.e.,IoT devices select an LEO satellite for task offloading by using the devices cooperation aided UCB algorithm,and the satellite allocates the same computing resources to each task(DCUCB-Ave).The third scheme offloads tasks by usingϵ-greedy algorithm and allocates computing resources by using the Lagrange multiplier method,i.e.,IoT devices select an LEO satellite for task offloading by usingϵ-greedy algorithm,and the satellite allocates optimal computing resources to each task (ϵ-greedy-LM).The fourth scheme offloads tasks by using explore-first algorithm[27] and allocates computing resources by using the Lagrange multiplier method,i.e.,IoT devices select an LEO satellite for task offloading by using explore-first algorithm,and the satellite allocates optimal computing resources to each task(EF-LM).Each satellite was tried 20 times.The fifth scheme is the statistical optimal scheme,i.e.,IoT devices always offload tasks to the LEO satellite that corresponds to the statistical optimal decision,and the satellite allocates optimal computing resources to each task(Opt).The sixth scheme offloads tasks by using a greedy algorithm designed according to[28]and makes an average allocation for computing resources,i.e.,IoT devices select an LEO satellite for task offloading by using a greedy algorithm,and the satellite allocates the same computing resources to each task (Greedy-Ave).It is noted that the scheme is executed when the network state information is known.In each time period,the satellite that can achieve the minimum sum task completion delay of all IoT devices is selected.All simulation results are obtained by running MATLAB 500 times and then taking the average value.
As shown in Figure 3,we choose three different values forMthat are less than,equal to,and greater thanNand simulate the variation of DCUCB-LM’s regret over time periods.It can be seen that the regret corresponding to all values ofMis sublinear as the time period increases.Although all regret has increasing trends,the increasing speed gradually slows down and eventually stabilizes.This is especially evident whenM=5.The results demonstrate that DCUCB-LM is effective in enabling IoT devices to make appropriate task offload decisions.Moreover,the smallerM,the smaller the regret and the faster it levels off.This is because DCUCB-LM starts with one attempt for each task offloading decision,and the fewer candidate task offloading decisions,the fewer attempts.In addition,reducing the number of candidate task offloading decisions reduces the unknown network state information,thereby shortening the time for IoT devices from the beginning of learning until they are able to make appropriate task offloading decisions.To speed up convergence and facilitate simulations,we chooseM=5 in the following simulations.
Figure 3.Variation of regret corresponding to different M with time periods.
As Figure 4 depicted,the average sum delay of DCUCB-LM,DCUCB-Ave,andϵ-greedy-LM gradually decreases and finally becomes stable with the increase of the time period.The average sum delay of Rand-LM fluctuates initially and then stabilizes rapidly.The average sum delay of EF-LM increases,then decreases,and finally becomes stable.The average sum delay of Opt does not change over time periods,because it is the statistically optimal result.Since Rand-LM is completely random in terms of task offloading decisions,and the UCB algorithm does not have randomness at all,but relies on data calculation to make task offloading decisions,the average sum delay of DCUCB-LM is much lower than that of RANDLM.Moreover,the average sum delay of DCUCBLM is lower than that of DCUCB-Ave,and the difference between them first increases and then approaches stability.This is because DCUCB-LM optimizes the computing resource allocation,thus reducing the delay that the LEO satellite computes tasks.DCUCB-LM does not have randomness andϵ-greedy-LM has some randomness,so the average sum delay of DCUCBLM is much lower than that ofϵ-greedy-LM.Compared with EF-LM,DCUCB-LM has lower average sum delay,because DCUCB-LM takes better account of historical revenue.For the above reasons,the average sum delay of DCUCB-LM is the closet to that of Opt.This demonstrates that DCUCB-LM can reduce the average sum task completion delay significantly.
Figure 4.Variation of average sum delay with time periods.
As shown in Figure 5,the regret of all schemes increases as the time period increases.This is because all schemes have exploration processes that may offload tasks to nonoptimal LEO satellites.The regret of DCUCB-LM is sublinear.This indicates that DCUCB-LM can quickly understand the previously unknown network state information through online learning,thereby obtaining the statistical optimal solution.Since Rand-LM and EF-LM do not have the ability to learn,their regret increases linearly.The regret ofϵ-greedy-LM is also linear.Both Rand-LM andϵ-greedy-LM have randomness,whereas DCUCB-LM does not,so the regret of Rand-LM andϵ-greedy-LM is much greater than that of DCUCB-LM.The regret of EF-LM is much greater than that of DCUCB-LM,because EF-LM does not take full advantage of historical revenue.DCUCB-Ave does not optimize computing resource allocation,so its regret tends to be a little sublinear and is much greater than that of DCUCBLM.The regret of DCUCB-LM is the smallest among all schemes,which proves that DCUCB-LM can properly balance exploration and exploitation,thus effectively reducing the performance losses caused by the unknown network state information.
Figure 5.Variation of regret with time periods.
As Figure 6 depicted,with the increase of time period,the optimal decision ratio of DCUCB-LM,DCUCB-Ave,andϵ-greedy-LM increases gradually,and the increasing speed slows down gradually.This indicates that IoT devices make statistically optimal task offloading decisions in more and more time periods.The optimal decision ratio of DCUCB-LM and DCUCB-Ave can reach about 77% and 78%,respectively.This validates that DCUCB-LM can obtain better results after a short period of learning.Although the optimal decision ratio of DCUCB-LM is very close to and slightly smaller than that of DCUCBAve,DCUCB-LM can achieve lower average sum delay and smaller regret.This proves the effectiveness and reliability of DCUCB-LM.ϵ-greedy-LM stabilizes faster than DCUCB-LM,but its optimal decision ratio is up to about 65%,which is much smaller than that of DCUCB-LM,becauseϵ-greedy-LM has some randomness.The optimal decision ratio of Rand-LM is almost constant,which is about 20%,because in random offloading,the probability that a task is offloaded to each LEO satellite is the same,which is equal to the inverse of the number of satellites,which is 1/5.Furthermore,the optimal decision ratio of Rand-LM is much smaller than that of DCUCB-LM,because Rand-LM does not make appropriate task offloading decisions.The optimal decision ratio of EFLM initially decreases,then increases,and finally stabilizes at about 24%,which is much smaller than that of DCUCB-LM.This suggests that appropriate task offloading decisions cannot be made based on initial exploration alone.In the whole process,a proper balance between exploration and exploitation is essential,and DCUCB-LM can do just that.
Figure 6.Variation of optimal decision ratio with time periods.
In order to further confirm the effectiveness of DCUCB-LM,we perform simulations of DCUCB-LM with three different values ofN.The simulation results are as follows:
As shown in Figure 7,the average sum delay corresponding to the three values ofNdecreases gradually at first and then becomes stable with the increase of time period.This proves that DCUCB-LM can help IoT devices make optimal task offloading decisions through online learning,thereby reducing the average sum delay successfully.Moreover,the largerN,the higher the average sum delay.This is because for each additional IoT device,there is one more task to be computed,which increases the corresponding task completion delay.
Figure 7.Variation of average sum delay corresponding to different N with time periods.
As depicted in Figure 8,as the time period increases,the regret corresponding to the three values ofNis sublinear.This indicates that the online learning of DCUCB-LM can find the optimal task offloading decision effectively,thus reducing the losses.Furthermore,the largerN,the greater the regret.The reason is that each IoT device has task completion delay.When IoT devices do not make statistically optimal task offloading decisions,the more IoT devices,the greater the cumulative losses.We can also observe that the smallerN,the faster the increasing speed slows down.
Figure 8.Variation of regret corresponding to different N with time periods.
As shown in Figure 9,with the increase of time period,the optimal decision ratio corresponding to the three values ofNincreases gradually,while the increasing speed decreases gradually.This demonstrates that DCUCB-LM can obtain the statistically optimal task offloading decision effectively.Moreover,the largerN,the larger the optimal decision ratio.WhenNis 10,15,and 20,the optimal decision ratio can reach 77%,83%,and 86%,respectively.This confirms that DCUCB-LM can obtain quite good results for differentN.
Figure 9.Variation of optimal decision ratio corresponding to different N with time periods.
It can be summarized as follows.Regardless of the value ofN,the proposed scheme can find and make the optimal task offloading decision.The difference is that the smaller the value ofN,the smaller the delay increase due to each non-optimal task offloading decision of the proposed scheme.The larger the value ofN,the faster the proposed scheme can find and make the optimal task offloading decision.
In addition,we compare DCUCB-LM with Greedy-Ave.DCUCB-LM is for continuous time,while Greedy-Ave is for a single moment.Therefore,we perform Greedy-Ave in each time period and obtain the corresponding results,which are then averaged.
As depicted in Figure 10,the delay of both schemes increases asNincreases.The delay of DCUCB-LM is lower than that of Greedy-Ave.Moreover,the difference between them gradually increases.This shows that DCUCB-LM achieves good results for the joint optimization of offloading decision and resource allocation.It should be noted that DCUCB-LM considers the case where the network state information is unknown,while Greedy-Ave considers the case where the network state information is known.The results confirm the effectiveness of DCUCB-LM.
Figure 10.Variation of average sum delay with N.
In this paper,we research that LEO satellites assist IoT devices in computing tasks in MEC-enabled STNs.Specifically,multiple IoT devices select one of LEO satellites with MEC servers to offload tasks in each time period in a rural area without terrestrial network coverage,thus minimizing the average sum task completion delay.The optimization problem is formulated and decomposed.We propose a joint optimization scheme of offloading decision and resource allocation,which consists of a task offloading decision algorithm based on the devices cooperation aided UCB algorithm and a computing resource allocation algorithm based on the Lagrange multiplier method.Although the network state information is uncertain or unknown,the proposed scheme enables IoT devices to make the optimal task offloading decision through online learning.The scheme also helps the selected LEO satellite to make the optimal computing resource allocation for tasks.Simulation results demonstrate that our scheme has lower average sum task completion delay,smaller regret,and larger optimal decision ratio than other baseline schemes.In addition,our scheme achieves quite good results regardless of the number of IoT devices.In the future,we plan to consider scenarios where IoT devices offload tasks to different LEO satellites.
ACKNOWLEDGEMENT
This work was supported by National Key Research and Development Program of China(2018YFC1504502).
APPENDIX
Here we prove thatPis NP-hard.First,Pcan be transformed intoP3 by setting the variablefi,j(t) to be constant.
Then,P3 can be relaxed intoP4 by dropping the constraint(6c).
P4 is a generalized assignment problem (GAP).GAP is NP-hard [29],soP4 is NP-hard.SinceP3 has one more restriction,i.e.,(6c)thanP4,P3 can be seen as a special case of the GAP.It follows thatP3 is NP-hard.Pneeds to optimizefi,j(t) on the basis ofP3,so the computational complexity ofPis greater than that ofP3.Therefore,Pis NP-hard.