Zeyu Hu,Chunjing Hu,Zexu Li,Yong Li,Guiming Wei
1 Key Laboratory of Universal Wireless Communications,Beijing University of Posts and Telecommunications,Beijing 100876,China.
2 China Academy of Information and Communications Technology,Beijing 100191,China.
Abstract:The concept of edge network caching has been proposed to alleviate the excessive pressure on the core networks.Furthermore,video segment caching technology,a method to cut the whole video into segments and cache them separately,has brought a novel idea to solve the caching problem in the smaller space for massive data.The adoption of segment caching in edge networks will divide the simple video transmission process into two coupling stages because of separate data caching,which leads to more complicated resource allocation.In this paper,this problem is discussed,and its mathematical model is established to minimize the energy consumption of video transmissions.By introducing an efficient prediction window of channel fading,an optimal dynamic scheduling algorithm based on Qlearning is proposed to minimize power consumption while ensuring smooth video streaming.The proposed Q-learning algorithm is simulated and the impacts of channel state,target video bit rate and largescale channel parameter are evaluated.Simulation results show that the proposed method can effectively reduce the total power consumption while ensuring the smooth playback of video service,thanks to the fact that the proposed method is intelligent which can effectively utilize idle resources in favorable channel states.
Keywords:radio access networks;streaming media;quality of service;machine learning
With the rapid development of wireless network technologies,people are accustomed to completing various social activities through online video.Due to the demand for smooth video streaming in various fields,such as journalism and education,the video data exhibit explosive growth trends in wireless Internet.The applications of video streaming media occupy the vast majority of the total network traffic and have gradually become the mainstream applications in the Internet.According to the forecast,the data generated by video traffic will triple by 2021 [1].This is a great burden for wireless communication networks.
In order to alleviate the problem of heavy network load caused by transmitting large amounts of streaming media data,the technology of edge caching is proposed as an effective solution [2].By using edge caching,popular contents are cached upfront in edge devices so as to reduce data transmissions in the core network.Compared with the traditional centralized caching at a few central servers,edge caching can relieve the load pressure of the core network and improve the performance of the communication network in energy consumption[3].Fog radio access network(F-RAN) is one of the architectures which can utilize the caching and computing capabilities of edge network nodes to improve the overall network performance[4].It is the practical evolution direction of heterogeneous cloud radio access networks (H-CRANs)[5].Compared with the traditional cloud radio access networks(C-RANs),H-CRAN is known as a common variant of the C-RAN,which can decrease the heavy burden on the fronthaul and avoid the large-scale/high real-time requirement of wireless signal processing in the baseband cell pool of traditional C-RANs.FRANs not only inherit the advantages of H-CRANs,but also provide mixed centralized and distributed deployment of the cache capability,which enable more flexible and efficient caching strategies.
In traditional edge caching strategies[6],cache contents are often considered as a whole.However,the caching space of the F-RAN access point (F-AP) is smaller than traditional edge devices[7].Pre-caching of large contents (e.g.,a complete video) results in a low cache hit ratio,due to the limited cache space size.In order to increase the possibility of matching the cached contents in edge node with user requests,Yueet al.proposed to cut the whole video into segments and cache them in different cache spaces [8].This method has proved to be able to improve system performance of femtocell networks and reduce the backhaul overhead[9].Furthermore,the service burden of macrocell base stations can be reduced,because it can reuse the content segments in the complete video flexibly.In this way,the idle resources and time are used efficiently,and the video segment based caching strategy is a more flexible selection of cached content and is shown to be effective in reducing both video delay[10]and energy consumption [11].Fenget al.have done the work on pre-caching [12],which takes advantage of pre-cached data for video-on-demand services,and the cache allocation is optimized to minimize the average bandwidth consumption.In[13],the cooperative strategy of pre-caching and transmission was studied for the proactive system where videos are periodically broadcasted.
By introducing the aforementioned video segment caching strategies into F-RANs,the front part of the video is cached in the F-AP close to the user equipment(UE),and the rear part is cached in the core network.In this paper,caching in the core network can be simplified to caching in the base stations(BSs),because only the access network is discussed.Unlike the traditional strategy of caching complete videos only in either the F-AP or the BS,video segment based caching provides a method for caching video in segments.In this method,the process of requesting and acquiring a complete video is decomposed into two stages.In the first stage,the front part of the content requested by the UE can be obtained directly when it is cached in F-AP.In the second stage,the remaining contents requested by the UE are firstly obtained from the BS by the F-AP and then transmitted to the UE from the F-AP.Both stages rely on wireless transmission and share the same set of subcarriers.It will apparently lead to increased complexity for the network power allocation problem,which is caused by the competition between the BS-to-F-AP link and the F-AP-to-UE link.The main contributions of our work are summerized as follows:
• In this paper,a complete two-stage video transmission strategy is discussed in the context of FRANs,and the mathematical model is established to minimize the energy consumption of the system.By analyzing the constraints of video data in each node of the system,the optimal method is proposed to ensure the smooth video streaming for UE.
• The prediction time window of channel fading is introduced.Because resource allocation is dynamic and environment model is unknown,a subcarrier allocation strategy based on Q-learning is proposed.According to the channel state in a certain period in the future,the power allocation strategy is formulated in the time window to optimize the energy consumption of the system dynamically.
• For the strategy mentioned above,the algorithm is simulated and the impacts of system parameters such as channel state,video bit rate and distance are evaluated to compare the effects.The applicable scenarios of the proposed algorithm are thus given.
The rest of this paper is organized as follows.In Section II,we present the mathematical model of the system.In Section III,we transform the original static optimization problem into a dynamic problem which can be solved by our proposed video segment caching method based on Q-learning.In Section IV,the simulation results under different simulation scenarios are compared.Finally,we summarize our work in Section V.
Figure1.Frames of the former part cached in the F-AP need only one hop for transmission,and other frames need two hops for transmission.
The system model is shown in Figure1.The UE cannot be connected to the BS directly,and it needs to rely on the F-AP to obtain video content.It is assumed that a complete video is composed of a plurality of frames which are of equal time length.Each frame is treated as a minimum unit of time which is represented by a time unitπ,and the segment caching strategy is applied in this system model.Assume that some previous frames of the complete video are cached in the FAP which is the requested data for the UE.The F-AP needs to fetch the rest of the uncached portions from the BS.Therefore,frames of the former part cached in the F-AP need only one hop for transmission,while others of the latter part need two hops for transmission.In order to avoid interference,the subcarrier resources between the two hops are orthogonal.The decisionmaking process of resource allocation becomes more dynamic due to unknown channel states and the competition of shared resources.The online scheduling strategy is adopted due to the variability of decisionmaking,which is carried out at the beginning of each allocation time unit.
In the aforementioned system model,the problem is how to allocate subcarrier resources between the BSto-F-AP and the F-AP-to-UE links efficiently.It is required that the video streaming should be smooth for UEs,and the optimization goal is to minimize the total power consumption of the whole system.
Assuming thatMis the number of subcarriers available in the system,andBcis the bandwidth of each subcarrier.Then the total bandwidth isB=M ·Bc.The maximum powerof the F-AP is divided intoMparts,and we haveSimilarly,the power of the BS is also divided intoMparts,andAssume that each channel experiences block-flat fading during each time unitπ.The fading power of the BS-to-F-AP and the F-APto-UE links are represented byγAPmandγBSmover themthsubcarrier andnthtime unit,respectively.N0is the power spectral density of the background Gaussian noise.Assuming that themthsubcarrier in thenthtime unit is allocated for the F-AP-to-UE link,the channel capacity is
During thenthtime unitπ,the data transferred by the F-AP to the UE is
where the value ofαn,mis either 1 or 0 to indicate whether this subcarrier is allocated to transmit data for the F-AP-to-UE links.Similarly,if themthsubcarrier in thenthtime unit is allocated to transfer data from the BS to the F-AP,the channel capacity and the data transferred in the periodπare respectively
whereβn,mis similar toαn,m,which is used to indicate whether this subcarrier is allocated to transmit data for the BS-to-F-AP links.It is noteworthy thatαn,m+βn,m ≤1,due to the constraint that the same subcarrier cannot be used for both the BS-to-F-AP and the F-AP-to-UE links.
LetV(n) denote the data consumption for the UE to play thenthframe by the video player,and the cumulative data consumption of the UE from the initial time to the timetis thus given byand the total data amount of the video is a constant given byThe maximum cache space of the UE is set to a fixed valueVmax.
In conclusion,the condition for a user to be able to watch a video with a total length ofNframes smoothly is given as follows
Eq.(5)shows that the data sent by the F-AP is greater than the data consumed by the UE at all time units.Eq.(6) shows that the difference between the data from the F-AP and the data consumed by the the UE is less than the cache limit of the UE at all times.
As shown in Figure2,in order to ensure the UE experience,the cached data in each frame should be guaranteed above the data consumed by the UE and below the maximum cache space.Therefore,the cached data curveshould be strictly limited within the area between the two lines which arerespectively.
Figure2.Video transmission strategy for F-AP-to-UE links.
Figure3.Video transmission strategy for BS-to-F-AP links.
Assuming that there areλpre-cached video frames in the F-AP,represented byandλis a constant.The constraints of sufficient data in the FAP to ensure smooth video streaming are as follows
The optimization objective is to minimize power consumption as much as possible.By combining (5),(6) and (7),the conditions for smooth video streaming can be derived.In addition,the connection of data volume and power consumption is considered according to (1),(2),(3) and (4).The mathematical model of the optimization problem is obtained as follows
Eq.(8) is a mixed integer non-linear programming problem,and traditional search algorithms cannot solve such a complex optimization problem at a reasonable time cost.In order to solve the above problem,an online scheduling strategy is designed.
Firstly,the whole optimization process of video transmission needs to be divided into a set of independent subproblems per frame;that is,the video transmission is decomposed intoNtimes of optimization.To simplify the formulas,the time unit takes the constant unity.
During the per-frame optimization,assume thatandcan be considered as constants due to the short time unit and stationary fading power.Therefore,a 0-1 matrix,as the subcarrier allocation matrix,is the output in each frame.After each subcarrier allocation,the state change of the data will affect the state of the next frame,and the solution will be iterated repeatedly.The state of subcarrier allocation is deterministic for the same subcarrier at the same time.There is only one exact allocation state,and the subscriptiinθn,m,iindicates the state of subcarrier allocation,and the value space is 3 which is used to represent the cases as follows:
• The subcarrier is allocated to the F-AP-to-UE link;
• The subcarrier is allocated to the BS-to-F-AP link;
• The subcarrier is idle.
For notational compactness,θn,mis denoted as the result of the resource allocation.For example,θnn,m,mn,m={1,0,0}indicates that themthsubcarrier is allocated to the F-AP-to-UE link innthtime unit,thenθnn,m,mn,m={0,0,1}indicates that themthsubcarrier is idle innthtime unit,and so on.
To sum up,the optimization problem(8)can be reformulated into the following forms
Eq.(2.2)shows that the problem can be transformed into a 0-1 integer programming problem.The solution of 0-1 programming problem is complex,and the common methods include combination method and heuristic algorithm[14].The combination algorithm can obtain the global optimal solution,but it has the shortcoming of large search space;whereas the heuristic algorithm has a characteristic of faster convergence,but it tends to fall into local optimal value,and will be affected by initialization parameters.The enumeration algorithm complexity is in proportion toθM.When the value ofθandMare relatively small,traditional methods such as enumeration method can solve this optimization problem in acceptable time.
Channel prediction is a well established technology in the field of wireless communication [15].This technology is the theoretical basis for wireless systems to properly distribute power for different time periods according to the prediction results in a certain time window.In this paper,UEs in high-speed motion are not considered,and the length of the time prediction window is short relative to the random behavior of UEs.According to the channel prediction technology,all channel states from the current frame to the nextδframes can be obtained accurately.
If channel prediction technology is applied to(2.2),and the traditional enumeration method is used to solve it,then the algorithm complexity will be as high asθM·δ.Therefore,a new model needs to be to built to solve the dynamic subcarrier allocation problem,which is based on channel prediction technology.
Assuming that the channel is a frequency-flat channel,in each time unit,the influence of channel differences between different subcarriers can be ignored[16].Only one F-AP and one UE are considered in the scenario of this paper.Therefore,the original problem of which subcarriers are allocated in each allocation period is simplified into the problem of how many subcarriers are allocated.Based on this assumption,(8)can be transformed into the following forms
Figure4.State transition process of dynamic problems.
whereζn ∈{0,1,2,3,...}andηn ∈{0,1,2,3,...}represent the number of subcarriers allocated to the FAP-to-UE link and the BS-to-F-AP link during thenthtime unit,respectively.The current moment of the first frame is seen as a start,then the allocation from the current moment to the nextδframes is seen asδ −1 state transitions,as shown in Figure4.At each moment,the system is in a certain state,that is,the position represented by a grid in Figure4.This state haspotential transition states,and the transfer will be completed in the next frame.
During the process of selecting the next state for each frame,the data cached in the F-AP and the UE in the previous frame will affect the cache decision at the next moment.The ultimate goal of optimization is to ensure that the system tends to cache more data when the channel state is favorable,so that it can ensure smooth video streaming of the UE during the time with poor channel state.Thus,the total power consumption of the system can be reduced.Eq.(10) can be understood as a dynamic programming process due to time-varying characteristics of the channel.Considering the convergence time of the algorithm,the traditional enumeration method cannot solve this problem in reasonable time.
Machine learning has proved to be an effective tool to deal with wireless resource management[17].Since it is impossible to obtain a large number of sample data sets in this problem,reinforcement learning is suitable for solving this problem by exploring the environment for rewards or punishments[18].Moreover,since the environment model for this problem is unknown,it is necessary to obtain the optimal set of actions in a series of consecutive actions.Therefore,Q-learning,the representative of temporal-difference learning algorithms,is adopted here because it is more suitable for solving this problem among many reinforcement learning algorithms.Q-learning based segments planning(QSP)algorithm is a kind of allocation algorithm based on the idea of reinforcement learning,which optimizes the action strategy in the process of continuous trial-and-error and is expected to obtain the maximum cumulative reward.
?
In this algorithm,the start of the complete time window is the frame of the current moment,and the end is the frame which isδframes after the current moment,and the Markov decision process is constructed within this window with one frame as the unit.The data volumes cached in the current F-AP and the UE are regarded as the states,and differentζnandηnare regarded as the actions for each frame in the unit,and it transfers to the corresponding state of the next action according to different actions.The setting of reward is related to the power consumed in the allocation for the subcarrier.In the constraints,the reduced F-AP energy consumption should be accorded larger rewards by the Q-table,and the reward value will become punishment when the constraint condition is unsatisfied.
The core idea of the QSP algorithm is to combine the process of selecting how much data to transfer and cache in each frame into a time window in chronological order,and formulate a complete optimal strategy for the current window at the moment of the first frame in the window.
The Q-learning algorithm here is to find out the best state transition process in Figure4.Therefore,the data in the state table represented bySare used to describe the cumulative number of subcarriers allocated for the F-AP-to-UE link and the BS-to-F-AP link,and the data in the action table represented byAare used to describe the number of subcarriers allocated in this time unit.At the first moment of the distribution window,the F-AP tries different distribution strategies with equal probability,and then accumulates the reward.In this paper,reducing the overall power is the goal of the optimization.The value of the reward is inversely proportional to the power consumption in this time unit when the allocation strategy is feasible,and it is denoted byR.In addition,Rmay also be a negative value when the allocation strategy cannot meet the constraints.The Q-table is updated according to the cumulative reward values obtained by different actions in the environment,and the value of the Qtable is related to the probability of different actions to be tried in the current state.The larger the cumulative reward,the greater the probability that the state will be transferred to.When the Q-table meets the convergence condition,the optimal allocation strategy can be considered as the best strategy in this time window.The detailed algorithm is shown in Algorithm 1.The aim of the algorithm is to avoid falling into the local optimal value and wasting resources by searching for too many useless states.
When the QSP algorithm is applied,there is a problem that the smooth transition cannot be guaranteed in the transit between two adjacent time windows.For example,if the channel condition of the first frame in the next time unit is poor,the F-AP has to allocate high power to ensure the smooth video streaming of the UE.The reason is that the F-AP in this time unit cannot prepare enough storage for next unit in advance.They are selfish for each window of time,and they use the maximum reward in their window as a policy guideline,so that no data will be allocated to the next time windows.
For this problem,a Q-learning based gradual improvement (QGI) algorithm is proposed to improve QSP.The algorithm enables the previously fixed time window to slide.Each time unit,according to the channel status within this time window,is regarded as the allocation condition.After successful allocation,it will slide one frame size forward.Then,the time window from the current frame to the next frame is regarded as a new time window for allocation.The comparison of window behaviors between QSP and QGI is shown in the Figure5.
Figure5.Differences in window behaviors between QSP and QGI.
In the previous allocation window,the policies from the 2ndframe to theλthframe are all discarded,and the state obtaining in the new allocation window is implemented.QGI can adjust the allocation gradually,and thus can solve the problem of unsmooth connection between two adjacent windows in QSP.The reason is that when applying QGI algorithm to solve the problem,there will not be two adjacent connections between multiple time windows.Nested time windows are connected with each other.Fundamentally,it solves the problem that the previous time window does not pre-cache data for the later time window.
However,QGI algorithm will produce more time windows and thus is more computationally expensive than QSP.As the example given in Figure5,the number of calculation times for QGI is 6,but the number for QSP is only 2.The QGI algorithm results in more computational cost.Therefore,QSP is a more practical strategy than QGI in some scenarios because of the performance limitation of the F-AP.
In order to verify the the performance gain of the proposed algorithm in different scenarios,the system simulation model based on a F-RAN framework is built to simulate the enumeration methods for 0-1 linear programming and the two proposed Q-learning algorithms (QSP &QGI).The simulation parameters are listed in Table1.For the convenience of calculations,Bc,N0,πandV(n) are normalized.In this way,the amount of data that contains one frame of video is equal to that transmitted by one subcarrier in one time unit,when the unit for the parameters are all equal to 1.
Figure6.The number of iterations to convergence in Qlearning is counted as learning efficiency.
The enumeration method for 0-1 linear programming optimization is not suitable in the dynamic allocation during multiple frames,because the time complexity is unacceptable.The number of iterations required for the enumeration method can be calculated as625 by several related parameters applied in Table1.Therefore,the enumeration method is only satisfied for one time unit,in the other words,the enumeration method can be identified as on-demand allocation method for UE requests.
If the enumeration method is used to solve this 0-1 linear programming optimization,the process of dynamic allocation of multiple frames will be extremely complex,and the time complexity is unacceptable.The simulation parameters are given in Table1.The number of iterations required for the enumeration method to take effect can be calculated as100,625.Therefore,the enumeration method is only satisfied for 1 time unit,this is,the method can be identified as on-demand allocation for UE requests.
The learning efficiency of the Q-learning is higher than that of the enumeration method.As shown in Figure6,the average convergence times of applying Q-learning algorithm is 3,624 times,far less than 4 million times of the enumeration method.A number of the convergence values are high relatively due to the influence of randomness,but most of them are converged within 5,000 times.
Three different scenarios are designed to verify the performance comparison of the proposed algorithm in the corresponding situation.First of all,the timevarying characteristics of the channel are considered.Secondly,the video bit rate of the UE is considered to be related to the performance of the algorithm.Finally,the impact of the UE information about the location of the F-AP and the UE on system energy consumption is verified.
Figure7.The influence of channel state coefficients in 3 different schemes.
Simulation results demonstrate the effect of timevarying characteristics on the proposed algorithm.Channel state coefficient is used to describe the variation of the channel,and its definition is given by.
whereµrepresents the time-varying coefficient of channel state,and the value range is 0<µ2<1.The smaller the value ofµ2,the more stable the channel state is over time.The statistical characteristics of the channel obey the complex Gaussian distribution,and the average and variance are satisfied with the statistical characteristics.The simulation results are given in Figure7.The user data consumption is set to 1 and the UE cache size is set to 7.
As can be seen from Figure7,the stronger random changes in channel,the more effective the algorithm.When the randomness of the channel is significant,the performance of the machine learning-based algorithm is obviously better than that of the 0-1 programming method in terms of energy consumption reduction.The reason behind this is that machine learning algorithms exhibit relatively better intelligence.Itcan accumulate more data in the favorable channel and transmit as little data as possible in the poor channel.When the channel is less time-varing,the possibility of favorable channel condition is lower,and the proposed algorithm lacks cache space,therefore it is closer to the traditional optimization algorithm in terms of the total power consumption.
Table1.Simulation parameters.
By comparing the power consumption curves,the QGI can reduce the total power consumption of the system more effectively.This is because when the randomness of the channel is very strong,it is more likely to increase the jump at the window boundary in QSP algorithm.In other words,the probability of poor channel in the first frame of the next window increases.In QGI algorithm,the performance is further improved by eliminating the boundary between two windows.To sum up,the intelligent algorithm based on Q-learning is more suitable for scenarios with drastic channel changes.
Figure8.Comparisons of different bit rates for UE requirements in 3 different schemes.
Simulation experiments are designed to demonstrate the effects of changes in UE consumption(i.e.,video bit rate requested by the UE) on the performance of different algorithms.In the simulation experiment,non-integer values represent the proportion of frames consumed in all sending frames.For example,UE consumption of 0.1 indicates that 1 frame of data is consumed per 10 frames,and the remaining 9 frames can be regarded as non-consumption.The value ofµ2is fixed at 0.3 in this experiment.Assume that the large-scale characteristics of the F-AP-to-BS link and the F-AP-to-UE link are the same.
Figure9.The influence of coefficients of large scale conditions in 3 different schemes.
From Figure8,the overall power consumption of both QSP and QGI is more efficient than the traditional optimization method.It can be concluded that the proposed algorithm can effectively reduce energy consumption of transmission.The higher the video bit rate requested by the UE,the greater the power saving effect of the machine learning method.This is because the higher the bit rate,the more energy the transmission consumes.In this case,the proposed algorithms have more energy benefits.Therefore,the intelligent algorithm based on Q-learning is more applicable to the scenario where the UE requires a higher video bit rate.
A simulation experiment is designed to test the largescale similarity between channel states between the FAP-to-UE link and the BS-to-F-AP link,as well as the influence on the performance of different algorithms.The large-scale characteristics of channel state are described by
whereωrepresents the coefficient of large scale condition and is used to describe the relative distance ratio of the UE-to-F-AP link to the BS-to-F-AP link.fcdenotes the frequency of carrier,andcdenotes the speed of light.They can all be regarded as constants.The higher the value ofω,the smaller the distance is.The simulation results are given in Figure9.In the simulation experiment,the consumption of UE data is assumed to be 1 andµ2is 0.3.
It can be seen from Figure9 that the proposed algorithm is superior to the traditional method of power consumption in the simulation experiment,under all reasonable values ofω.Because it costs more energy to communicate over a longer distance.The more energy can be saved by the proposed algorithm,since the performance of the proposed algorithm is better.The QGI algorithm has the lowest power consumption in comparison.This is because it improves performance by eliminating connections between windows.To sum up,the intelligent algorithm based on Q-learning is more suitable for scenarios where the UE is far away from the BS.
Based on the F-RAN architecture,this paper proposes a method to solve the power allocation problem,which combines edge caching and video segment caching technology.The mathematical model is established to reduce the total power consumption of the system,and the traditional optimization method is used to solve the problem in a single time slot successfully.However,it cannot be solved in continuous time windows.In response to this,Q-learning algorithm is used to obtain the optimal allocation strategy within the predictive time window.In addition,the problem is further improved by the channel prediction technique.Two algorithms based on Q-learning,namely,QSP and QGI,are used to obtain the dynamic power allocation strategy in the predictive time window.Simulation results show that the proposed algorithm can effectively allocate subcarriers and achieve the optimal power allocation of the system on the basis of ensuring smooth video stream services of UE.
The next step is to consider the multi access problem for the F-AP.Only single UE access to F-AP is considered here.But in reality,it is common for multiple UEs to access the same F-AP at the same time.Multi-user will bring more complex subcarrier competition problem,which will be our research direction for future work.In addition,multiple UE may access and request video data at different times.In this case,the dynamic access for UE will be further studied.
ACKNOWLEDGMENT
This work was supported in parts by the National Natural Science Foundation of China under Grant No.61671074 and No.61971067,by the National High Technology Research and Development Program(863 Program)of China under Grant 2014AA01A707,by the State Major Science and Technology Special Projects under Grant 2018ZX03001028-003,and by Beijing University of Posts and Telecommunications Excellent Ph.D.Students Foundation No.CX2020209.