Joint Design of Content Delivery and Recommendation in Wireless Caching Networks

2021-12-03 01:25ZhongyuanZhaoHuihuiGaoWeiHongXiaoyuDuanMugenPeng

China Communications 2021年11期

Zhongyuan Zhao,Huihui Gao,Wei Hong,Xiaoyu Duan,Mugen Peng,*

1State Key Laboratory of Networking and Switching Technology,Beijing University of Posts and Telecommunications,Beijing 100876,China.

2Key Laboratory of Universal Wireless Communications(Ministry of Education),Beijing University of Posts and Telecommunications,Beijing 100876,China.

3Beijing Xiaomi Mobile Software,Beijing 100085,China

Abstract:Although content caching and recommendation are two complementary approaches to improve the user experience,it is still challenging to provide an integrated paradigm to fully explore their potential,due to the high complexity and complicated tradeoff relationship.To provide an efficient management framework,the joint design of content delivery and recommendation in wireless content caching networks is studied in this paper.First,a joint transmission scheme of content objects and recommendation lists is designed with edge caching,and an optimization problem is formulated to balance the utility and cost of content caching and recommendation,which is an mixed integer nonlinear programming problem.Second,a reinforcement learning based algorithm is proposed to implement real time management of content caching,recommendation and delivery,which can approach the optimal solution without iterations during each decision epoch.Finally,the simulation results are provided to evaluate the performance of our proposed scheme,which show that it can achieve lower cost than the existing content caching and recommendation schemes.

Keywords:wireless caching networks; content caching;content recommendation;deep reinforcement learning;resource management

I.INTRODUCTION

With the explosive development of mobile Internet and enhanced broadband services,such as holographictype communications,it is challenging for the future wireless networks to satisfy the extremely high quality-of-service(QoS)requirements with respect to the throughput,latency,and connections[1].As a promising technique of significantly improving the user experience,content caches have been considered to be deployed at the edge of wireless networks,and become a hot spot for both academia and industry[2].By caching content objects at the network edge devices,a part of users requests can be responded locally,which can reduce the transmission latency and balance the backhaul loadings[3][4].The development of wireless caching networks promotes the fusion of communication,computation,and caching[5][6].To explore the potential of wireless caching networks,the optimization of various performance metrics has been widely studied.In[7]and[8],the delay minimization problems have been formulated to offer a good delay performance at low communication complexity.The throughput improvement achieved by combining communications,caching and computation has been verified in[9],and[10]has also explored a joint optimization problem of content caching to offload the traffic from the content server.

Moreover,due to the tight couple of communication and caching resources,it is difficult to approach per-formance limits of wireless content caching networks efficiently.To solve this problem,the joint design of content transmissions and caching has attracted a lot of attentions.In[11],a caching scheduling scheme has been designed to achieve differentiated caching resource allocation,which avoids the waste of caching resource and enhances the cache availability.The joint optimization of caching strategy and bandwidth resource allocation has been designed in[12],[13]and[14].By developing novel algorithms based on the conventional methods such as exhaustive-search approach and game theory,they have obtained optimal solutions.In[15],joint caching,routing and channel assignment in coordinated small-cell cellular systems have been considered and achieved considerable gains in average transmission rate.The joint optimization of resource allocation and content placement has been considered in[16]for cost-efficient edge caching.Inspired by the success of deep reinforcement learning(DRL)in solving complicated control problems,[17]has presented a DRL-based framework for jointly optimizing content caching,computation offloading and resource allocation,which shows the learning capacity of DRL and reduces the end-to-end service latency.

Beside content caching,content recommendation is another research hot spot with respect to content management in the recent years,which was first proposed by the mobile applications providers to attract more users[18].Based on the historical data of user behavior,the content recommendation systems can analyze the user interests,and recommend some specific content objects accordingly[19].It can save a lot of time for users to find their preferred content objects,and improve the user experience[20].One of the most promising technologies for content recommendation is collaborative filtering[21],which captures the similarity between users and items to make recommendations.The impacts of recommendation on the user preferences are usually captured by a statistic model such as in[22].In the recent times,deep learning’s advances have gained significant attention,several recent studies have shown the utility of deep learning in the area of recommendation systems[23],such as the existing works in[24],[25],and[26].

Recently,the study of recommendation systems has been extended into wireless content caching networks[27].By recommending the cached content objects,they can be provided to the users locally without forwarding the requests to the data centers.Therefore,the user experience can be further improved,and the impacts of recommendation are also increased,which can take advantages of both content caching and recommendation[28].In[29],the joint pushing and recommendation policies that maximize the effective throughput in wireless networks have been designed.It is shown that the causal feedback brings significant throughput gain.[30]has introduced an idea of informing the users about what the base station(BS)has cached to make local caching efficient,which can be regarded as a form of recommendation.By applying Q-learning,the proposed policy with recommendation can provide higher long-term system reward than existing policies without recommendation.In[31],an optimization problem for the joint caching and recommendation decisions,aiming to maximize the cache hit ratio,has been formulated and solved by a heuristic algorithm,which has provided bounds for the achieved cache hit ratio.

Although the existing works have verified the necessity of integrating content caching and recommendation,there still exist the following two critical issues to fully explore the potential.First,the existing schemes mainly focus on the extractions of long-term user behavior patterns,and the instantaneous wireless channel statement is always overlooked for content caching and recommendation.Therefore,the existing schemes is not adaptive with the high dynamics of wireless channels,and the performance of content delivery and recommendation cannot be guaranteed.Second,the existing resource management schemes are mainly based on the convex optimization and game theories,which take many iterations during each decision epoch to approach the optimal solutions[32].It is computation-consuming and cannot be employed in the low-latency scenarios.To implement real-time resource control and management,some artificial intelligence-enabled schemes,such as deep neural networks(DNNs),can provide feasible solutions[33].However,it requires a huge amount of data to pre-train the deep learning models,whose convergence and accuracy cannot be guaranteed when the dimensions of variables are high.

Motivated by providing a feasible solution of these problems,the joint design of content delivery and recommendation in wireless caching networks is studied in this paper,and our main contributions can be sum-marized as follows:

First,a joint content delivery and recommendation scheme is proposed.Unlike the existing user preference-centric recommendation schemes,such as[31],a joint management framework of caching,recommendation and communication resource is designed,which provides a comprehensive paradigm of characterizing the utility and cost of content pushing,delivery,and recommendation.

Second,a DRL-based content caching,recommendation,and resource management algorithm is designed.In particular,the caching replacement,delivery,and recommendation are jointly optimized with the radio and backhaul resource allocation to minimize the transmission latency and backhaul loadings.The formulated optimization problem is decoupled into three subproblems and a hybrid DRL and convex optimization paradigm is designed to approach the optimal solution,which can transform the iterative structure of the conventional convex optimization algorithms into a tandem form.Moreover,the decisions of content caching and recommendation can be optimized by two independent DRL paradigms,and the computation cost of model training and test can be reduced.

Finally,the simulation results are provided to evaluate the performance of our proposed algorithm,which can verify its convergence,and show that it can achieve significant performance gains compared with the existing content caching and recommendation schemes.

The remainder of this article is organized as follows:In Section II,the system model is introduced,and a joint content delivery and recommendation scheme with edge caching is introduced and formulated as an optimization problem.Then,a DRL-based algorithm is proposed to approach the optimal solution in Section III.The simulation results are provided in Section IV,and finally Section V concludes this paper.

II.SYSTEM MODEL

Consider a content delivery scenario in wireless content caching networks,where multiple users U1,...,UMare served by a BS B.All the users and the BS are equipped with a single antenna.All users connect with B via wireless channels,and B connects with a data center D via wired backhaul.Moreover,an orthogonal wireless channel is occupied by each user,and the bandwidth of each wireless channel is identical.In this paper,we assume that the number of all the available content objects is limited,which can be denoted as a set,i.e.,C={C1,...,CN}.All the content objects belonging to C are stored by the data center D.Then the strategies of content caching,delivery and recommendation can be introduced as follows,respectively.

2.1 Content Caching Strategy

To balance the backhaul loading and improve the user experience,B equips with an edge cache E,and the cached content objects in E can be changed dynamically,which can be implemented by cache replacement.In particular,the required content objects for caching replacement are transmitted to B via the wired backhaul link,and the backhaul loading and the transmission latency caused by cache replacement can be expressed as follows,respectively:

where Bndenotes the size of content object Cnin bits,snis a parameter that indicates the current edge caching status of Cn,i.e.,sn=1 when Cnis kept by the edge cache E,otherwise it should be set as sn=0,rndenotes the allocated data rate of backhaul for the transmission of Cn,the replacement strategy of Cnis indicated by a Boolean variable xn,i.e.,it should be set as xn=1 if Cnis required to be cached by E after caching replacement,while xn=0 if Cnis not needed.

2.2 Content Delivery Strategy

In this paper,we assume that each user requires only one content object,which can be delivered via the wireless channels.Without loss of generality,we focus on a typical user Um,which requires a specific content object Cn.Based on the caching status of Cn,the content delivery strategy is twofold,which can be introduced as follows.

2.2.1 When CnIs Cached by E

In this case,the required content object Cncan be transmitted to Umwithout interacting with the data center D,and thus the backhaul loading caused by content delivery can be avoided.Moreover,the transmission latency is decided by the data rate of the wireless channel between Umand B.In this paper,the flat fading channel model is employed,which is not changed during the transmission process of Cn.Therefore,the transmission latency of content delivery for Umcan be expressed as

where hmcaptures the flat channel fading of the wireless channel between Umand B,w denotes its bandwidth,Pmis the transmit power of delivering Cnto Um,and σ2is the power of noise.

2.2.2 When CnIs Not Cached by E

Unlike the previous case,the request of Cncannot be responded locally at B,and Cncan be obtained by interacting with the data center D via the backhaul.To reduce the backhaul transmission redundancy,Cnis transmitted only once if it is required for caching replacement and content delivery simultaneously.Therefore,the corresponding backhaul loading can be expressed as

where xnfollows the notation given by(1).Moreover,the transmission latency is jointly determined by the backhaul and wireless channels,which can be written as follows:

As shown in 1)and 2),the content delivery strategy is related to the edge caching status.Therefore,the backhaul loading and total transmission latency of content delivery can be expressed as follows,respectively:

where unis a parameter that indicates the current request status of Cn,i.e,it is set as un=1 when Cnis requested by users,otherwise un=0,um,nis defined similarly to indicate the request status of user Umwith respect to the content object Cn.

2.3 Content Recommendation Strategy

By employing content recommendation,the users will be guided to require the content objects cached by E,and thus both the backhaul loading and transmission latency can be reduced.In this part,the content preferences and the content recommendation strategy can be modeled as follows.

2.3.1 Content Preferences of Users

As introduced in[34],L different thematic categories can be established based on the content features of C1,...,CN.To capture the relevance of Cnand each thematic category,an L×1 vector can be defined as follows:

2.3.2 The Impact of Content Recommendation

The user demand of content objects can be manipulated by content recommendation.In particular,the users may show more interests in the recommended content objects,while the interests of the unrecommended content objects are decreased.As shown in[34],when the content object Cnis recommended to Um,its corresponding request probability can be written as

The target of the content recommendation is to mitigate the potential backhaul loading in the future.In particular,the request probability of Cncan be defined as the average request probability of each user,which can be expressed as follows based on(10)and(11):

where zm,nis a Boolean variable that indicates the recommendation status of Cn,i.e.,it should be set as zm,n=1 if Cnis recommended to Um,otherwise zm,n=0.Based on(12),the expected future backhaul loading after the content recommendation can be expressed as

The content recommendation procedure can be accomplished by sending a list of recommended content objects to Um.Therefore,the transmission latency caused by content recommendation can be written as follows:

where D denotes the signaling cost in bits of recommending a specific content object.

2.4 Problem Formulation

In the previous subsection,we have modeled the content caching,delivery and recommendation strategies which take the backhaul loadings and transmission latency as performance metrics.In this paper,we focus on the total cost minimization of content delivery,which can be modeled as a linear combination of the backhaul loadings and the total transmission latency,which can be defined as follows:

where θ1and θ2denote the weights of backhaul loadings and transmission latency,respectively,(15b)denotes the limited storage volume and BEdenotes the storage size of edge cache E,(15c)denotes the length of recommendation list is restricted,(15d)denotes that only the content objects cached by E can be recommended to the users,(15e)denotes the transmit power is restricted and Pmaxdenotes the maximum transmit power of B,(15f)denotes the limitation of backhaul communication capability and R denotes the channel capacity of wired backhaul link.

III.A DEEP REINFORCEMENT LEARNING-BASED OPTIMIZATION ALGORITHM

As shown in(15),the optimization design of caching and recommendation decisions is tightly coupled,which is a typical mixed integer nonlinear programming(MINLP)problem.Its optimal solution cannot be obtained efficiently by using the traditional optimization approaches.To solve this problem,a DRL-based algorithm is proposed in this section,which can approach the optimal solution in real time to adapt the highly dynamic circumstances.

The framework of the DRL-based optimization algorithm is illustrated in Figure 1,which is composed of three modules,named as cache management module,recommendation management module and resource allocation optimization module,respectively.In particular,both the caching and recommendation management problems can be optimized by employing DRL-based paradigms,and the resource allocation can be solved by using convex optimization.The details can be introduced as follows.

3.1 Cache Management Module

In this paper,the management of edge caches,such as xnin(15),can be optimized by employing DRL methods.The key idea of DRL is to establish the relationship between the actions and the rewards,which can be approximated by a DNN.In our studied cache management problem,the paradigm of DRL can be formulated as follows.

3.1.1 Problem Formulation

The edge cache E can act as the agent in our studied DRL problem.Please note that the utility of caching Cnis not related to Ci,in,and thus the caching decision of Cncan be made independently.During the t-th decision epoch,the caching action with respect to Cncan be modeled as

Moreover,the state of caching optimization can be captured by the parameters given by(15),which can be expressed as follows:

where Bn,snand rnfollow the notation given by(1)and qnfollows the notation given by(12).

The objective of cache management is to minimize the cost and latency of content delivery,which can be characterized by the objective function of(15).To guarantee the utility of edge caching,the reward of taking action antcan be expressed as

3.1.2 A DRL-Based Content Caching Management Paradigm

The key idea of DRL is to choose the action that can maximize the Q-value.To evaluate the Q-value dynamically and accurately,ϵ-greedy policy is adopted to select the action,where ϵ is probability that the action is selected randomly,and(1-ϵ)is the probability that the action is selected based on Q-value maximization criterion,i.e.,

where Acdenotes the caching action space.

Figure 1.Framework of DRL-based optimization algorithm.

3.1.3 The Structure of DNN DCfor Cache Management

The performance of DRL is mainly determined by estimation accuracy of Q-value,which can be approximated by employing a DNN DC.In this paper,a fully connected DNN is used,which contains an input layer,K hidden layers and an output layer,and its structure can be introduced as follows.

·The hidden layers of DC:The hidden layers of DCare the middle layers to approximate the mapping relationship between the input states and output decisions,which have the same structure.In particular,each hidden layer consists of P neurons,where P should follow the constraint P≥NS+NAto support nonlinear approximation,where NAdenotes the number of neurons in the output layer.To pass all the information of each layer,fully connected construction is employed.To avoid the gradient vanishing and guarantee the convergence rate,rectified linear unit(ReLU)function is used as the activation function,i.e.,

where yh,kand xh,kdenote the output and input of the k-th neuron of the h-th hidden layer,respectively.

As shown in Algorithm 1,the Q-values of caching actions for each content object are first evaluated by using DC.Then,the caching decision of each content object is generated independently based on(20),and a caching candidate set RCcan be formulated.Finally,to obtain feasible caching management results restricted by(15b),the elements of RCare adjusted based on(21).

3.2 Recommendation Management Module

Similar to cache management,the recommendation decision can be optimized by employing DRL.However,unlike the conventional DRL-based paradigms,where the output can be chosen arbitrarily without any restrict,the recommendation decision results should follow the constraint given by(15c).

The recommendation decision of each user can be generated independently due to its transmission circumstances and personal preference.During the t-th decision epoch,the recommending action of Umcan be defined as

where zm,lindicates the recommendation status of content object Clfor user Um.Moreover,recalling(15),the corresponding state can be modeled as

where tR,mand LF,mfollow the notations given by(14)and(13),respectively and the subscript m denotes that only the user Umis considered.

3.2.1 The Structure of DNN DRfor Content Recommendation

To guarantee that the generated recommendation result follows the constraint given by(15c),the recommendation probability of each content object is firstly provided by DR.Therefore,output layer ΠM+1can be denoted as an L×1 vector,i.e.,y=[y1,...,yL]T,and its l-th element denotes the recommendation probability of Cl,which can be defined as

where xldenotes the l-th input element of ΠM+1.

3.2.2 A Content Recommendation Management Algorithm

In this part,the recommendation lists can be selected by using K-nearest neighbor(KNN)based scheme.In particular,F feasible recommendation decisions are randomly generated by following(15c),which can be denoted as L={L1,...,LF},i.e.,Lfcan be expressed as

Based on(26),the utility of recommending Clto Umcan be characterized by the output of DR.Therefore,it is equivalent to maximize the recommendation probability to improve the Q-value.It means that the candidate Lf,which is most similar to the output of DR,should be selected,and its similarity with y can be expressed as

Based on the KNN criterion, K candidates of L,which are with the largest df,are selected,i.e.,

3.3 Resource Allocation Optimization Module

In this paper,the objective of resource allocation module is to jointly optimize the backhaul data rate and transmit power.Recalling(15),the resource allocation optimization subproblem can be expressed as follows:

where tC,tDand tRare given by(1),(6)and(14),respectively.Since the transmissions of backhaul and wireless channels are independent with each other,rnand Pmin(30)can be optimized individually,and the following two subproblems can be formulated.

3.3.1 Transmit Power Allocation Subproblem

When rnin(30)is fixed as a constant,the transmit power allocation subproblem can be transformed as follows:

Then its Hessian matrix can be expressed as follows:

H1given by(33)is positive semi-definite.Therefore,(31)is a convex problem,and it can be solved straightforwardly by employing Karush-Kuhn-Tucker(KKT)conditions[35].

3.3.2 Backhaul Data Rate Allocation Subproblem

Based on(30),the backhaul data rate allocation problem can be modeled as follows:

Theorem 1.The optimal solution of backhaul data rate subproblem given by(34)can be expressed as

Proof.Similar to the the transmit power allocation subproblem,the Hessian matrix of QBis derived to show its convexity,which can be expressed as follows:

Since H2is a positive semi-definite matrix,the backhaul data rate allocation subproblem given by(34)is convex.To obtain its optimal solution,the corresponding Lagrangian function is derived as follows firstly:

where μ1is the Lagrangian multiplier,μ≥0.Then its KKT conditions can be written as

3.4 The Joint Optimization Algorithm

To adapt the high dynamics of wireless circumstances and user behavior,the DNNs DCand DRof the content caching and recommendation modules should be updated based on collected data stored by D,respectively.In this paper,decision making and network training are designed as two independent algorithms to implement real-time decision making.As shown in Algorithm 4,a batch of experienced data BCis randomly selected from D,and a caching experienced sample TCis constructed by following(16),(17),(18)and(19)based on BC.Similarly,a recommendation experienced sample TRcan be generated by following(23),(24)and(25)based on a batch of experienced data BR.Then,the parameters of DNNs DCand DR,which are denoted as θCand θR,respectively,can be updated by using experience replay approach,which is based on the loss minimization criterion by employing gradient descent method,i.e.,

IV.SIMULATION RESULTS

To evaluate the performance of our proposed algorithm,the simulation results are provided in this section.In particular,the number of content objects is set as N=200,the popularity of content objects is modeled as Zipf’s distribution[36],and the size of each content object follows uniform distribution,i.e.,Bn～Uni(0.5,2)between 0.5 MB and 2 MBs.The user preferences are modeled based on(9),where the number of thematic categories is set as L=5,the recommendation weights of users are wrec=0.3,and the relevance vector unin(7)and interest vector vmin(8)are generated randomly.The maximum number of recommended content objects for each user is T=5 and the signalling and communication cost of recommending a single content object is set as D=2 KBs.The bandwidth of wireless channel between each user and BS is set as w=50MHz,and the maximum transmit power of BS is Pmax=40.The maximum data rate of backhaul link is R=100 Mbps.

As shown in Figure 2,the convergence of our proposed algorithm is first evaluated.The convergence performance is mainly determined by the DNNs DCand DR,which can be evaluated by their loss performance.In particular,the architecture of DCand DRis designed based on Section III.A and B,respectively.Both DCand DRconsist of two hidden layers,and each hidden layer is constructed with 256 neurons.As shown in the figure,the normalized losses of DCand DRkeeps decreasing as the number of training steps increases,and stably approach 0 after training via 1200 steps.

To evaluate the performance of jointly optimizing content caching and recommendation,the comparison results with the conventional content caching scheme are provided by Figure 3 and 4,where the storage size of edge cache E is set as BE=10,20 MBs.In particular,the DRL-based recommendation algorithm in[26]and the conventional DRL-based content caching algorithm in[37]are selected as benchmarks.In Figure 3,the transmission latency is plotted.Since the user preferences of caching contents can be increased by recommendations,the utility of content caching can be improved,and the transmission latency can reduced significantly.In particular,when the number of users is 10,the transmission latency can be reduced by 36% and 41% compared with the DRL-based con-tent caching algorithm with the settings of storage size of E as BE=10 and 20 MBs,respectively.Moreover,the loadings of backhaul can also be reduced by content recommendation,which are evaluated by Figure4.As shown in the figure,compared with the DRL-based content caching algorithm,when the number of users is 10 and BE=10 MBs,the backhaul loadings can be reduced by 33%,and the performance gain can be enlarged to 51% when BE=20 MBs.

Figure 2.Convergence of Algorithm 4.

Figure 3.Transmission latency versus the number of users.

Figure 4.Backhaul loadings versus the number of users.

In Figure 5 and Figure 6,the comparisons with other content caching schemes are provided.In particular,the least frequently used(LFU)content caching with recommendation is selected as a comparable scheme,which keeps the most popular items of all users in the caches.Moreover,the random content caching with recommendation is chosen as another benchmark.The transmission latency performance is provided by Figure 5.As shown in Figure 5,our proposed scheme can achieve better performance than the comparable schemes.Unlike the conventional caching schemes,where the impact of recommendation on user preference is not considered,our proposed scheme can fully explore the utility of edge caching in a cost efficient way by jointly optimizing content caching and recommendation.Therefore,the cost of content caching can be significantly reduced.In particular,compared with LFU and random caching schemes,when the number of users is set as 30,the transmission latency can be reduced by 35% and 51% by employing Algorithm 3 for BE=10 MBs,respectively.Similarly,the loadings of backhaul can also be mitigated by employing Algorithm 3.In particular,compared with LFU and random caching schemes,the loadings of backhaul can be reduced by 42% and 69%,respectively,when the number of users is 30 and BE=10 MBs,and it can enlarged to 48% and 74% when BE=20 MBs.

Figure 5.Transmission latency of different schemes.

Figure 6.Backhaul loadings of different schemes.

V.CONCLUSION

To fully explore the potential of edge caching,the joint design of content caching and recommendation in wireless content caching networks has been studied in this paper.The key idea is to guide the user preference to adapt with the edge caching status by content recommendation.Firstly,a joint content delivery and pushing scheme with recommendation has been designed,and then an optimization problem of content management and recommendation has been formulated to minimize the transmission latency and backhaul loadings.Secondly,to approach the optimal strategies of formulated problem in a computationefficient way,a DRL-based joint optimization algorithm has been proposed,which can adapt with the high dynamics of wireless circumstances.Finally,the simulation results are provided to evaluate the convergence and performance of our proposed algorithm.Compared with the conventional caching management schemes,our proposed DRL-based caching and recommendation scheme can significantly reduce the transmission latency and backhaul loadings.

ACKNOWLEDGEMENT

This paper is supported by Beijing Natural Science Foundation(Grant L182039),and National Natural Science Foundation of China(Grant 61971061).

China Communications2021年11期

China Communications的其它文章: A Key Management Scheme Using(p,q)-Lucas Polynomials in Wireless Sensor Network; Pilot Allocation Optimization Using Enhanced Salp Swarm Algorithm for Sparse Channel Estimation; Put Others Before Itself:A Multi-Leader One-Follower Anti-Jamming Stackelberg Game Against Tracking Jammer; DCGAN Based Spectrum Sensing Data Enhancement for Behavior Recognition in Self-Organized Communication Network; On Treatment Patterns for Modeling Medical Treatment Processes in Clinical Practice Guidelines; Hybrid Recommendation Based on Graph Embedding