Clustering and Resource Allocation Strategy for D2D Multicast Networks with Machine Learning Approaches

2021-01-20 11:08:08FanJiangLanZhangChangyinSunZengYuan

China Communications 2021年1期

Fan Jiang,Lan Zhang,*,Changyin Sun,Zeng Yuan

1 Xi'an University of Posts and Telecommunications,Xi'an 710121,China

2 Shaanxi Key Laboratory of Telecommunications and Information Networks and Security,Xi'an 710121,China

Abstract:In this paper,the clustering and resource allocation problem in device-to-device (D2D) multicast transmission underlay cellular networks are investigated.For the sake of classifying D2D users into different D2D multicast clusters,a hybrid intelligent clustering strategy(HICS)based on unsupervised machine learning is proposed first.By maximizing the total energy efficiency of D2D multicast clusters,a joint resource allocation scheme is then presented.More specifically,the energy efficiency optimization problem is constructed under the quality of service (QoS)constraints.Since the joint optimization problem is non-convex,we transform the original problem into a mixed-integer programming problem according to the Dinkelbach algorithm.Furthermore,to avoid the high computational complexity inherent in the traditional resource allocation problem,a Q-Learning based joint resource allocation and power control algorithm is proposed.Numerical results reveal that the proposed algorithm achieves better energy efficiency in terms of throughput per energy consumption.

Keywords:device-to-device multicast communication;clustering;energy efficiency;resource allocation;Q-Learning

I.INTRODUCTION

To cope with the continuously increasing wireless traffic and meet the stringent quality of service (QoS)requirement of emerging wireless services,the fifthgeneration (5G) networks are designed and expected to support a great number of wireless connected devices in various usage scenarios [1].Among all possible scenarios of 5G,applications such as live broadcast,video on demand,vehicle to vehicle communication and human-machine interaction are the most challenging 5G application cases.Meanwhile,recent exponentially growing amounts of smart mobile devices and multimedia services are accelerating the growth of video traffic on the Internet.Especially,the continuously increasing demand for data services of the geographically proximate user has brought great challenges to current mobile communication infrastructure.To deal with such challenges,one promising solution is device-to-device(D2D)based communications that might satisfy the strict requirements of the manifold usage scenarios.By allowing direct communication among proximate users' equipment,D2D multicast transmission is capable of enhancing spectrum efficiency,extending cell coverage,and improving energy efficiency[2,3].

As one of the application scenarios of D2D communication,D2D multicast transmission provides a prominent solution to offload data traffic from base stations(BS),which not only increases the spectral efficiency of the networks,but also improves the system capacity.D2D multicast transmission,which is widely adopted in scenarios such as live Internet TV,massive file sharing,online Internet video streaming and downloads,is competent in providing multimedia broadcast services[4]while alleviating the transmission pressure of current backbone networks.Recently,many literatures suggest adopting D2D multicast transmission as a practical solution for content offloading or content delivery for the 5G wireless networks,which makes full use of the potential computing and storage capabilities of edge users.However,due to serious interference among different users,D2D communication might severely degrade the QoS of the cellular networks without proper resource management techniques.Therefore,how to optimally allocate radio resource and power becomes a crucial issue.

Recent researches concerning D2D multicast transmission mainly focus on the following aspects:how to form D2D multicast clusters,D2D mode selection[5]and radio resource and power allocation methods for D2D multicast transmission.Concerning with recent researches on D2D multicast clustering methods,Ren F et al.[6]proposed a clustering scheme for D2D multicast clusters from the perspective of physical distance and social relationships.Mohammadet al.[7]proposed a self-organizing D2D multicast clustering method based on individual user channel gain.Zhang et al.[8]discussed D2D multicast clusters formation and resource allocation problem in D2D networks which utilizes a half-duplex scheme and a full-duplex scheme to coordinate the channel sharing between the cellular links and the D2D links.However,one important factor that is often neglected in the aforementioned researches is the user preference.User preference as a common indicator adopted in content caching and content delivery process actually reflect user's on certain contents.Recently,more and more methods tend to adopt machine learning techniques to explore the clustering issues in D2D multicast transmission.Uyoata et al.[9]proposed a K-means algorithm based D2D multicast clustering method.Farhana et al.[10]presented a selforganizing feature map (SOM) clustering algorithm according to user preference and content popularity to determine the user's learning feature group.Komal S et al.[11]proposed a hierarchical clustering algorithm that considers user preference and the similarity of user requests.However,in the above clustering algorithms,the real transmission distance between cluster head(CH)and cluster members is often omitted,thus the reliable communication between CH and cluster members cannot be guaranteed.

On the other hand,concerning recent literatures on resource allocation and power control methods.Juhyun et al.[12]proposed a spectrum and power allocation scheme to maximize the total average achievable rate.Wang et al.[13]proposed a joint power and channel allocation algorithm for D2D underlay cellular system to maximize the sum rate of cellular users(CUs).D Feng et al.[14]proposed an efficient resource allocation scheme to maximize the energy efficiency of D2D multicast communication.And this scheme can also maximize the energy efficiency of the cellular and D2D system.Fan et al.[15]studied the cellular and D2D resource allocation problems under the SV-MAC protocol to maximize the energy efficiency of the cellular and D2D system.Lee et al.[16]proposed a spectrum and power allocation scheme to maximize the total average achievable rate under the outage probability constraint.Besides,as one of the most promising artificial intelligence tools,machine learning has the potential to assist the radio resource allocation problem in terms of intelligent adaptive learning and decision making,so that the diverse requirements of next-generation wireless networks can be satisfied [17].Due to the nature of machine learning which interacts with an unknown environment by exploration and exploitation,the learning based algorithms are suitable to solve optimization problems in practical wireless communication scenarios.Among current studies,the Q-Learning method,which is a typical reinforcement learning (RL) based strategy,is the research focus[18-21].Zhao et al.[22]proposed a multi-agent RL algorithm to achieve power control for D2D users.The goal of the proposed algorithm is to maximize the system capacity while protecting the QoS for CUs.Nie et al.[23]proposed two RL based power control algorithms in underlay D2D communication,team Q-Learning and distributed QLearning.However,most of the above works mainly focus on maximizing the system capacity while only a few works consider the energy efficiency of D2D multicast transmission.

Motivated by previous contributions,this paper mainly focuses on D2D multicast clusters forming and radio and power resource allocation issues within D2D multicast communications underlying cellular networks.The aim of this work is to maximize the overall energy efficiency of D2D multicast clusters while guaranteeing the QoS for CUs.Firstly,we propose a hybrid intelligent clustering strategy to divide D2D users into several D2D multicast clusters.Secondly,a Q-Learning based joint resource allocation and power control algorithm is proposed to achieve the optimal energy efficiency of whole D2D multicast transmission.The contributions of this paper are summarized as follows:

(1)A hybrid intelligent clustering strategy (HICS):Under the content delivery scenario,a clustering strategy that combines the user preference and reliable communication distance between each D2D user is proposed.Specifically,we first put forward a selforganizing feature map (SOM) based categorized algorithm to divide D2D users into different categories;Then,in order to ensure reliable communication in each D2D multicast categories,an agglomerative hierarchical clustering algorithm(AHCA)based clustering algorithm is proposed to further adjust user distributions in each cluster.

(2)A Q-Learning based joint resource allocation and control algorithm:A joint resource and power allocation algorithm for D2D multicast CHs and CUs is presented.Specifically,to maximize the overall energy efficiency of D2D multicast clusters,we first formulate the energy efficient resource allocation problem with QoS constraints of CUs.However,the built energy efficient resource allocation optimization problem is a non-convex optimization problem.According to the Dinkelbach algorithm,we transform the original optimization problem into a mixed integer programming problem,which can be proved as a convex optimization problem.Combined with machine learning techniques,we further propose a Q-Learning based joint resource allocation and power control algorithm,which finds the globally optimal power value and resource allocation result in an iterative way.

The rest of the paper is organized as follows.In Section II,we formulate the system model.Section III,a hybrid intelligent clustering strategy is proposed.Section IV introduces a Q-Learning based joint resource allocation and power control algorithm.Simulation results and analysis are given in Section V.Finally,we conclude the paper in Section VI.

II.SYSTEM MODEL

As demonstrated in Figure1,we consider an uplink hybrid cell transmission scenario where D2D links share uplink resource blocks(RBs)with multiple CUs.Assuming that all users are evenly distributed in the cell,the BS can locate the users and the users are in a low-speed moving state or in an indoor environment.In the considered model,suppose there areNuplink CUs which occupyNorthogonal RBs.To clarify representation,the resource allocation scheme for CUs is assumed to be pre-determined by the BS(e.g.thei-th RB is allocated to thei-th CU).We usen,n ∈N={1,2,···,N}to indicate then-th CU,where N represents the set of CUs.There are K D2D users,which can be classified intoMD2D multicast clusters.Letm,m ∈M={1,2,···,M}to indicatem-th D2D cluster,where M represents the set of D2D multicast clusters.For each D2D multicast cluster,the CH serves as the D2D transmitter and the cluster members act as D2D receivers.It is worth to mention that during the multicast transmission process,each CH first downloads the preferred files from the BS in advance according to cluster members preferences.In the second stage,CH multicasts the prestored files to all cluster members.In order to achieve energy efficient D2D multicast transmission,how to form D2D multicast clusters and optimally allocate radio resource and power become crucial issues.

Figure1.Network model of D2D multicast transmission.

The architecture of the considered D2D multicast transmission underlaying cellular network is shown in Figure1.As shown in Figure1,there are several D2D multicast clusters coexist with several CUs.In the Figure,we assume thatCH1reuses the uplink cellular resource ofCU1andCH2reuses the uplink cellular resource ofCU2,respectively.We assume that each D2D cluster can reuse at most one CU's RB resource and the uplink RBs of each CU can be shared by at most one D2D cluster.

In the D2D multicast transmission process,D2D multicast cluster members not only receive the signal from CH but also receive the inference signal from corresponding co-channel CU.On the other hand,the multicast transmission in each cluster will also deteriorate the transmission quality of CU owing to channel reuse.Based on the hybrid transmission scenario built in Figure1 and in order to solve the problem of cluster formation,a hybrid intelligent clustering strategy that combines user preference together with physical transmission distance is initially proposed so as to achieve efficient content delivery.Furthermore,in order to deal with the co-channel interference between each cluster and its co-channel CU,we further introduce a Q-Learning based joint resource allocation and power control algorithm.Before the cluster formation process,we first introduce the following concepts.

Generally,user preference is modeled as the conditional probability distribution of a user's request for every file when the user sends a file request.In this paper,user's rating of different files is quantified so as to obtain exact user preference.Suppose the total file types isAand there areKD2D users.We build an evaluation matrixζ=[ζ1,ζ2,...,ζi,...,ζK],i ∈[1,K]to record all D2D user preferences,where user evalation setζi=[ζi,1,ζi,2,...ζi,k,...ζi,A]T,k ∈[1,A]indicates thei-th D2D user preference for each file.The evaluation factorζi,kis obtained bywheresi(k) represents the respective rating thati-th user give tok-th file type,where the value ofsi(k)may be obtained through GroupLens.

Based on the above related concept,we present our proposed clustering algorithm in next section.

III.A HYBRID INTELLIGENT CLUSTERING STRATEGY

Since different users have different preferences for different file types,in order to achieve efficient distribution of files,the SOM-based categorizing algorithm is first applied to divide D2D users into several categories according to file preference similarity among D2D multicast users.However,the above clustering process only considers user's preference for different file types while the real communication distance is neglected,which might lead to unreachable communication between CH and category members.In order to ensure reliable communication of D2D multicast,an AHCA-based clustering algorithm is further proposed which considers physical transmission distance to form D2D multicast clusters.

3.1 SOM-based Categorizing Algorithm

Since user preference varies with different users,by adopting proper category method,content hit rate can be effectively increased during the content delivery process,so as to achieve efficient distribution and to avoid redundant caching for D2D users.In this subsection,SOM-based categorizing algorithm is utilized to achieve an effective category according to file preference similarity among D2D users.The SOM algorithm is an unsupervised machine learning clustering technique proposed for high dimensional information.The objective of SOM algorithm is to divide several samples into several categories according input data.In the proposed SOM-based content delivery process,D2D users are initially divided into different categories according to input user preference,where D2D multicast users in each category are assumed to have the similar preference for the same file.

The SOM algorithm simulates a Feed-forward neural network with self-organizing characteristics in the brain.Its network topology consists of two layers:the input layer and the output layer.Both input layer and output layer consist of neurons that are fully interconnected through the connection weights.We assume that there areAinput neurons andLoutput neurons.Furthermore,the connection weight matrix is represented by W=[wi,j]A×L,i ∈[1,A],j ∈[1,L],wherewi,jrepresents the connection weight betweeni-th input neurons andj-th output neurons.In the proposed SOM-based categorizing algorithm,the input neurons represent the total user preference concerning withAfiles,whereLoutput neurons denote the total number of categories after clustering process.The proposed SOM-based category process involves with two main procedures which are initialization period and training period.

In the initialization period,the connection weight matrix is assigned with a random value.In the training process,competitive learning based method is utilized.More specifically,take thei-th D2D user as an example,the detailed process of the SOM-based categorizing algorithm comprises the following 5 steps:

Step 1:In SOM network topology,the initial value of the connection weight matrix W between input neurons and output neurons is obtained using the random-based method;In the input layer,i-th D2D users together with its user evaluation setζiis substituted into the input neurons as the input data.

Step 2:Calculate the Euclidean distance between the input dataζiand the weight matrix W,where the distance is expressed as Eq.(1).The output neuron related with the minimum distance is defined as the winning neuronNumwin,whereNumwin=arg

Step 3:Select the neighborhood domain of the winner neuron according to the following criteria.AssumingHi,jrepresents the distance betweeni-th output neuron andj-th output neuron in the output layer,the neighborhood domainTi,Numwin(t)is selected according to the following expression Eq.(2) and decreases over time.

whereσ(t) is an exponential function that decreases over time and it is calculated byσ(t)=σ0exp(-t/Tσ).

Step 4:Update the weight matrix W according to the following expressionwi,j=wi,j+Δwi,j,whereφ(t) denotes the learning rate that decrease to 0 over time,and Δwi,j=φ(t)·Ti,Numwin·(ζi,j -wi,j).

Step 5:Repeat step 2 to 4 untilφ(t)decreases to 0 or the total repeat times reaches the maximum number of iterations.

After carrying out above procedures,it is assumed that the category results reaches stability which means thei-th D2D user is classified into certain categories.

By executing the above SOM-based categorizing algorithm for all D2D users,D2D multicast users with similar user preference is successfully divided into several categories.However,one of the drawbacks of the above SOM-based categorizing algorithm is that it only considers user preference during the clustering process while the physical distance between users is omitted.It is highly possible that some users within the category might be interested in certain content but could not reach the cluster owing to the real transmission range limit.To deal with the above issue,we further propose an AHCA-based clustering algorithm by taking the physical transmission distance into account so as to adjust user distribution in each category.

3.2 AHCA-based Clustering Algorithm

AHCA algorithm,which is a unsupervised machine learning technology,adopts the idea of ”bottom-up”to form clusters.Based on the categorizing results obtained from the above mentioned SOM-based algorithm,D2D users in each D2D multicast categories is further regarded as a separate clusters.We define a distance threshold Γthwhich indicates the furthest transmission distance to guarantee the reliable communication between CH and cluster members after the formation of the new cluster.By merging the nearest pair of clusters repeatedly until the distance between the two furthest clusters in all clusters exceeds the distance threshold,the clustering process is regarded as completed.

Assume that vectorsai=(ai,1,ai,2),aj=(aj,1,aj,2) represent the position coordinates ofi-th user andj-th user,respectively.We assume that there areZusers in the category and further build a distance matrix which record the distance between two D2D users as D=[di,j]1×1,wheredi,jrepresents the Euclidean distance between two D2D multicast users and is calculated by

Take them-th user category (m ∈M) as an exampleand assume that there are Z users in the category,based on AHCA clustering algorithm,the specific clustering steps are as follows.

Step 1:Input the distance matrix D which includes the Euclidean distance between two D2D users within them-th category;

Step 2:Treat each user inm-th category as a separate cluster;

Step 3:Calculate the distance between two closest clusters inm-th category.Merge two closest user clusters if the distance do not exceeds the threshold distance;

Step 4:Find the further distance between two neighbor clustersC1,C2,where it is assumed that the number of users in clusterC1,C2are represented as|C1|,|C2|,respectively.Updating the distance matrix of all user clusters according to Eq.(4).

Step 5:Repeat Step 3 to 4.The above clustering process stops until the maximum distance between the two furthest users in all clusters exceeds the threshold distance.

Specifically,when using the AHCA-based clustering algorithm D2D users are categorized into different clusters.However,there might exists some D2D users who's the Euclidean distance from all the CHs exceeds the distance threshold.Hence,in order to guarantee reliable communication,those D2D users are assumed to establish D2D unicast communication with the proximal D2D cluster member so as to obtain the required files.

After carrying out above procedures,the clustering process is completed.With the proposed AHCAbased clustering algorithm which combines the user preference and reliable communication distance,D2D users are partitioned into different clusters.Based on the clustering results,in order to maximize the total system energy efficiency of D2D multicast clusters and to deal with reuse interference between CUs and CHs,a Q-Learning based joint resource allocation and power control algorithm for D2D multicast transmission is proposed.

IV.A Q-LEARNING BASED JOINT RESOURCE ALLOCATION AND POWER CONTROL ALGORITHM

4.1 Problem Formulation

Based on the proposed clustering algorithm in the previous section,we assume that D2D multicast clusters have been already formed.Consequently,in this section,we mainly focus on resource and power allocation scheme in each D2D multicast clusters which underlying cellular networks.After clustering,letDmindicate the set of receivers in them-th D2D multicast cluster,where|Dm|denotes the total number of receivers in them-th D2D cluster.Let binary variablesym,i ∈{0,1}indicates whether thei-th RB is reused bym-th D2D multicast cluster.Assume that each D2D cluster can reuse at most one CU's resource and the uplink RBs of each CU can be shared by at most one D2D cluster,then it can be derived that∀m.Generally,it is commonly believed that the transmission rate of a D2D multicast cluster is determined by the user with the worst channel condition.Lethi=d−αandgm,i=d−αdenote the normalized channel gain fromi-th CU to BS and fromm-th D2D multicast cluster head to the corresponding worst receiver inm-th D2D cluster,respectively.The channel fading follows the quasi-static Rayleigh distribution and the variance of the channel gain isd−α,wheredis the distance in meters from the transmitter to the receiver andαis the path loss index.Let the path loss indexαbe 3.5.Assume them-th D2D multicast cluster head is reusing the resource ofi-th CU,then there will be inevitable reuse interference betweenm-th D2D multicast cluster andi-th CU.In order to represent the reuse interference,denotehIi,mandgIm,ias the normalized channel gain from thei-th CU to the worst receivers among them-th D2D cluster which reusei-th CU's RB,and from the transmitter ofm-th D2D cluster to thei-th CU,respectively.Then,the data rate of them-th D2D multicast cluster transmitting on thei-th RB can be expressed as

wherepm,iandpidenote the transmit power of themth D2D multicast cluster on thei-th RB and the transmit power of thei-th CU,respectively.Since D2D multicast transmission works as a supplement to the cellular networks,the QoS of CUs should be guaranteed.The data rate of thei-th CU can be written as

whereRcdenotes the minimal data rate requirements of CUs.Eq.(6) is interpreted as in order to ensure reliable cellular communication,the transmit rate of a cellular user should be larger than the minimal data rate requirements.

Then,our target is to maximize the total energy efficiency of all D2D multicast clusters considering the QoS constraints for CUs.The energy efficiency of all the D2D multicast clusters is defined as the ratio of total D2D multicast transmission data rates to the overall consumed power of all clusters,and the energy efficiency optimization problem is built as

wherepmcdenotes the consumption of circuit power atm-th multicast cluster.Constraint 7a ensures the QoS requirement of cellular transmission.Constraint 7b is explained as the uplink RBs of each CU can be shared by at most by one D2D cluster.Constraint 7c and 7d ensure that the transmit power of each D2D cluster should notexceed the maximum transmit powerand the transmit power of CUs does not exceed the maximum transmit powerrespectively.

More specially,the goal of this paper is to maximize the overall energy efficiency of D2D multicast system.Therefore,these D2D unicast users are no longer considered in the formulated optimization equation built in Eq.(7).According to Eq.(7),in order to improve the overall energy-efficiency,we have to find the optimal resource reuse relationshipym,ibetween D2D clusters and CUs,as well as to determine the optimal transmit power of both CUs and D2D clusters which also guarantees minimum SINR requirement.However,when D2D multicast transmission underlying with a cellular network,the resource allocation and power allocation strategy are actually interacted with each other.And the existence of integer assignment variableym,imakes the optimization problem more complicated.

Consequently,the above optimization problem in Eq.(7) is a non-convex optimization problem which is a NP-Hard problem and there are no efficient solutions.Moreover,when the problem size increases,the computational complexity also increases exponentially.In order to solve above problem,the original optimization problem is transformed into a mixed integer programming problem according to the Dinkelbach algorithm in the next section.Then,we propose a joint resource allocation and power control algorithm,so as to strike a balance between the performance and complexity.

4.2 Problem Equivalence

In order to obtain the optimal solution to the problem built in Eq.(7),we first transform the original problem Eq.(7)into an equivalent form by describing the transmit power of CUs as a function of the transmit power of D2D clusters.

Assume thati-th CU's RB is reused bym-th D2D multicast cluster,which is expressed asym,i=1.From formula Eq.(6),it can be inferred thatpi ≥From Eq.(7) it can also been observed that for a givenpm,i,the optimization objective which is expressed asηe∗edecreases with the increase ofpi.Therefore,in order to obtainshould be set to its minimum value,wherepiis calculated as

Consequently,the optimization equation Eq.(9)and constraint 9c can be obtained by substitutingp∗ithe objective function Eq.(7)and constraint 7d.Then,the original problem Eq.(7)can be reformulated as

wheredm,i=higm,i,em,i=

After simplification,we can observe that the variables related with the original optimization problem in Eq.(7)are further reduced to{ym,i,pm,i},∀m,iin the equivalent problem built in Eq.(9).Again,assumeη∗eedenotes the maximum energy efficiency of all D2D multicast clusters,the optimization objective function can be rewritten as

According to nonlinear fractional programming theory,problem Eq.(10)can be transformed into a corresponding subtractive form as follows

where

With corresponding the maximal value ofη∗ee,the equivalence of Eq.(10)and Eq.(11)can be easily verified at the point（where the maximal energy efficiency valueη∗eecan be obtained by using the iterative algorithm according to Dinklbach algorithm[24].In addition,it is not difficult to verify that bothandare concave when related topm,i.Then,the optimization problem Eq.(11)can be expressed as

s.t.9a,9b,and 9c.

Due to the existence of integer assignment variableym,i,we can find that problem Eq.(13) is still a intractable non-convex problem.In order to achieve a tractable optimization form,we relaxym,ito interval[0,1]temporarily and also introduce an auxiliary variable which is expressed assm,i=ym,ipm,ito replacepm,i.And then problem Eq.(13) can be transformed into the following format

where

According to [25],it can be verified thatBD2D(sm,i,ym,i) is a perspective function ofPD2D(sm,i).Therefore,BD2D(sm,i,ym,i) is jointly concave when related to parameterssm,i,ym,isincePD2D(sm,i)is concave.According to the composition rule in [25],we can find that the objective function Eq.(14) is also convex.In addition,it can be proved that all constraints are jointly convex when related toBD2D(sm,i,ym,i).

4.3 A Q-Learning Based Resource Allocation Algorithm

Based on the above analysis,it can be verified that the newly built problem Eq.(14) is transformed into a mixed integer programming problem,which can be proved as a convex optimization problem.Consequently,the problem can be solved by employing Karush-Kuhn-Tucker(KKT)conditions.According to the analysis of optimal D2D-CU matching relationship and corresponding power control scheme,we present the following proposition.

Proposition:Suppose that the RB of thei-th CU is shared by them-th D2D cluster,then the optimal transmit power of D2D cluster m can be obtained by the following equation.

Proof.According to the optimization problem built in Eq.(14),constraints 14b and 14c,we can construct the Lagrange function as

By taking the partial derivative ofpm,i,it can be derived that

whereγ1=(2em,ifm,i+em,idm,i)2,γ2=

The power of D2D cluster m can be calculated as:

whereµ,λrepresent the Lagrange Multipliers,and according to constraints 14b and 14c,it can be derived thatµ≥0,λ ≥0,respectively.

In the above power allocation process,we find the optimal transmit power of D2D multicast clusters which is based on the assumption that the resource pairing relationship between CUs and D2D multicast clusters has already been determined.Actually,by considering the minimum rate requirement of CUs,such resource pairing relationship might not be the optimal solution.To maximize the energy efficiency of D2D multicast communication we need to search for the optimal resource pairing relationship between CUs and D2D multicast clusters.Although many resource allocation methods have already been proposed in previous works,the exhausting searching principle is widely adopted in the previous works,which is characterized with high computational complexity.Furthermore,the resource allocation process cannot be adjusted in an online way with the variation of mutual interference.Therefore,to further decrease the cochannel interference and to adjust the resource pairing relationship between D2D multicast clusters and CUs dynamically,we want to obtain the optimal radio resource allocation scheme with the aid of Q-Learning.

Before Q-learning resource allocation process,we introduce the following variables.Firstly,a reuse indication matrixφ=[φi,j]M×Nis built which records the reuse indication between thei-th D2D multicast cluster and aj-th CU wheni-th D2D multicast cluster reuses the same uplink RBs ofj-th CU whereφi,jcan be calculate as

From Eq.(16),it can be derived that under the condition thati-th D2D multicast cluster reuses the same uplink RBs ofj-th CU,if the data rate ofi-th CU is greater than that of the minimum data rate requirement,thenφi,j=1.Otherwise,φi,j=0.After doing the above procedure for all the D2D clusters,it can be observed that for the reuse indication matrixφ,there might exists multiple D2D multicast clusters who can share the same uplink RBs of CU without hampering the minimum rate requirement of the CU.This phenomenon is further explained as the matching relationship between D2D multicast clusters and CUs is not uniquely determined.

Since D2D multicast data transmission rate is determined by the cluster member with the worst channel conditions,a channel state indication matrix G=[gi],i ∈[1,M]is built,wheregiindicates the channel gain betweeni-th CH and the corresponding cluster member with the worst channel state.Combined with the reuse indication matrixφ,channel state indication matrix G,and to find the optimal resource reuse matrix between D2D multicast clusters and CUs,we propose a Q-learning based resource allocation algorithm.Here,the optimal resource reuse matrix is represented by Y=[yi,j]M×N,i ∈[1,M],j ∈[1,N],where Binary variablesyi,j=1 indicates thej-th RB is reused byi-th D2D multicast cluster,otherwiseyi,j=0.

As an important type of machine learning,QLearning algorithm has been proved as an effective solution for resource allocation for its agent can learn how to better behave taking actions to interact with the environment according to observing state and obtaining rewards.The structure of the proposed Q-Learning based resource allocation algorithm is shown in Figure2,where the BS acts as an agent who interacts with the environment constitute by D2D multicast clusters and CUs.During the interaction process,the resource reuse relationship between a certain CH and CU is modeled as the action,and the corresponding data rate is modeled as the reward value.The Q-Tableis defined as matrix Q=[Qi,j]M×N,i ∈[1,M],j ∈[1,N],whose rows represent the current state and columns represent the action.AndQi,jis updated according to following expression

whereri,jrepresents the data rate of thei-th D2D multicast cluster transmitting on thej-th RB can be expressed as which can be obtained by Eq.(5).Specially,since users are assumed to located in an indoor environment or are moving in a low-speed.Hence,the conditions between users as well as between users and BS will not change dynamically.Furthermore,once the resource reuse relationship between D2D multicast clusters and CUs is determined,the data rate of them-th D2D multicast cluster withi-th RB is considered as a constant.Hence,the reward becomes stable at the end of the training process in Q-Learning.γindicates the discount factor which is the effect of future rewards to the current decisions,and it follows thatγ ∈[0,1];i,jrepresent the current status and current action,respectively.In the Q-Learning algorithm,by continuously updating the Q-value until the maximum number of repetitions is reached,the Q-Tablecorresponding to Figure2 is regarded as been constructed.

Figure2.Q-Learning based resource allocation.

After obtaining Q-Table,we combine with the obtained reuse indication matrixφ,channel state indication matrix G together to obtain the optimal resource reuse matrix Y.The detailed procedure of the proposed Q-Learning based resource allocation algorithm is derived in Algorithm 1.By carrying out Algorithm 1,the optimal resource reuse scheme can be obtained.

Moreover,we further proposed the Q-Learning joint resource allocation and power control algorithm for D2D multicast transmission so as to maximize the optimal energy efficiency of whole D2D multicast transmission.The detailed pseudo code of the proposed scheme is given in Algorithm 2.

In the first stage of the proposed Q-Learning based resource allocation algorithm,the training process of Q-Learning will be solvedM ×Ntimes in the worst case.Then in the second stage,the goal of resource reuse relationship is to find the best CU for every D2D multicast cluster CH under the premise of the one to one resource reuse constraints.Hence,the complexity of the proposed Q-Learning based resource allocation isO(M×N)+O(M).It can be seen that,compared with conventional exhaustive searching based method[14](where the complexity isO(M3)+O(M×N)),our proposed resource allocation algorithm can considerably reduce the computational complexity.

From the viewpoint of the global optimization,QLearning is a heuristic method to get the global optimal radio resource user relationship between D2D multicast cluster CHs and CUs.Therefore,the proposed Q-Learning based joint resource allocation and power control algorithm can get the globally optimal solution for the joint resource allocation and power control problems in an iterative way.

V.SIMULATION RESULTS

In this section,we evaluate the performance of the proposed clustering and energy-aware resource allocation algorithm.We consider the scenario such as movie seminar where hundreds of people are gathered in the small region who are eager to require certain movie files.Under such situations,user preference based D2D multicast transmission can be exploited to relieve excessive load burden for the BS.The simulate network consists of a cellular area with a radius of 200 meters,where all the users are randomly distributed in this area according to uniformly distributed.The distance based path loss and shadowing fading are considered for transmission channel.For the resource allocation and sharing method,we still take sharing the uplink cellular resources as an example.Assume that all the available resource is divided into RBs,and each accessed UE is allocated with one RB at each scheduling slot.In the D2D multicast clustering simulation,user preference are calculated based on data collected from MovieLens data set provide by GoupsLens,which is a movie rating set of users for different movies.Other related simulation parameters are listed and shown in Table1.

Table1.Simulation parameters

Noted that,in the following simulation results,compared with two other clustering schemes,we first evaluate the performance of the proposed hybrid intelligent clustering strategy.Then,based on the D2D multicast cluster formation results,we compare the performance of our proposed energy-aware resource allocation strategy with three other resource allocation schemes.The first one is Q-Learning based power allocation algorithm proposed in[23].The second one is the energy efficiency scheme[26],which aims at maximizing the total energy efficiency of the system.The third algorithm [7]is based on a self-organized D2D clustering algorithm.

Figure3.The results of two algorithms for D2D multicast cluster formation.

Figure3 shows the clustering results by using Kmeans based algorithm[9]and our proposed hybrid intelligent clustering strategy,where there are 300 users randomly distributed in a cellular area with a radius of 200 meters.From Figure3,it can be observed that ten multicast clusters have been formed by using the K-means algorithm and the proposed algorithm,respectively.Furthermore,compared with the K-means based schemes,some of the D2D multicast clusters in our proposed scheme comprises of more cluster members.This phenomenon is explained as our proposed algorithm combines the user preference with the physical communication distance,which enables D2D multicast users to be evenly distributed in each cluster.On the other hand,important parameters such as the position of initial CH and the number of total clusters in the K-means algorithm,will greatly affect the clustering result.Hence,if unfortunately,CH happens to locate in the area where users are sparsely distributed,the limited number of users who can carry out reliable communication within CH restricts the cluster size.

Figure4 depicts the comparison of the total number of D2D multicast users using different algorithms with the increasing numbers of users.In Figure4,it can be seen that the total number of D2D multicast users increases as the number of users increases.The performance of our proposed clustering algorithm is between the other two schemes in terms of the total number of D2D multicast users.This is because the proposed algorithm not only considers user preference of users but also guarantees the reliable communication of D2D multicast.After dividing the users into different categories with SOM,the users in each category is merged into different D2D multicast clusters with the proposed AHCA-based clustering algorithm,where some D2D users are removed from the original category.For the algorithm proposed in[10]which is based on SOM,user preference is chosen as the unique standard when dividing users into different clusters.Hence,reliable communication between the CH and cluster numbers might not be guaranteed.Furthermore,it can be observed that the algorithm presented in[9]performs the worst compared with the other two algorithms.The reason is two fold.Firstly,the Kmeans based clustering algorithm [9]only considers reliable communication of D2D multicast where user preference is omitted.Secondly,the clustering result is sensitive to the position of initial CH and the number of total clusters in the K-means algorithm,where the clustering result might not be optimal.

Figure4.The total number of D2D multicast users versus the total number of users.

Figure5 describes the total energy efficiency of the D2D multicast cluster with the variation of the D2D cluster radius.It can be observed that the energy efficiency performance of our proposed scheme and the other three scheme decreases with the increase of cluster radius.This is because the channel gain of the D2D multicast link will decrease with the increase of the D2D cluster radius.Hence,larger transmitting power is required for the D2D clusters to satisfy the minimum transmission rate.It can also be seen that the energy efficiency performance of our proposed algorithm and algorithm [23]is better than other referenced algorithms [26]with the increase of the D2D cluster radius.This is explained as our proposed scheme and algorithm [23]apply the power control scheme.However,for other referenced algorithm in [26]and the random allocation without power control scheme,since UE always adopts the maximum power,the energy efficiency performance slightly increases with the increase of total throughput.It can be recognized that the energy efficiency performance of our proposed scheme is better than other referenced algorithm[23]with the increase of the D2D cluster radius.This is explained as compared with the algorithm[23],our proposed resource allocation method minimizes the interference from CUs to co-channel D2D receivers,which improves the channel gain of D2D links.

Figure5.The total energy efficiency of D2D multicast cluster versus D2D cluster radius.

Figure6 illustrates the total average throughput of the D2D multicast cluster with the variation of the D2D multicast cluster radius.It can be observed that the total average throughput performance of all the schemes increases with the increase of the cluster radius.This is explained as with the increase of the D2D multicast cluster radius,the number of D2D multicast cluster in each cluster also grows.Moreover,it can be discerned that the total throughput performance of our proposed scheme is initially inferior to that of the algorithm [23]and algorithm [26]and gradually outperforms algorithm [7]with the rise of D2D group radius.The reason is provided as follows.Initially,after D2D multicast clusters have been formed,our proposed energy-aware resource and power allocation scheme aims at maximizing the energy efficiency performance of the D2D groups,which adopts a smaller transmit power when the radius of the D2D multicast group is small.With the increase of the D2D cluster radius,to ensure reliable D2D multicast transmission,our proposed scheme and algorithm [23]will gradually increase the transmit power of D2D CHs while decrease the transmit power of co-channel CUs,which leads to increased total throughput.With the increase of D2D cluster radius,since both CUs and D2D clusters in[7]always transmit at the maximum power,the interference between co-channel CUs and D2D clusters will also severely impact the channel quality of the D2D multicast link,which leads to a slow increase of total throughput than our proposed algorithm.

Figure7 demonstrates the average SINR of D2D multicast clusters with the variation of the D2D cluster radius.We assume that there are total 300 users randomly distributed in the cellular area.From Figure7,we can infer that for the schemes which include the power control scheme,such as our proposed algorithm and the scheme in [23],the SINR distribution does not decrease with the increase of D2D group radius.This is because according to the changing channel conditions and resource reuse relationships,the power control scheme jointly adjusts the transmission power of both D2D clusters and CUs,which ensures reliable D2D multicast transmission among each cluster.On the contrary,for the algorithms which don't incorporate power control schemes,such as[7,26],the QoS requirement of D2D multicast clusters in terms of minimum SINR constraint cannot be insured when the channel quality becomes worse.

Figure6.The total average throughput of D2D multicast clusters versus D2D cluster radius.

Figure7.The total average throughput of D2D multicast clusters versus D2D cluster radius.

Figure8 plots the total energy efficiency of D2D multicast clusters with the change of the number of users when different D2D radius is adopted.From Figure8,it can be observed that the total energy efficiency of D2D multicast clusters of all the algorithms increases with the increase of the number of users.This is explained as follows.There are more cluster members in the D2D multicast clusters with the increased number of total users.The sum data transmission capacity of total D2D multicast clusters will increase,which contributes to the increased energy efficiency of each D2D multicast cluster.It can also be observed that when the total number of users remains the same,the total energy efficiency of D2D multicast clusters for the same algorithm decreases with the increase of the D2D cluster radius,which follows the results revealed by Figure5.

Figure8.The total energy efficiency of D2D multicast clusters versus the number of users.

VI.CONCLUSION

In this paper,we propose a hybrid intelligent D2D multicast clustering strategy and a Q-Learning assisted joint resource allocation and power control algorithm for D2D multicast communications underlaying cellular networks.Concerning user preference and reliable D2D multicast communication between D2D multicast users,an AHCA-based clustering algorithm based on unsupervised machine learning is proposed to dynamically form D2D multicast clusters.Then,the energy efficient resource allocation problem with QoS constraints are studied,to maximize the overall energy efficiency of D2D multicast clusters.For solving the joint resource and power allocation problem,the original non-convex optimization problem is transformed into a mixed integer programming problem.We present a joint resource allocation and power control algorithm based on Q-Learning and Lagrange dual decomposition.Numerical simulation results demonstrate that the proposed scheme has clear benefits in terms of the total energy efficiency and throughput of the D2D multicast system compared to conventional algorithms.

ACKNOWLEDGEMENT

This research was supported by the National Natural Science Foundation of China (Grant Nos.62071377,61801382,61901367);the Key Project of Natural Science Foundation of Shaanxi Province (Grant No.2019JZ-06);the Key Industrial Chain Project of Shaanxi Province(Grant No.2019ZDLGY07-06);the College Science and Technology Innovation Activity Project of Xi'an University of Posts and Telecommunications(Grant No.19-B-289).

China Communications2021年1期

China Communications的其它文章: An Improved Equalization with Real Interference Prediction Scheme of the FBMC/OQAM System; Digital Signature Based on ISRSAC; Energy Efficiency Maximization Strategy for Sink Node in SWIPT-Enabled Sensor-Cloud Based on Optimal Stopping Rules; MimicCloudSim:An Environment for Modeling and Simulation of Mimic Cloud Service; Edge Coloring of Graphs with Applications in Coding Theory; Individual Identification of Electronic Equipment Based on Electromagnetic Fingerprint Characteristics