Ning Huang,Tianshun Wang,Yuan Wu,3*,Suzhi Bi,Liping Qian,Bin Lin
1 State Key Laboratory of Internet of Things for Smart City,University of Macau,Macau,China
2 Department of Computer and Information Science,University of Macau,Macau,China
3 Zhuhai-UM Science and Technology Research Institute,Zhuhai 519031,China
4 College of Electronics and Information Engineering,Shenzhen University,Shenzhen 518060,China
5 College of Information Engineering,Zhejiang University of Technology,Hangzhou 310014,China
6 Department of Communication Engineering,Dalian Maritime University,Dalian 116026,China
Abstract:Federated learning(FL),which allows multiple mobile devices to cooperatively train a machine learning model without sharing their data with the central server,has received widespread attention.However,the process of FL involves frequent communications between the server and mobile devices,which incurs a long latency.Intelligent reflecting surface(IRS)provides a promising technology to address this issue,thanks to its capacity to reconfigure the wireless propagation environment.In this paper,we exploit the advantage of IRS to reduce the latency of FL.Specifically,we formulate a latency minimization problem for the IRS assisted FL system,by optimizing the communication resource allocations including the devices’ transmit-powers,the uploading time,the downloading time,the multi-user decomposition matrix and the phase shift matrix of IRS.To solve this non-convex problem,we propose an efficient algorithm which is based on the Block Coordinate Descent(BCD)and the penalty difference of convex(DC)algorithm to compute the solution.Numerical results are provided to validate the efficiency of our proposed algorithm and demonstrate the benefit of deploying IRS for reducing the latency of FL.In particular,the results show that our algorithm can outperform the baseline of Majorization-Minimization(MM)algorithm with the fixed transmit-power by up to 30%.
Keywords:federated learning;intelligent reflecting surface;latency minimization
The explosive growth in deep learning(DL)with its applications in various areas such as image recognition,6G communication[1],big data analysis[2],Internet of Things[3]and network resource management[4],has yielded a significant computation workload for training the DL model.Conventionally,training data generated by various kinds of sensors is sent to a central server for training the neural network,which,however,leads to the issue of disclosing users’private raw data.
Federated learning(FL),which allows multiple mobile devices to cooperatively train a DL model without sharing their data with the central server,has received widespread attention.FL has been applied to many applications,such as mobile edge network[5],vehicular Internet of Things[6],drone networks[7]and collaborative anomaly detection[8].Different from the centralized model training,each device only needs to upload its locally trained model to the server for the model aggregation.However,FL requires frequent communications between the central server and the mobile devices,which incurs a significant latency in the training process.The recent advanced intelligent reflecting surface(IRS)provides a promising approach for addressing this issue.In this paper,we exploit IRS with the objective of reducing the latency of FL.Our detailed contributions in this work are summarized as follows.
(1)By exploiting IRS,we formulate a latency minimization problem that jointly optimizes the transmitpowers of the mobile devices,the uploading transmission time,the downloading transmission time,the multi-user decomposition matrix and the phase shift matrix of IRS,with the objective of minimizing the latency for FL convergence.
(2)To address the non-convexity of our formulated joint optimization problem,we propose an algorithm which is based on the block coordinate descent(BCD)algorithm[9]and the penalty difference of convex(DC)algorithm[10].Specifically,with BCD,we firstly derive the analytical solution for the multiuser decomposition(MUD)matrix,and then optimize the other variables including the uploading time,the downloading time,devices’ transmit-powers,and the phase shift matrix of IRS.For optimizing these variables,we transform the associated problem into a DC problem with DC constraints,and propose a penalty DC based algorithm for solving it.
(3)We conduct the numerical evaluations to validate our proposed algorithm.We analyze the convergence of our proposed algorithm,and demonstrate the latency performance of our IRS assisted FL in comparison with some benchmarks under different scenarios.In particular,the results show that our algorithm can outperform the baseline of Majorization-Minimization(MM)algorithm with the fixed transmit-power by up to 30%.
The remainder of this paper is organized as follows.Section II reviews the literatures,and Section III presents our problem formulation.We propose the algorithm to solve the joint optimization problem in Section IV and show the simulation results in Section V.Section VI finally concludes our work and discusses the future directions.
FL,which allows numerous participants to cooperatively train a common model without revealing the private data,has developed rapidly.[11][12]review the challenges of FL,including how to reduce the total latency and communication cost.In particular,there have been many studies investigating how to reduce the latency for FL,which can be categorized as follows:
(1)(Optimal resource allocation):The authors in[13]design a novel resource allocation and scheduling scheme,which is based on channel conditions and the degree of local model updates.[14]analyzes the convergence of FL,based on which the authors jointly optimize the communication resource and learning parameters.In[15],the authors formulate an optimization problem of the user selection and resource block allocation scheme under the constraint of limited wireless bandwidth,by considering the influence of the packet errors on the quality of training.The authors in[16]consider the reusing of the spectrum resources on the unlicensed spectrum,and develop a FL based approach to improve system stability and performance.[17]proposes a non-orthogonal multiple access assisted FL with both energy-consumption optimization and secrecy provisioning.
(2)(Efficient model aggregation):[18]proposes a novel fast global model aggregation approach by combining FL with over-the-air computation,which explores the superposition property of a wireless multiple-access channel.[19]studies the broadband analog aggregation and designs a low-latency multiaccess scheme for edge learning.
(3)(Optimal client selection in FL):Considering partial participation,[20]proposes an unbiased selection scheme which chooses the node with a small value of loss function.As non-iid(non identically and independently distributed)data causes the accuracy degradation of FL,[21]defines the weight divergence to represent the degree of non-iid and selects the participants with lower weight divergence.[22]studies client selection and bandwidth allocation in wireless federated learning networks and brings a new longterm perspective to resource allocation.The authors in[23]propose an edge computing-based client selection and networking scheme for vehicular IoT,considering the tradeoff between the accuracy of global model and communication overhead of FL.
Viewing the great potentials of IRS,the recent years have witnessed many research efforts focusing on IRS assisted wireless systems,e.g.,orthogonal frequency division multiplexing[24],vehicular network[25],and integrated terrestrial-satellite networks[26].
There have been several recent studies exploiting IRS for FL.[27–30]deploy IRS in FL based on overthe-air computation,the main focus of which is to optimize the model aggregation with the assistant of IRS.[31]studies an IRS assisted high speed mmWave communication system,by utilizing FL to train the model for the mapping between the IRS configuration of optimal reflecting angles and the channel conditions.However,it is still an open question regarding how to exploit IRS for both uplink and downlink model transmissions in FL with the objective of minimizing the latency of FL.
We consider the IRS assisted FL system in a multiple input single output scenario including one base station(BS)withMantennas andKmobile devices with one single antenna.The BS is equipped with an edge server for model aggregation.The IRS is constructed byNreflecting elements.Since IRS is made of low-cost passive scattering elements embedded in the metasurfaces which need be attached to the facades of buildings,indoor walls,and ceilings,etc,the main cost of IRS comes from the installation expense in practice.
We denote the channel between thek-th mobile device and the BS as,the channel between thek-th mobile device and the IRS as,and the channel between the IRS and the BS as.We assume that the channel coefficients are quasi-static such that they can be regarded constant during the FL iterations.For IRS,we denote the phase shift at then-th reflecting element asθn ∈[0,2π).Then,we constitute a diagonal matrix aswherejrepresents the imaginary unit of the complex variable.
We denote the number of local iterations asτand the number of global iterations asµ.The objective of FL is to minimize the loss function over the whole dataset,and each round of FL iteration includes the following four steps.
Step(1): The BS broadcasts the global model parameter to the mobile devices.We denote the transmission time for the BS to broadcast the data to the mobile devices astD,the transmit-power of the BS aspD,and the bandwidth asB.The transmit-signal at the BS is xD=fςD,whereςDis the symbol followingrepresents the precoding vector which is assumed to be fixed for the sake of simplify.
The received signal at thek-th device is
wherenkis the additive white Gaussian noise at thek-th device with zero mean and variance.Then the maximum downloading rate for thek-th device is
We denote the size of the global model assD(in the unit of bits),and the downloading time astD.Then,the following constraint should hold regarding the BS’s broadcasting of the global model,
i.e.,the BS should deliver the model data to each device within the time durationtD.
Step(2):The mobile devices train their own models and update the model parameters.Based on the received global model parameters,each device updates its own model with its own dataset.We denotefkas the CPU frequency at thek-th mobile device,with the unit of HZ which means the number of CPU cycles per second.ParameterDkis the bits of data samples on thek-th device.As a result,thek-th device’s computation time for the local training is
whereζkrepresents the number of CPU cycles for thek-th device to process one bit of sample data.Here,we adopt the full batch method for local updating,which can provide a good performance of training accuracy[32].
According to[33],the CPU power consumption per second can be expressed as,whereκis a coefficient depending on the CPU architecture.Forτlocal iterations,the computing energy consumption at thek-th device in each round is
Algorithm 1.BCD algorithm to solve Problem P0.1: Initialize i=0,initialize T0,T1,{pUk}0, Φ0,ε.2: while|Ti+1-Ti|>ε do 3: With{{pUk}, Φ}i,compute Wi+1 according to Theorem 1.4: With Wi+1,update {{pUk},Φ,tU,tD}i+1 as demonstrated in Section 4.1.5: Update i=i+1.6: Update Ti.7: end while
Although Problem P0-EXP is still a non-convex optimization,a keen observation is that it aims at finding the minimum value ofTsuch that its feasible region is non-empty.Based on the observation,we propose an efficient bisection-based algorithm for reaching the minimum value ofT.First,we fix the value ofT,then we check whether the feasible region of Problem P0-EXP is empty or not.If yes,we can decreaseT.Otherwise,we have to increaseT.Checking whether the feasible region is empty or not is equivalent to checking whether the minimum value of Problem P0-CHECK(which is provided below)reaches zero or not.
Algorithm 2.Bisection search algorithm to solve Problem P0-EXP.1: Initialize i = 0,initialize T0 with a large value,and set ϵ as a small positive number to denote the convergence condition.2: Tupper=T0,Tlower=0.3: while|Tupper-Tlower|>ε do 4: Given Ti,solve Problem P0-CHECK according to the next subsection,and use the corresponding optimal value of the objective function to set variable obj.5: if obj =0 then 6:Tupper=Ti.7: else 8:Tlower=Ti.9: end if 10: Set Ti+1=Tupper+Tlower 2.11: Update i=i+1.12: end while
The detailed algorithm is demonstrated in Algorithm 2.The key steps of Algorithm 2 lie in Step 4 to Step 9.In Step 4,we check whether the value of Problem P0-CHECK is equal to zero,which is equivalent to check whether the feasible region of Problem P0-EXP is empty.The reasons will be discussed later.Specifically,if the minimum value of Problem P0-CHECK is zero,then the corresponding solution of Problem P0-CHECK suffices to be in the feasible region.Otherwise(i.e.,the minimum value of Problem P0-CHECK is nonzero),there exists no feasible solution under the givenT.Then,in Step 5 to Step 9,we increase or decrease the value ofTdepending on the checked result of Step 4.
Now we explain why checking the feasible region is equivalent to checking the minimum value of Problem P0-CHECK.This comes from two reasons.First,for the positive semi-definite matrix V,the trace of matrix is the summation of all singular values of matrix V,while the spectral norm equals to the largest singular value of matrix V.Thus,there always exists Tr(V)-||V||2≥0,∀V.Second,we add a constraint(30)to guarantee the nonnegative value ofwhich can be considered as finding the minimum value in the nonnegative region,and we do not need to consider again the case of non-positive region0.If the value ofcan reach zero under the constraintthen it also can reach zero under the complement constraintHowever,if the minimum value ofcannot reach zero under the constraintthere does not exist any value of V such thatThen the maximum value ofcannot reach zero under the constraint.
C.Penalty DC Algorithm
To solve Problem P0-CHECK,we first construct the penalty function to deal with constraint(29)which is difference of convex.We introduce two additional variables to represent the two convex parts of constraint(29)as follows:
Algorithm 3.Penalty DC algorithm.1: Initialize{z1,z2,z3,{xkj},{yk}K,tD,V,Θ}0,i =0,ε,λ0 >0,obj1,obj0,where obj is the optimal value of Problem P0-DC.2: while |obji+1-obji|>ε or constraint(29)is not satisfied do 3: With {{xkj},{yk}K,V}i,solve Problem P0-DC by CVX,and take the optimal solution as{z1,z2,z3,{xkj},{yk}K,tD,V,Θ}i+1.4: Update obji+1 with the optimal value of the objective function of Problem P0-DC.5: ifConstraint(29)is satisfied for both cases of {{xkj},{yk}K,z1}i and{{xkj},{yk}K,z1}i+1,then 6:Update λi+1=λi.7: else 8:Update λi+1=λi+Δ.9: end if 10: Update i=i+1.11: end while
In Problem P0-PENAL,all the constraints are convex,and the objective function is difference of convex.It is a DC programming problem,which can be solved by DC algorithm[10].Our detailed algorithm combining the penalty method and DC algorithm is demonstrated as Algorithm 3.The key steps of of Algorithm 3 are explained as follows.In Step 5,we check whether the value of the penalty function is equal to zero or not.If yes,we keep the penalty coefficient unchanged.Otherwise,we increase the value of penalty coefficient by a fixed step size.We will illustrate later that the penalty coefficient is bounded.In Step 3,we solve Problem P0-DC by CVX,which is convex since we linearly approximate the concave part of Problem P0-PENAL.
The main idea of DC algorithm is to linearly approximate the concave part iteratively.At thei-th iteration,we solve Problem P0-DC as follows:
Problem P0-DC is a convex problem,which can be solved by several solvers for convex optimization problems,e.g.,CVX[36].After obtaining V,we can use singular value decomposition to get the value of the phase shift of IRS.
To solve the problem,we decompose it into two subproblems based on the principle of the BCD and solve them alternatively.We firstly derive the optimal MUD matrix in Theorem 1.Then,the other subproblem is transformed to a DC problem,which is solved by the penalty DC algorithm.For the penalty DC algorithm,the sequence of penalty coefficient{λi}is bounded[10],and the authors in[10]have proved that it achieves the global convergence.Our BCD-DC algorithm fits the scope of the unified framework for block successive upper-bound minimization[38],and thus can converge to a(locally)optimal point.
In this section,we evaluate the performance of the proposed IRS assisted FL system based on our algorithm.We consider a FL system with one BS and multiple mobile devices.We assume that the devices are randomly located in the circular region with the center point at(250,0)mand radius 50m.We also assume that the IRS is located at(100,0)m.The position of the BS is set as(0,0)m,and the distance between thek-th device and the BS isdk.For the channel coefficient,we consider both the large-scale path loss and the small-scale fading.Specifically,the large scale path loss with the unit dB corresponding to thek-th device is modeled aswhere PL0represents the path loss at the reference distanced0,andαis the path loss exponent.The small scale fading is independent identical distribution and follows the complex Gaussian distribution associated with zeros mean and unit variance.The detailed settings of the parameters are listed as follows,where rand(1,K)denotes an 1×Kvector with each element randomly generated according to the uniform distribution within[0,1].
(1)Basic settings: the number of antennas at BSM= 3,the number of reflecting elementsN= 10,the number of mobile devicesK= 5,the number of global iterationsµ= 280 and the number of local iterationsτ=10.
(2)Path loss settings:the path loss componentα=2,the reference path loss PL0=30 dB,the reference distanced0=1 m.
(3)Communication settings: the bandwidthB=1 MHZ,the minimum transmit-power at each devicethe maximum transmit-power at each devicethe size of the uploaded modelsU=3 Mbit,the size of downloaded modelsD=3 Mbit,and the variance of noiseσ2=σk2=2×10-16,∀k ∈K.
(4)Computing settings: the CPU frequency vector of mobile devices f=(1+rand(1,K))×108HZ,the effective switched capacitanceκ=10-28[33],the CPU cycles consumption per sampleζk=2×104,∀k ∈K,the energy budgets Emax=(2+rand(1,K))×104J,the amount of data samples at the deviceDk=(1+rand(1,K))×500,and the model aggregating time at the BSϑ=2 s.
Figure 2 shows the convergence of our proposed algorithm.Figure 2a shows the behavior of our bisection search algorithm as demonstrated in Algorithm 2 and our BCD algorithm as demonstrated in Algorithm 1.In Figure 2a,if the conditions demonstrated in Algorithm 2 can be satisfied,the value ofTdecreases.Otherwise,we have to increase the value ofT.Figure 2b demonstrates the behavior of our penalty DC algorithm,which includes two phases.We test two different values ofT,i.e.,T= 7500sandT= 3750s.In the first phase,the penalty coefficient increases,and in the second phase the value of the penalty coefficient keeps unchanged.At the first phase,the objective value maybe increases as well,as the penalty item is part of the objective function.At the second phase,the penalty coefficient remains unchanged,and the objective value is decreasing,since the value of the penalty item is zero.From Figure 2b,we can see that the sequence of penalty coefficient is bounded,which is consistent with the results in[10].Reminder that the penalty coefficient sometimes remains the initial value and does not increase if the constraint to be penalized can be satisfied.
Figure 1.IRS assisted FL.
Figure 2.Convergence of our proposed algorithm.
Figure 3.Impact of the number of IRS elements.
We compare our algorithm with three benchmark algorithms as demonstrated in Figure 3,including:i)the FL without using IRS(which is denoted by non-IRS),ii)IRS with random phase shift,and iii)IRS with MM algorithm[39].Specifically,IRS with random phase uses a random phase shift in IRS.IRS with MM algorithm utilizes MM algorithm to optimize the IRS phase shift in the uplink transmission under the fixed transmit-power.As demonstrated in Figure 3,our proposed scheme can achieve a much better performance than both the IRS with random phase scheme and non-IRS scheme.Moreover,our proposed optimal design of IRS assisted FL can also outperform the IRS with MM scheme,and the performance gain compared to the IRS with MM scheme can be up to 30%.
Figure 4 demonstrates the latency performance versus the number of mobile devices.The results show that the latency of FL is increasing for both the case with IRS and the case without IRS,which is because that the slower devices influence the latency of the system and the system performance is determined by the slowest device.It is observed that with IRS,the latency of FL system atK= 50 is almost twice that atK= 5.To address this quick increase in latency due to the large number of devices,we should increase the number of IRS reflecting elements as discussed previously.
Figure 4.Impact of the number of devices.
Figure 5.Impact of the amount of samples on devices.
Figure 5 illustrates the performance of FL versus the amount of data samples and the energy budget.We test different levels offor the devices,for instance,in Figure 5,the label of “5000” means that the devices’ energy-limits are generated according to 5000×(1+0.1×rand(1,K)).Figure 5 shows the following results.First,the latency increases when the amount of data samples increases,which is because more data samples take a longer training time.Second,the latency for FL convergence increases when the amount of the samples increases.Moreover,such an increase is more significant when the devices have a smaller energy budget.This result is reasonable,since the devices have to reduce their transmit-powers due to the limited energy budget,which thus results in a longer transmission latency.
In this paper,we aim at reducing the latency of FL by exploiting IRS.After modeling the four steps of FL including the model broadcasting,local training,uploading,and the model aggregating,we formulate a problem to minimize the latency for FL convergence by optimizing the communication resource allocations for IRS.To solve the problem,we propose an efficient algorithm to compute the optimal solution.Numerical results demonstrate the benefit of deploying the IRS in FL system under various parameter settings.With IRS,the latency of FL can be reduced effectively.In our future work,we will further investigate the optimal device selection for the IRS assisted FL,in which we jointly optimize the IRS resource allocation and the device selection for participating in the FL,with the objective of minimizing the overall latency.
This work was supported in part by National Natural Science Foundation of China under Grants 62122069,62072490,62071431,and 61871271,in part by Science and Technology Development Fund of Macau SAR under Grants 0060/2019/A1 and 0162/2019/A3,in part by FDCT-MOST Joint Project under Grant 0066/2019/AMJ,in part by the Intergovernmental International Cooperation in Science and Technology Innovation Program under Grant 2019YFE0111600,in part by FDCT SKL-IOTSC(UM)-2021-2023,in part by Zhejiang Provincial Natural Science Foundation of China under Grant LR17F010002,in part by the Shenzhen Science and Technology Program under Projects JCYJ20210324093011030 and JCYJ20190808120415286,and in part by Research Grant of University of Macau under Grants MYRG2020-00107-IOTSC and SRG2019-00168-IOTSC.
Proof of Theorem 1