RIS-Assisted Federated Learning in Multi-Cell Wireless Networks

ZTE Communications, 2023, Issue 1 (posted online 2023-05-08)

WANG Yiji, WEN Dingzhu, MAO Yijie, SHI Yuanming

(ShanghaiTech University, Shanghai 201210, China)

Abstract: Over-the-air computation (AirComp) based federated learning (FL) is a promising technique for distilling artificial intelligence (AI) at the network edge. However, because AirComp requires signal alignment, the performance of AirComp-based FL is limited by the device with the lowest channel gain. More importantly, most existing work focuses on a single-cell scenario and ignores inter-cell interference. To overcome these shortcomings, a reconfigurable intelligent surface (RIS)-assisted AirComp-based FL system is proposed for multi-cell networks, where a RIS is used to enhance user signals weakened by channel fading, especially those of devices at the cell edge, and to reduce inter-cell interference. The convergence of FL in the proposed system is first analyzed and the optimality gap for FL is derived. To minimize the optimality gap, we formulate a joint uplink and downlink optimization problem. The formulated problem is then divided into two separable nonconvex subproblems. Following the successive convex approximation (SCA) method, we first approximate the nonconvex term by a linear form, and then alternately optimize the beamforming vector and phase-shift matrix for each cell. Simulation results demonstrate the advantages of deploying a RIS in multi-cell networks and show that the proposed system significantly improves the performance of FL.

Keywords: federated learning (FL); reconfigurable intelligent surface (RIS); over-the-air computation (AirComp); multi-cell networks

1 Introduction

With the development of the Internet of Things (IoT) and wireless technologies, recent years have witnessed an explosion of IoT devices and mobile data, which is of great significance for training AI models to enable various kinds of intelligent applications, such as self-driving vehicles, equipment condition monitoring, and smart cities[1–2]. However, conventional methods that upload massive distributed data to a cloud incur huge communication overhead and violate data privacy. To overcome these problems, federated learning (FL) has emerged as a promising solution, where a shared AI model is trained among multiple devices without raw data transmission[3–6]. Specifically, there are three steps in each training iteration of FL. First, a central server generates an initial global model and broadcasts it to the edge devices within its coverage. Then, each edge device performs one or more steps of local training based on the received global model and its local dataset to calculate a local model or gradient vector, and uploads it to the central server. Finally, the central server aggregates all local information and updates the global model for the next communication round.
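The three steps of one training iteration can be sketched as follows. This is a minimal illustration rather than the paper's Algorithm 1: the linear model, the squared-error loss, and the names `local_gradient` and `fedsgd_round` are assumptions introduced here, and both the broadcast and the aggregation are treated as error-free.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[Sequence[float], float]  # (features, label)


def local_gradient(w: Vector, data: Sequence[Sample]) -> Vector:
    """Gradient of the mean squared error of a linear model on one device."""
    grad = [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(data)
    return grad


def fedsgd_round(w: Vector, device_data: Sequence[Sequence[Sample]], lr: float) -> Vector:
    # Step 1: the server broadcasts w (assumed error-free in this sketch).
    # Step 2: every device computes a gradient on its own data only.
    grads = [local_gradient(w, data) for data in device_data]
    # Step 3: the server averages the gradients and updates the global model.
    avg = [sum(g[i] for g in grads) / len(grads) for i in range(len(w))]
    return [wi - lr * gi for wi, gi in zip(w, avg)]


# Two devices whose samples are consistent with the model w = [2, -1].
devices = [
    [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0)],
    [((1.0, 1.0), 1.0), ((1.0, -1.0), 3.0)],
]
w = [0.0, 0.0]
for _ in range(200):
    w = fedsgd_round(w, devices, lr=0.1)
```

Only gradients leave the devices in this sketch; the raw samples stay local, which is the property that motivates FL.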

One main research direction of FL is to overcome the communication bottleneck caused by frequent transmission of high-dimensional model and gradient vectors. To combat the influence of wireless communications, the authors in Ref. [7] proposed a joint learning and communication framework to minimize the FL loss function. Partial device participation approaches, such as random scheduling and proportional fairness, have been proposed for the rational allocation of limited communication resources in FL[8]. To improve the communication efficiency of the FL uplink model aggregation, an over-the-air computation (AirComp) technique based on the waveform superposition property of multiple access channels (MACs) was proposed in Refs. [9–13], which computes the sum at the receiver during the transmission itself. To overcome the bottleneck of limited communication bandwidth in the aggregation process, the authors in Ref. [14] presented a fast model aggregation method that improves the performance of FL by jointly optimizing beamforming vectors and device selection. In Ref. [15], a federated zeroth-order optimization (FedZO) algorithm based on AirComp was proposed to enable communication-efficient transmission by performing multiple local updates and partial device participation. Compared with orthogonal multiple access (OMA), where the signals of other users are regarded as interference and the sum of all signals is calculated only afterwards (i.e., computing after communication), AirComp greatly improves communication efficiency. The benefits of AirComp-based FL have motivated its application in unmanned aerial vehicle (UAV)[16–17] and reconfigurable intelligent surface (RIS)-enabled networks[18–23].
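The waveform-superposition idea behind AirComp can be illustrated with a toy example. The sketch below assumes single-antenna devices that pre-scale their symbols by the inverse of their own channel (plain channel inversion, which is a simplification and not the beamforming design studied in this paper); the function names are hypothetical.

```python
import random
from typing import List, Optional


def aircomp_sum(symbols: List[complex], channels: List[complex],
                noise_std: float = 0.0,
                rng: Optional[random.Random] = None) -> complex:
    """All devices transmit at once; the channel itself adds their signals."""
    rng = rng or random.Random(0)
    # Each device pre-scales its symbol so that it arrives with unit gain.
    tx = [x / h for x, h in zip(symbols, channels)]
    # The multiple access channel superimposes the faded signals.
    rx = sum(h * s for h, s in zip(channels, tx))
    noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
    return rx + noise


def worst_case_power_scaling(channels: List[complex]) -> float:
    """Largest transmit-power factor 1/|h|^2, set by the weakest channel."""
    return max(1.0 / abs(h) ** 2 for h in channels)


symbols = [1.0 + 0j, 2.0 + 0j, -0.5 + 0j]
channels = [0.5 + 0.5j, -1j, 2.0 + 0j]
total = aircomp_sum(symbols, channels)        # one channel use yields the sum
scaling = worst_case_power_scaling(channels)  # dominated by the weakest device
```

With OMA, the receiver would need one channel use per device before adding the values; here a single use suffices, but the weakest channel dictates the power scaling.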

The schemes mentioned above cannot solve the essential problem that wireless channel fading leads to poor signal strength for many devices, which is especially harmful for AirComp-based FL, whose performance generally depends on the worst device in the network. To mitigate the effects of wireless channel fading, RIS is recognized as a revolutionary technology that achieves high spectrum and energy efficiency by reconfiguring the wireless channel environment at a low cost[24–27]. The authors in Ref. [25] designed a RIS-assisted AirComp system that improves the performance of AirComp by optimizing the transceivers and the RIS phase shifts. It was shown in Refs. [19–20] that configuring RISs in AirComp-based FL further reduces the model aggregation error, thereby improving the learning performance. Considering the low-latency and privacy-preserving nature of FL, a differentially private FL system via RIS was proposed in Ref. [12] to achieve a better tradeoff between learning performance and privacy under privacy and power constraints. To further reduce the aggregation error, a multi-RIS scenario was presented in Ref. [28], where the base station and the user each used one dedicated RIS to mitigate the effects of poor channels. However, all the aforementioned works are limited to a single-cell setting. In fact, a multi-cell scenario is more in line with practical large-scale network design[29–31]. Since signals received by users at the cell edge suffer from serious fading, deploying RISs to relay the intended signal can enhance signal strength for edge users and expand network coverage in multi-cell scenarios[31–33]. Besides, the authors in Refs. [30] and [34] proved that deploying a RIS at the cell edge achieves the highest performance gain compared with other RIS deployments. Most existing works on RIS-assisted multi-cell networks focus on communication-only system models and ignore the application of FL. Although multi-cell FL interference management was considered in Ref. [29], RIS was not considered as a means to enhance the performance of FL. To the best of our knowledge, this is the first work that investigates AirComp-based FL in RIS-assisted multi-cell networks.

In this paper, we investigate a RIS-assisted AirComp-based FL system in multi-cell networks, where a RIS is deployed at the cell edge to help each cell complete a different FL task. In the process of FL, we consider the impact of both downlink and uplink communications. For the fast aggregation of uplink gradients, we adopt AirComp to improve communication efficiency. However, the performance of AirComp-based FL depends on the device with the worst link gain (e.g., the cell-edge device with a large path loss). Besides, inter-cell interference also degrades its performance. To address these issues, we deploy a RIS at the cell edge to enhance signal strength and mitigate inter-cell interference, thereby improving the FL performance. Several difficulties in the proposed system deserve highlighting. First, we consider the impact of both downlink model dissemination and uplink gradient aggregation, both of which are inevitably affected by channel fading, noise and inter-cell interference. This differs from most FL works, which consider only uplink aggregation errors. Second, considering the downlink influence makes the convergence analysis of our system more complicated, and the resulting bound depends on noise and inter-cell interference. Third, the optimization problems are nonconvex and complex. We have to jointly optimize the beamforming vector and phase shift to improve the performance of our proposed system. The main contributions of this paper are summarized as follows:

• We propose a RIS-assisted AirComp-based FL system in two-cell networks, where a RIS is used to enhance the signal of cell-edge devices during both downlink and uplink transmission and to cancel inter-cell interference. We then analyze the convergence of the proposed framework. The optimality gap of FL is determined by the uplink error and the downlink error of the two cells, and each error reflects channel fading, inter-cell interference and received noise.

• To maximize the learning performance for all cells, it is necessary to minimize the optimality gap. To this end, we decouple this optimization problem into two separate subproblems, one for the downlink and one for the uplink. Each subproblem requires a joint alternating optimization of beamforming vectors and phase-shift matrices. Since the optimization subproblems remain nonconvex, we first apply a change of variables and then utilize the successive convex approximation (SCA) method to approximate the problem. An alternating optimization algorithm is then proposed to solve each subproblem.

• Extensive simulations are performed to verify the performance of the proposed RIS-assisted FL system in two-cell networks. The results show that the proposed scheme improves the performance of the AirComp-based FL system by enhancing the signal strength and suppressing the inter-cell interference. In addition, the proposed algorithm guarantees fairness among cells.

The rest of this paper is organized as follows. Section 2 introduces the system model of RIS-assisted AirComp-based FL in a two-cell scenario. Section 3 provides the convergence analysis and the problem formulation. In Section 4, we propose an SCA-based joint alternating beamforming and phase-shift matrix optimization to minimize the upper bound of all cells. Simulation results are provided in Section 5 to support the advantages of the proposed system. Finally, we conclude this work in Section 6.

2 System Model

2.1 Network Model

As shown in Fig. 1, we develop a RIS-assisted AirComp-based FL system in a two-cell network, where each cell has $K$ single-antenna edge devices and one access point (AP) equipped with $N$ antennas. At the edge of the two cells, we deploy a RIS with $S$ passive reflecting elements to enhance the signal strength of edge devices. Edge device $k \in \mathcal{K}_l = \{1, 2, \ldots, K\}$ is associated with AP $l \in \mathcal{L} = \{1, 2\}$ and exchanges information with it in both the downlink and the uplink, where $\mathcal{K}_l \cap \mathcal{K}_j = \emptyset$, $\forall l \neq j$ and $l, j \in \mathcal{L}$. During transmission, we assume that each AP knows the channel state information of all edge devices.

2.2 Federated Learning Model

▲Figure 1. RIS-assisted AirComp-based FL system in a two-cell network

Algorithm 1: FedSGD

In the proposed two-cell system, we assume that these steps are synchronous in both cells and that the gradient information is uploaded to the AP. The synchronization can be enabled by AirShare[35], which transmits the clock over the air and provides a distributed protocol. In the following subsections, we elaborate on the communication process of the proposed system following the procedure of FL.

2.3 Downlink Communication for RIS-Assisted FL System

From the perspective of communication, we utilize universal frequency reuse to improve spectral efficiency. In other words, the two cells share the same frequency during both downlink and uplink communications, which inevitably causes inter-cell interference.

Considering a round of downlink communications in cell $l$, AP $l$ shares the global model with each edge device in cell $l$. In most existing works on FL, however, the broadcast is assumed to be error-free, meaning that edge device $k \in \mathcal{K}_l$ receives the signal from AP $l$ exactly. In this subsection, we consider the effects of noise and inter-cell interference in downlink communications. Here, we omit the time index and denote the downlink signal transmitted from AP $l$ to edge device $k$ as $w_l$. In addition, we assume that $w_l$ follows the standard complex Gaussian distribution, i.e., $w_l \sim \mathcal{CN}(0, 1)$. The transmitted signals may nevertheless experience poor channel conditions, which results in a larger receive error at edge device $k$. To improve the accuracy of the received signal, we deploy a RIS to mitigate the signal distortion.

where $w_{l,k}$, $w_l$ and […] are all vectors of dimension $d$. After receiving the global model $w_{l,k}$, all edge devices start training based on their local data and then generate new local model parameters. The gradient information is the difference between the global model and the local model, as in Eq. (4). After that, all edge devices upload their gradient information to AP $l$ through the uplink communication.

2.4 Uplink AirComp Aggregation for RIS-Assisted FL System

In uplink communications, the average in Eq. (5) for gradient aggregation belongs to the class of nomographic functions. AirComp can therefore exploit the waveform superposition property of MACs in wireless networks to improve transmission efficiency. Fig. 2 shows the process of AirComp. For brevity, we again omit the time index in the following presentation. The transmitted signal and the pre-processing function of the $(l,k)$-th edge device are denoted by $x_{l,k} \in \mathbb{C}$ and $\psi_{l,k}(\cdot): \mathbb{C} \to \mathbb{C}$, respectively. The target function processed at the $l$-th AP is given by

Similar to the downlink communication, we let $\theta_u$ denote the vector of RIS phase-shift coefficients in the uplink communication and $\Theta_u = \mathrm{diag}(\theta_u)$ the corresponding diagonal phase-shift matrix, with each phase shift in $[0, 2\pi]$. AP $l$ mainly receives three types of signals, namely, the signal of cell $l$, the interference signal of the other cell, and noise. The first two each contain both the direct signal from the edge devices to AP $l$ and the signal that travels from the edge devices via the RIS to AP $l$. Thus, the received signal at AP $l$ is given by

3 Convergence Analysis and Problem Formulation

In this section, we provide the convergence analysis of the proposed RIS-assisted AirComp-based two-cell FL system. Based on the convergence results, we obtain an optimality gap bound that is influenced by both the downlink and uplink errors. In addition, we formulate the optimization problem to improve the performance of the proposed system.

3.1 Convergence Results

Assumption 1 ($M$-smoothness): All local loss functions $(F_1, \ldots, F_k)$ are $M$-smooth. For all $x$ and $y$, we have

▲Figure 2. Process of over-the-air computation (AirComp) in the two-cell network

3.2 Problem Formulation

According to Theorem 1, the first term on the right-hand side of the inequality tends to zero as $T$ increases. Thus, the upper bound is dominated by the last term, which includes the inter-cell interference and noise errors in the downlink and uplink communications. We aim to minimize the upper bound in each time slot for transmitting the gradient information in all cells, given by

For Problem (20), the optimization variables are the receive beamforming vector $m$, the uplink phase-shift matrix $\Theta_u$, the transmit beamforming vector $t$, and the downlink phase-shift matrix $\Theta_d$. The first two are uplink variables and the last two are downlink variables. We observe that the variables of the two processes are not coupled and that their constraints are independent. Therefore, we can decompose the optimization problem into two sub-problems, i.e., downlink and uplink optimizations, and solve Problem (20) by minimizing the two sub-problems in Eqs. (23) and (24) simultaneously.
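The decomposition rests on a simple fact about separable problems. As a sketch, suppose the objective can be written as an uplink part $f_u$ plus a downlink part $f_d$, with feasible sets $\mathcal{U}$ and $\mathcal{D}$ that do not interact; these symbols are placeholders introduced here for illustration and are not the paper's notation. Then

```latex
\min_{(m,\Theta_u)\in\mathcal{U},\ (t,\Theta_d)\in\mathcal{D}}
  \bigl[ f_u(m,\Theta_u) + f_d(t,\Theta_d) \bigr]
= \min_{(m,\Theta_u)\in\mathcal{U}} f_u(m,\Theta_u)
  + \min_{(t,\Theta_d)\in\mathcal{D}} f_d(t,\Theta_d)
```

so, under this additive assumption, solving the two sub-problems independently loses nothing relative to solving them jointly.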

4 Optimization Framework

In this section, we specify the optimization framework for solving the uplink and downlink optimization problems, respectively. Each optimization problem includes both beamforming optimization and phase-shift optimization.

4.1 Uplink Optimization

where […] are the solutions at the $t$-th iteration. For Problem (31), the objective function and all constraints are convex, so the optimal solution can be obtained with a convex program. Since we have relaxed the phase-shift equality constraints, the optimal phase-shift solution returned by the convex program must be normalized to satisfy the original equality constraint.

The optimization framework is summarized in Algorithm 2, where Problems (29) and (31) are solved with the SCA algorithm. For the equality constraint, we first relax it to obtain the optimal solution and then normalize the solution to satisfy the original condition.

Algorithm 2: Alternating beamforming and phase-shift algorithm
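The relax-then-normalize step and the alternating structure can be sketched as below. The sketch assumes that the relaxed constraint allows each reflection coefficient to have modulus at most one and that normalization divides each coefficient by its modulus; `update_beamforming` and `update_phase` are hypothetical callbacks standing in for the SCA solves of Problems (29) and (31), which are not implemented here.

```python
from typing import Callable, List, Tuple

Phase = List[complex]


def normalize_phase_shift(relaxed: Phase) -> Phase:
    """Map each relaxed coefficient back onto the unit circle, keeping its angle."""
    return [v / abs(v) if abs(v) > 0 else 1 + 0j for v in relaxed]


def alternate(beam, theta: Phase,
              update_beamforming: Callable, update_phase: Callable,
              iterations: int = 10) -> Tuple[object, Phase]:
    """Alternate between the two blocks of variables for a fixed number of passes."""
    for _ in range(iterations):
        beam = update_beamforming(beam, theta)  # phase shifts held fixed
        relaxed = update_phase(beam, theta)     # beamforming held fixed
        theta = normalize_phase_shift(relaxed)  # restore unit modulus
    return beam, theta


theta = normalize_phase_shift([3 + 4j, -2j, 0.5 + 0j, 0j])
```

Normalization keeps the phase angle found by the convex program and only rescales the magnitude, which is why it is a natural way to return to the original feasible set.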

4.2 Downlink Optimization

The downlink optimization problem is

Problem (36) has the same form as Problem (30), which means we can use the same strategy to solve the downlink phase-shift optimization.

5 Simulation Results

In this section, we provide simulation results to demonstrate the performance of the proposed RIS-assisted multi-cell FL network.

5.1 Experiment Setup

We consider a RIS-assisted two-cell wireless FL network in two-dimensional space, where the coordinates of the APs are (0, 0) and (200, 0). The RIS is deployed at the edge of the two cells, i.e., at (100, 0). The edge devices of each cell are randomly scattered within a circle with a center of (90, 0) or (100, 0) and a radius of 10 m. We assume that the antennas of the APs and the reflecting elements of the RIS are both arranged in uniform linear arrays. In the experiments, the path loss is modeled as $T(d/d_0)^{-\alpha}$, where $T$ is the path loss at the reference distance $d_0 = 1$ m, $d$ denotes the link distance, and $\alpha$ is the path loss exponent. We consider Rician fading for all channels, and the channel coefficients are given as

Each channel coefficient consists of a line-of-sight (LoS) component and a non-line-of-sight (NLoS) component. The Rician factor $\beta$ is set to 3. In particular, we use the same path loss exponent of 2.2 for all links. Besides, we set $P_d = 30$ dBm and […] $= \sigma^2 = -10$ dBm, which means the constant $q = 1$.
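A rough sketch of how such channel coefficients could be generated is given below. It assumes the usual Rician form, in which the LoS and NLoS parts are weighted by $\sqrt{\beta/(1+\beta)}$ and $\sqrt{1/(1+\beta)}$ and scaled by the square root of the path loss; the paper's own expression may differ in detail. The reference path loss `T` has no value stated in the text, so the value used in the example is purely illustrative, as are the function names.

```python
import cmath
import math
import random
from typing import Optional


def path_loss(d: float, T: float, alpha: float = 2.2, d0: float = 1.0) -> float:
    """T * (d / d0) ** (-alpha): linear-scale path loss at distance d (metres)."""
    return T * (d / d0) ** (-alpha)


def rician_coefficient(d: float, T: float, beta: float = 3.0, alpha: float = 2.2,
                       los_phase: float = 0.0,
                       rng: Optional[random.Random] = None) -> complex:
    """One Rician-faded channel coefficient (assumed standard form)."""
    rng = rng or random.Random(0)
    los = cmath.exp(1j * los_phase)                 # deterministic LoS part
    nlos = complex(rng.gauss(0.0, math.sqrt(0.5)),  # CN(0, 1) scattering part
                   rng.gauss(0.0, math.sqrt(0.5)))
    mix = (math.sqrt(beta / (1.0 + beta)) * los
           + math.sqrt(1.0 / (1.0 + beta)) * nlos)
    return math.sqrt(path_loss(d, T, alpha)) * mix


T_EXAMPLE = 1e-3  # hypothetical reference path loss (-30 dB); not from the paper
h = rician_coefficient(d=100.0, T=T_EXAMPLE)  # e.g. an AP-to-RIS link of 100 m
```

As the Rician factor grows, the coefficient approaches a pure LoS channel whose power equals the path loss.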

In this paper, we adopt the sample-wise loss function and the Modified National Institute of Standards and Technology (MNIST) dataset[36] in the learning process. We assume that each cell performs a different learning task (digits 0–4 in Cell 1 and 5–9 in Cell 2) and that the learning rate is 0.1. The mini-batch sizes in the two cells are 12 and 16, respectively. We compare the following schemes:

1) Without RIS: This scheme does not use the RIS, so the channel contains only the direct link between the APs and devices, i.e., $\Theta = 0$ (for both downlink and uplink communications).

2) Random phase-shift: Under this scheme, the phase-shift matrix is randomly generated in a RIS-assisted system, so only the beamforming vectors need to be optimized.

3) Optimal phase-shift: Under this scheme, we optimize both the beamforming vectors and the phase-shift matrix of the RIS (Algorithm 2).

4) Error-free: This scheme is the benchmark of FL, in which both the downlink model dissemination and the uplink gradient aggregation are transmitted without error.
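For intuition, the difference between the first three schemes can be seen on a single-antenna link, where the RIS adds a reflected path to the direct one. The sketch below is an illustration under that simplification, not the paper's multi-antenna model or Algorithm 2: the co-phasing rule used for the optimized case is the textbook closed form for one isolated link, and all names and channel values are hypothetical.

```python
import cmath
import random
from typing import List


def effective_channel(h_direct: complex, g: List[complex],
                      h_ris: List[complex], theta: List[complex]) -> complex:
    """Direct path plus the path reflected by each RIS element."""
    return h_direct + sum(gs * ts * hs for gs, ts, hs in zip(g, theta, h_ris))


def no_ris(num_elements: int) -> List[complex]:
    return [0j] * num_elements  # scheme 1: Theta = 0


def random_phases(num_elements: int, rng: random.Random) -> List[complex]:
    return [cmath.exp(1j * rng.uniform(0.0, 2.0 * cmath.pi))
            for _ in range(num_elements)]  # scheme 2


def cophased(h_direct: complex, g: List[complex], h_ris: List[complex]) -> List[complex]:
    """Rotate every reflected path onto the phase of the direct path."""
    target = cmath.phase(h_direct)
    return [cmath.exp(1j * (target - cmath.phase(gs * hs)))
            for gs, hs in zip(g, h_ris)]


def draw(rng: random.Random, scale: float = 1.0) -> complex:
    return scale * complex(rng.gauss(0, 1), rng.gauss(0, 1))


rng = random.Random(1)
S = 30
h_d = draw(rng, 0.1)                        # weak direct link (cell-edge device)
g = [draw(rng) for _ in range(S)]           # RIS-to-AP coefficients
h_r = [draw(rng, 0.05) for _ in range(S)]   # device-to-RIS coefficients

gain_none = abs(effective_channel(h_d, g, h_r, no_ris(S)))
gain_rand = abs(effective_channel(h_d, g, h_r, random_phases(S, rng)))
gain_opt = abs(effective_channel(h_d, g, h_r, cophased(h_d, g, h_r)))
```

By the triangle inequality, the co-phased gain is an upper bound on what any phase configuration can achieve on this link, whereas random phases may help or hurt relative to the direct link alone.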

5.2 Performance of RIS-Assisted FL Two-Cell System

In this subsection, we first present the performance of the uplink aggregation based on AirComp and the downlink dissemination error. Then we compare the performance of a two-cell FL system under different schemes.

For the uplink aggregation, the mean-square error (MSE) is a common performance metric in AirComp[12,14,25,34]. We therefore discuss the impact of the number of users, the number of antennas at each AP, and the number of reflecting elements at the RIS on the average MSE across all cells. Fig. 3 displays the relationship between the MSE and the number of users, where the number of antennas at each AP and the number of elements at the RIS are set to $N_1 = N_2 = 10$ and $S = 30$, respectively. The MSE increases with the number of users, and deploying the RIS significantly reduces the MSE compared with the case without a RIS. This is because the RIS can perform channel compensation for users with poor signals at the edge of the corresponding cells. On the one hand, as the number of users increases, the inter-cell interference becomes more pronounced, which also enlarges the MSE. On the other hand, a RIS deployed at the edge of the two cells can mitigate inter-cell interference. Besides, the RIS with optimal phase-shift achieves a lower MSE than the RIS with random phase-shift, which indicates that optimizing the phase shifts significantly enhances the signal strengths received at the APs. Fig. 4 compares the effects of different numbers of antennas at the AP on the MSE, where the number of users per cell is fixed to 10 and the number of elements at the RIS is again 30. We observe that the MSE decreases with the number of antennas, due to the diversity gain of antennas. The RIS improves the total MSE performance of the two-cell system, and the RIS with optimal phase shift achieves better MSE performance than the other two baseline schemes.

▲Figure 3. Relationship between MSE and the number of users

▲Figure 4. Effects of the number of antennas on MSE

To compare the effect of the number of RIS elements on the MSE, we first set the number of users and the number of antennas at the AP to 10, i.e., $N_1 = N_2 = K_1 = K_2 = 10$, and then fix the location of users in each cell to avoid the influence of channel randomness. Fig. 5 shows that the MSE decreases as the number of elements at the RIS increases, i.e., the performance improves steadily with more elements. In addition, the gap between random phase-shift and optimal phase-shift widens as the number of elements increases, which demonstrates the benefits of the optimal phase-shift scheme.

Since the downlink optimization and the uplink optimization have similar forms and are solved by the same algorithm, the number of users, the number of antennas at the AP, and the number of elements at the RIS affect the downlink MSE in the same way as the uplink MSE. We further compare the downlink errors for the case of $K_1 = K_2 = 10$, $N_1 = N_2 = 10$, and $S = 30$. The results are shown in Table 1.

According to the results, the RIS with optimal phase shift still achieves the best performance, although the gaps between these errors are small. Moreover, we observe that the downlink error is much smaller than the uplink MSE, which indicates that the downlink error has little effect on the convergence of the overall system when the number of users is relatively small and $M = 10$ (the learning rate is $\zeta = 0.1$).

▲Figure 5. Relationship between MSE and the number of elements at RIS

▼Table 1. Comparison of downlink errors

Next, we compare the performance of these schemes in the proposed two-cell FL system, where the number of users and the number of antennas at the AP in each cell are both 5, and the number of elements at the RIS is set to 15. Each cell performs the same FL task with one local update on different mini-batch datasets. To compare the performance of the entire system, we average the training loss and test accuracy of the two cells; the results are shown in Fig. 6. Fig. 6(a) shows that, although the training loss varies across schemes, all schemes converge, and converge quickly. Among the proposed schemes, the RIS with optimal phase-shift best enhances the performance of FL. From Fig. 6(b), we notice that the RIS with optimal phase shift achieves approximately 85% accuracy, the RIS with random phase-shift reaches 83.5%, and the scheme without RIS attains only 82.7%, which shows that the RIS-assisted schemes improve the performance of FL. To further show the effectiveness of the proposed system, we record the running time of each scheme: each runs for almost 800 s under $K = 5$, $S = 15$, $N = 5$, and $T = 300$, indicating that the proposed system converges quickly. In summary, the RIS can compensate for the signal degradation of edge users and thereby decrease the communication error. Moreover, the phase-shift matrix of the RIS can be adjusted to mitigate the inter-cell interference.

▲Figure 6. Performance of different schemes in the proposed two-cell FL system: (a) training loss vs. communication rounds; (b) test accuracy vs. communication rounds

6 Conclusions and Future Work

In this paper, we develop a RIS-assisted AirComp-based two-cell FL wireless network, where each cell learns a different FL task and the effects of both downlink and uplink communications are considered. We first analyze the convergence of FL in the proposed system and show that the convergence is mainly influenced by the errors of downlink and uplink transmissions. To enhance the performance of FL, we formulate a joint uplink and downlink optimization problem to minimize the optimality gap. To solve the problem, we divide it into two separate subproblems. The beamforming vector and phase-shift matrix in each subproblem are optimized by alternating optimization based on SCA. Finally, simulation results show the performance and advantages of the proposed system and optimization algorithm.

In this work, we mainly focus on a scenario where one RIS assists two cells. In our future work, we will consider a multi-RIS-assisted multi-cell wireless network, which makes the system model more complex. Since the placement of multiple RISs has a great impact on multi-cell performance, it is necessary to improve the average learning performance of all cells while avoiding poor performance in any single cell. Most existing RISs support only a reflection mode or a transmission mode. A new simultaneous transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) can achieve full spatial coverage and offers more adjustable degrees of freedom. Therefore, promoting the deployment of STAR-RIS is conducive to the implementation of more application scenarios.

Appendix

Proof of Theorem 1

For presentation clarity, we omit the cell index in the following analysis. According to Eqs. (5), (7) and (14), we have