A Universal Hybrid Precoding Scheme for Massive MIMO Communications

2022-11-18 07:58YimengFengYiJiangMaheshVaranasi
China Communications 2022年11期

Yimeng Feng,Yi Jiang,*,Mahesh K.Varanasi

1 Key Laboratory for Information Science of Electromagnetic Waves(MoE),School of Information Science and Technology,Fudan University,Shanghai 200433,China

2 the Department of Electrical,Computer and Energy Engineering,University of Colorado,Boulder,Colorado 80309-0425

Abstract: Based on an analog radio frequency (RF)network,hybrid precoding(HPC)for massive MIMO can achieve very high spectral efficiencies with moderate hardware cost and power consumption.Despite the extensive research efforts in recent years,the practioners are still looking for HPCs that are efficient and easy-to-implement.In this paper,we present a new method termed as the universal hybrid precoding(UHP),which is nearly optimal,computationally efficient,and applicable to various types of RF network(thus,the name universal): the components of the network can be phase shifters(with finite or infinite resolutions),switches,or their combinations; the topology of the network can be fully-connected or partiallyconnected.Besides the standard UHP,we also propose a simplified version termed as sUHP to trade a negligible performance loss for much reduced computational complexity.The analysis shows that the computational complexity of the proposed UHP/sUHP is one to two orders of magnitude lower than the state-of-theart methods.Simulation results verify the(near-)optimality of the proposed UHP scheme for various forms of the analog networks.

Keywords: hybrid beamforming; massive MIMO;phase-shifter network;switch network

I.INTRODUCTION

Massive multi-input multi-output (MIMO) systems have emerged as a key to the fifth-generation (5G)and beyond 5G(B5G)wireless communications[1,2].By employing a large number of antennas at both the transmitter and the receiver,it is well-known that massive MIMO systems can achieve much higher spectral efficiencies than traditional MIMO systems.However,while all-digital beamforming schemes can fully harness the high spectral efficiencies,they require as many radio frequency (RF) chains as the number of antennas,which leads to formidably high cost of hardware and high power consumption.Hence,it is essential to design cost-effective hardware with much fewer RF chains.The key is to insert an analog network between the antenna array and the RF chains.

To accommodate for the hardware constraint of massive MIMO systems,hybrid precoding(HPC)has been intensively researched in recent years(cf.[3]and the references therein).A HPC consists of two parts:a digital precoder that transforms the transmit signals at baseband,and an analog precoder that maps the small number of RF analog signals to a large number of antennas via the analog RF network.The network can either be fully-connected [4–16],partially-connected[6,17–22],or of multiple beamforming weights from a codebook[23–26].For the fully-connected network,each RF chain is connected to all the antennas,but for the partially-connected network,each RF chain is only connected with a subset of antennas,which significantly simplifies the hardware implementation but sacrifices some beamforming gain.The components of these analog RF networks can be phase shifters(with finite or infinite resolutions),switches or their combinations.

Different analog RF networks can achieve different trade-offs between the hardware/software complexity and the spectral efficiency performance.In this paper we focus on improving the trade-off by tackling the HPC problem from a perspective of matrix computation,and propose a unified hybrid precoding (UHP)algorithm for fully-and partially-connected RF networks to maximize the spectral efficiency but with significantly smaller (one to two orders of magnitude lower)computational complexity than the state-of-theart methods[7].

1.1 Related Works

A multitude of algorithms have been proposed in the literature that seek to optimize the analog beamforming/precoding weights for the fully-connected RF network.For instance,the authors of[4]indicate that the spectral efficiency can be approximately maximized by minimizing the Frobenius-norm distance between the hybrid precoder and the optimal fully-digital precoder.By exploiting the multipath sparsity of the millimeter-wave (mm-Wave) channel,they propose to calculate the analog precoders via the orthogonal matching pursuit(OMP)algorithm,which selects each column of analog precoder from the array response vectors of channel.The OMP algorithm was shown to be far from optimum in some scenarios [8].In other instances,a manifold optimization based Alt-Min (MO-AltMin) algorithm was proposed in [6]to minimize the Euclidean distance in [4]as well.In each iteration,the update of the analog precoder involves a conjugate gradient computation and retraction to the manifold;thus,its computational complexity is rather high.The authors of [7]propose to maximize the channel mutual information with respect to the RF precoding matrix FRFand the baseband preocoding matrix FBBin a two-step procedure.They also show that the hybrid precoder can achieve exactly the same spectral efficiency of a fully digital one if the number of RF chains is twice the number of data streams.Paper[8]aims at minimizing the mean squared error of the transmitted data from the low dimensional received signal at the receiver,and the authors develop an alternating minimization of approximation gap(Alt-MaG)framework for hybrid beamforming optimization.A computationally simple method named the minimal gap iterative quantization(MaGiQ)is proposed to approximate the fully digital precoder with the analog one directly[8].But the MaGiQ method can only apply to the case where the number of RF chains is equal to that of the data streams.

Besides the fully-connected network,partiallyconnected RF networks has also been proposed to further reduce hardware cost,wherein each RF chain is connected to only a subset of the antennas.For instance,the authors of [6]propose a semi-definite relaxation-based AltMin (SDR-AltMin) algorithm to minimize the Euclidean distance between the hybrid precoder and the fully digital one.It shows that the analog precoder has a closed-form solution for each non-zero elements in FRF.For the digital precoder,they treat it as a semi-definite relaxation problem,which can be solved by a standard convex optimization algorithm but with excessively high computational complexity.Similar to[6],the authors of[17]propose an alternating HPC method to jointly optimize the analog and digital precoder by minimizing the Euclidean distance between the hybrid precoder and the fully digital one,but they consider two cases: i) the analog precoder can adjust both the amplitudes and the phases;ii)it can adjust the phases only.Moreover,an iterative hybrid precoding algorithm was proposed in [18]by using successive interference cancellation(SIC)-based algorithm to decompose the total achievable rate optimization problem into a train of subrate optimization problems,each of which is associated with only a subset of the antennas.But the digital precoder is restricted to be diagonal,leading to suboptimal performance.By reformulating the problem of spectral efficiency maximization under per antenna power constraint into multiple subproblems of singlestream optimal precoding,the authors of[19]propose two analog precoding designs for the low and high signal-to-noise(SNR)conditions,respectively,but for the high SNR the algorithm only allows for the number of the data streams be equal to that of the RF chains.

The RF switch network has been studied in [27,8]for a different category of HPC designs.In [27]the authors propose a switch network-based HPC to trade moderate performance loss for reduced power consumption.The authors of[8]consider the analog network that combines phase shifters and switches and propose a simple greedy method,called GRTM,that has performance similar to the MO-AltMin method in[6].

1.2 Contributions

In this paper,we propose a universal hybrid precoding(UHP) scheme to maximize the spectral efficiency of a massive MIMO system.The UHP is applicable to various types of analog network (thus,the name universal): the analog network can be of fully-connected and partially-connected; the analog network components can be phase shifters(with finite or infinite resolution),switches,or their combinations.Moreover,the UHP scheme doesnotmake the restrictive assumptions that the number of RF chains be equal to the number of data streams.Given the analog precoder,the optimum digital precoding is based on a modified“water-filling” method.We also propose a simplified UHP (sUHP) to trade some minor performance loss for much lower computational complexity.Simulation results verify the effectiveness of the proposed UHP scheme for different forms of the analog RF networks.The merits of our work are in two-fold.First,it is universal –it can deal with different types of analog network,both fully-connected and partially-connected with components being phase shifters (with finite or infinite resolution),switches,or their combinations;it also allows for the number of the data stream be less than the number of RF chains.Second,it is computationally very efficient; the analysis shows that the computational complexity of the proposed UHP/sUHP is one to two orders magnitude lower than the state-ofart methods,but with no compromise in performance.

1.3 Organization and Notations

The rest of the paper is organized as follows.Section II presents the system model and formulates the problem.The optimum digital precoder is derived in Section 3.1.In Section 3.2,the analog precoder is optimized to maximize the spectral efficiency using the QR decomposition whenNRF=Ns.In Section 3.2.1,we propose the UHP scheme for the fullyconnected RF network.It is then modified to apply to the partially-connected RF networks in Section 3.2.2.Then,we propose an asymptotically tight lower bound on the spectral efficiency when the number of the RF chains is larger than the data streams in Section 3.5.The universal hybrid precoding scheme is concluded in Section 3.3,followed by a simplified algorithm.Numerical results are given in Section IV and the concluding remarks are given in Section V.

This paper adopts the following notations: A is a vector,ais a scalar,A is a matrix,Ais a set.|A|,∥A∥and∥A∥Fare the determinant,the norm and the Frobenius norm of matrix A,respectively.AT,AH,A−1are its transpose,conjugate transpose and inverse,respectively.A(:,n) stands for thenth column of A and A(:,1 :n)is the submatrix consisting of the firstncolumns of A.INis anN × Nidentity matrix.Tr(A) is the trace of A.E[·]denotes the expectation andCN(m,R)is a circularly symmetric complex Gaussian random vector with mean m and covariance R.CM×Ndenotes the set ofM-by-Ncomplex matrices.PA≜A(AHA)−1AHis the orthogonal projection matrix associated with A;P⊥A≜I−PA.eiis a unit vector whoseithentry is 1 and other entries are 0,and 0nstands for an-dimensional zero vector.∠(a)stands for the phase ofa.

II.SYSTEM MODEL AND PROBLEM FORMULATION

2.1 System Model

The idea of HPC is illustrated in Figure 1,where theNs-dimensional signal s∈CNsis first transformed by the digital baseband precoder FBB∈CNRF×Nsto FBBs∈CNRF,which in turn is converted by the RF chains intoNRFbranches of RF analog signals.These RF signals are then transformed by the RF network FRF∈CMt×NRFbefore being transmitted over theMtantennas,i.e.,the transmitted signal can be written as

The RF network can be categorized as fullyconnected or partially connected,as shown in Figure 2.For the fully-connected RF network,each RF chain is connected to all the antennas via the phase shifters,the switches,or the phase shifter-plusswitches –as illustrated in Figure 2 (a),2 (b) and 2(c),respectively.The fully-connected RF network involvesMtNRFphase shifters/switches andMt NRFto-1 combiners on the antenna side.

The partially-connected RF network,as illustrated in Figure 2 (d),has two features: i) each antenna is connected with only one RF chain,and ii) each RF chain is connected withantennas.Hence,it uses onlyMtphase-shifters and no combiners on the antenna side–resulting in much lower cost of the hardware.

This paper assumes a frequency-flat channel

where H∈CMr×Mtis the channel matrix,z∈CMris the circularly symmetric complex Gaussian noise,i.e.,z∼CN(0,σ2zIMr).Without loss of generality,we assume thatσ2z=1.The input signal-to-noise ratio(SNR)is therefore

Here we have assumed without loss of generality that the signal covariance E[ssH]=INs(if E[ssH]≠I,R1/2scan be absorbed into FBBto make the assumption valid).

The spectral efficiency of the MIMO channel(2)is[28]

2.2 Problem Formulation

This paper focuses on developing a computationally efficient algorithm to maximize the spectral efficiency with respect to the analog and digital precoders:

where the first constraint is the power constraint,whereas in the second constraint setFdepends on the components of the RF network.For a fully-connected network

whereS={ejϕ:ϕ ∈[0,2π]}for a phase shifter network of infinite resolution,S={ej2kπ/2b:k=0,1,...,2b−1}for a phase shifter network ofb-bit resolution,S={0,1}for a switch network(as shown by Figure 2(b)),andS={ejϕ:ϕ ∈[0,2π]}∪{0},andS={ej2kπ/2b:k=0,1,...,2b−1} ∪{0}for a phase shifter-plus-switch network [see Figure 2(c)],depending on whether the phase shifter has∞−orb−bit resolution.

For a partially-connected network,

where Fiis theith column of F,K=Mt/NRF,andScan be either{ejϕ:ϕ ∈[0,2π]}or{ej2kπ/2b:k=0,1,...,2b−1}depending on whether the phase shifters are of infinite or finite-bit resolution.

III.UNIVERSAL HYBRID PRECODING

In this section,we establish the framework of the proposed UHP scheme based on the QR decomposition.Clearly,the non-convex constraint FRF∈Fis what makes the hybrid precoding problem challenging.We first show the easier part – the optimum digital precoder given an analog precoder.

3.1 The Optimum Digital Precoder

Given an analog precoder FRF,(5)degenerates into

which was addressed in [7,Section III.A].To make this paper self-contained,we show the exact solution using the QR decomposition.

Compute the QR decomposition

where URF∈CMt×NRFis a semi-unitary matrix and RRF∈CNRF×NRFis an invertible upper triangular matrix.Inserting (9) into (8) and denoting GRRFFBB∈CNRF×Nsyields

to which the solution is “water-filling” based on the well-known singular value decomposition(SVD)[29].In particular,given the SVD

the maximizing G for(10)can be expressed as[30]

withλibeing thei-th largest singular value of UHRFHHHURF.The Lagrangian multiplierµis chosen such thatγi=ρ.

After obtaining G,we calculate the solution to (8)as

3.2 Optimization of the Analog Precoder With NRF=Ns

Now we proceed to optimize the analog precoder.We first consider the caseNRF=Ns.The more general case ofNRF> Nswill be addressed later in this section.

Inserting(9)and(14)into the objective function of(5)yields

Using(11)and(12),we have

whereλistands for theith largest eigenvalue of the matrix UHRFHHHURF,and{γi}Ni=1are as given in(13).Since the water-filling power allocation yields higher spectral efficiency than the uniform power allocation,we have

That is,we focus on solving

Observe that

where the augmented matrix

The maximization of(20)can be reformulated as

We propose to optimize FRFcolumn-by-column iteratively.Denote fias theith column of FRFand Fias the submatrix obtained by striking fiout of FRF.In what follows,we show how to solve a key subproblem of(23),namely,

We first establish Lemma 1.Note that in the lemma and throughout this paper,P⊥A≜I−A(AHA)−1AHstands for the complementary orthogonal projection matrix defined by A.

Lemma 1.Given a fixedFi,solving(24)amounts to solving

Proof.Consider a permutation of the columns of FRFas follows.We put fiin the last column,preceded by Fi,i.e.,let≜[Fi...fi].Consider its QR decomposition

As the permutation of the columns of FRFdoes not affect the channel capacity,we have

Denote the QR decomposition Ha=QR,where R∈CNRF×NRF.We have

whereRjjis thejth diagonal entry of R.By the property of QR decomposition,the last diagonal entry

where uNRFis the last column of[cf.(26)]and UNRFis the submatrix ofwith uNRFbeing removed.

Clearly,Rjj,j=1,...,NRF−1 are independent of fi.Therefore,with Fibeing fixed,to maximizeamounts to maximizing

where,by the QR decomposition(26),

Inserting(31)into(30)yields

Then(25)follows from(32) by noting thatP⊥HaFi=P⊥HaUNRF,since Fishares the same column space as UNRF.Hence the lemma is proven.

3.2.1 Analog Precoder for Fully-Connected RF Networks

Based on Lemma 1 we propose an iterative algorithm to solve(24),where at theith step,i=1,2,...,NRF,fiis optimized with Fibeing fixed.For a fullyconnected RF network,i.e.,with fi ∈SMt,per (24),we need to solve

We adopt the coordinate descent method to solve(33).Splitting x into

where eiis a unit vector whoseithentry is 1 and all the other entries are 0,and denoting

for notional simplicity,we rewrite the cost function of(33)as

For a phase shifter with infinite resolution,xn=ejθ.Then objective function in(36)can be further simplified as

where

Here ∠(·) stands for the phase of the complex number.Fixing,we can optimizexnfor different set constraintsS.

Lemma 2.The solution

with g(θ)as defined in(37)must be the solution to

where

Proof.Let the denominator and the numerator ofg(θ),α+r1cos(θ−ϕ) andβ+r2cos(θ−ψ),respectively,be thex-andy-coordinates.The trace of(α+r1cos(θ−φ),β+r2cos(θ−ψ)) asθvaries from 0 to 2πforms an ellipsoid centered at (α,β) as shown in Figure 3.Maximizing(37)amounts to finding the slope of the straight line from the origin that is tangent to the lower side of the ellipsoid.From Figure 3,we see that equating=0 would yield two solutions corresponding to the two lines from the origin that are tangent to the ellipsoid,which are

Substituting the point(α+r1cos(θ−ϕ),β+r2cos(θ−ψ))into this tangent line,we obtain

Expanding (43) yields thatθ∈as shown in(40).

Lemma 2 gives the solution for the ideal phase shifters of infinite resolution.For a phase shifter ofb-bit resolution,we need to determine the integerksuch that

and updateθoptas

Then

For a phase shifter-plus-switch network and a phaseshifter-of-b-bit-resolution-plus-switch network,0∈Salso.We further compareg(θopt) versus the ratiowhich is obtained by insertingxn=0 into t he objective function in(36).Hencexncan be set as

For a switch network,

The algorithm for optimizing the fully-connected analog precoder is summarized in Algorithm 1,where Line 1 to Line 11 is for initialization of FRF,the forloop in Line 12 is for optimizing FRFcolumn-wise,the for-loop in Line 15 is for optimizing fielementwise.In Line 8Q(Θ)stands for quantizing Θ elementwise to{,l=0,1,...,2b−1}.We need to recalculate A and B when optimizing each column of FRF,which seems computationally highly complicated.But A and B can actually be calculated recursively using the properties of QR decomposition and Givens rotations to drastically reduce the computational complexity,as is shown in Appendix A.

3.2.2 Analog Precoder for Partially-Connected RF Networks

In parallel to Section 3.2.1,here we focus on optimizing the analog precoder for the partially-connected case as illustrated in Figure 2(d),where one RF chain is connected to a separate set ofK=Mt/NRFantennas.

For a partially-connected RF network,fi=thus,P⊥Fifi=fiand the denominator of(24)fHi P⊥Fifi=fHifi=K.Hence,per(24)we solve

Denote

Note that fihas onlyKnon-zero elements.Denote Ai ∈CK×Kas the block diagonal of A corresponding to the non-zero subvector of fi,which is denoted as x∈SK×1.We can rewrite(49)as

Splitting x into

we have from(51)that

Then for a phase shifter of infinite resolution

for a phase shifter ofb-bit resolution

whereQ(x) stands for quantizingxto the nearest point in2π,l=0,1,...,2b−1.Thus,

Iterating overk=1,2,...,K,we update x to achieve a close-to-optimal solution to(51).

Algorithm 2 describes the algorithm for optimizing the partially-connected analog precoder,i.e.,for solving (23),where Line 1 to Line 11 is for the initialization of FRFbased on the phases of the principal singular-vectors of Ai’s,the for-loop in Line 12 is for optimizing FRFcolumn-wise,and the forloop in Line 13 is for optimizing fielement-wise.In Line 8Q(θ) stands for quantizingθelement-wise to

3.3 Universal Hybrid Precoding Scheme

Now we have established the algorithms of optimizing the analog precoders for both the fully-connected and the partially-connected RF networks.We illustrate the UHP scheme by the flow chart in Figure 4.Depending on whether Algorithm 1 or Algorithm 2 is employed,we refer to the UHP scheme as the UHP-FC and UHPPC,respectively.A caveat is that the initialization of FRFis only conducted when running Algorithm 1 or 2 for the first time;when we follow the dashed line with arrow,referred to asthe outer-loop,to rerun Algorithm 1 and 2,FRFwill be updated based on the output from the previous iteration.

3.4 Simplified UHP Scheme

If we remove the dash line from Figure 4,then Algorithm 1 or Algorithm 2 is run only once.In doing so,the computational complexity is significantly reduced at the cost of a minor performance loss,as we will see soon.This simplified UHP scheme is referred to as sUHP scheme.

Algorithm 2.Optimization of FRF for the partiallyconnected RF network.1: Initialize A=HHa Ha.2: Initialize FRF=zeros(Mt,NRF).3: for i=1:NRF do 4: Ai=A(K(i−1)+(1 : K),K(i−1)+(1 :K)).5: Compute u as the principal singular-vector of Ai.6: Take the phases element-wise θ=∠(u).7: if The phase shifters have b-bit resolution then 8: Quantize θ ←Q(θ).9: end if 10: Initialize FRF(K(n−1) + (1 : K),i)=exp(jθ).11: end for 12: for i=1:NRF do 13: for k=1:K do 14: x=FRF(K(n−1)+(1:K),n).15: Fix ¯xk and update xk by(57).16: end for 17: Assign FRF(K(i−1)+(1:K),i)=x.18: Update A according to(50).19: end for 20: return FRF.

3.5 Optimization of the Analog Precoder in the Case Of NRF >Ns

For the case ofNRF> Ns,the digital precoder needs to apply power allocation to aNs-dimensional subspace of theNRF-dimensional range space of FRF.Thus,GGHas appeared in(15)will be rank-deficient and(18)will fail to hold.As stated in[7,Sec.IV.D],“due to the difficulties of optimizing over a function of subset of eigenvalues of a matrix”,the authors of [7]could only approximate(λ) defnied in (19) “with an expression including all of the eigenvalues,i.e.,

We can overcome the difficulties and directly maximize(λ)owing to the following lemma.

Lemma 3.Suppose Mr ≥Ns.

Andthat consists of the Ns largest singular vectors ofHURFUHRFHH maximizes the objective function.

Proof.Since=INs,note that

According to Horn’s Theorem[32,H.3]

Given the SVD

Clearly,the equality in(60)holds ifis the firstNscolumns of U.Taking the logarithm of both sides of(60),we have

Based on Lemma 3,to maximize(λ) amount to solving

We propose to optimize FRFandin alternatingly.The solution toin(63)is already shown in Lemma 3.Hence,we focus on solving

Observe that

where the augmented matrix Hbis defined as

The maximization in (64) can hence be reformulated as

which has exactly the same form as problem (23).Hence the Algorithm 1 and Algorithm 2 can be immediately applied to the case ofNRF> Nsbut with Habeing replaced by Hb.For the UHP scheme,we optimizeand FRFin an alternating manner via the outer-loop iterations,which corresponds to the double maximization problem(63)for the case ofNRF>Ns.And for the sUHP scheme,it makes no alternating optimization betweenand FRF.

3.6 On the Computational Complexity

In Appendix B,we analyze the computational complexity in complex-valued multiplications for both UHP-FC and UHP-PC algorithms as shown in Table 1,whereLis the number ofthe outer-loopiterations andL=1 for sUHP.In contrast,the computational complexity of a state-of-the-art algorithm in[7]isLNRF(M3t+2(NRF−1)M2t),which is much higher than the numbers in Table 1 for a largeMt.

Table 1.Computational complexity.

IV.SIMULATION RESULTS

We present simulation results to verify the superior performance of the proposed schemes.All the simulations are based on the narrow-band mmWave clustered channel model[33]

where the multipath gainsαl ∼CN(0,1),at(θl)and ar(ϕl)are the antenna array responses at the transmitter and receiver with respect to the angle of departureθland the angle of arrivalϕl,respectively.The transmitter and the receiver have array responses

respectively,whereλis the signal wavelength,d=is the inter-antenna spacing.It is further assumed that there areµ=15 multipaths with angles randomly distributed between 0◦and 360◦.

4.1 Performance of UHP Under Different RF Constraints

We first simulate the UHP scheme under different RF constraints including the phase-shifter (PS) network for both fully-,and partially-connected topologies with infinite resolution (−⋄−),1-bit resolution(−◦−),2-bit resolution(−◁−),the switch(−∗−)and phase shifter-plus-switch (−−) fully-connected network in Figure 5.We consider a 64×16 MIMO system (Mt=64,Mr=16) withNRF=Ns=4.The fact that the dashed line almost overlaps with the ideal fully-connected PS network suggests that the architecture shown in Figure 2 (c) has little advantage over that in Figure 2 (b).Moreover,2-bit resolution fully-connected PS network(i.e.,withS={±1,±j})incurs a less than 1dB loss compared with the infiniteresolution fully-connected PS network.The switch network(−∗−)also performs quite well: it is about 3.5dB away from the PS network with infinite resolution.Furthermore,the UHP-PC scheme applied to the PS network of infinite resolution incurs a loss of about 4dB performance compared with infinite-resolution fully-connected PS network in exchange for a greatly reduced hardware complexity.Employing low-bit resolution partially-connected PS networks introduces an additional 0.5∼1.5 dB performance loss.

4.2 Performance of UHP Compared with Other Algorithms

Besides the UHP schemes,we also simulate the stateof-art hybrid precoding methods,including the MOAltMin in [6](−◁−),the algorithm in [7](−⋆−)and the SDR-AltMin in [6](−▷−).ForNRF=6 andNs=4,Figure 6 shows that the UHP-PC scheme significantly outperforms the SDR-AltMin algorithm in the case of partially-connected PS network.For the fully-connected case,the gain of the proposed UHPFC scheme over the state-of-the-art methods,albeit small,verifies the benefit of tackling theNRF> Nsscenario through the double maximization(63)based on Lemma 3,rather than using an approximation as done in[7].

We further compares the UHP-FC and the sUHP-FC with the MO-AltMin in [6]and the algorithm in [7]under differentNRF,and the UHP-PC,the sUHP-PC with the SDR-AltMin in[6]are also simulated in Figure 7.In[7],it is shown that whenNRF≥2Ns=8,there is a closed-form solution for designing FRFand FBBto achieve performance the same as the optimal fully-digital BF.Our proposed UHP-FC scheme can also achieve so.It is also observed that the UHPPC algorithm has a significant performance gain over SDR-AltMin.And the performance loss of the sUHP compared with the UHP is negligible.

GivenNRF=Ns=4,SNR=−10 dB,Figure 8 shows the convergence speed of the UHP scheme in terms of log||UHRFHHaHaURF||,where one iteration corresponds to oneout-loopthrough the dashed line with arrow as shown in Figure 4.It is seen that both the UHP-FC and the UHP-PC converge within 5 iterations.

Now we compare the computational complexity of our UHP scheme as shown in Table 1 with that of the state-of-art algorithm in [7],which isLNRF(M3t+2(NRF−1)M2t).The computational complexity of the MO-AltMin in[6]is even higher due to the nested loops by using line-search and the kronecker products.We setMr=16,NRF=4,the number of outer-loop iterationsL=5 for UHP (L=1 for sUHP).Figure 9 shows the computational complexity of the different algorithm under differentMt.It shows that the complexity of UHP-FC/sUHP-FC is one to two orders of magnitude lower than the Algorithm in [7].This result,together with the performance shown in Figure 6 and 7,verifies the superiority of the proposed UHP algorithm in both spectral efficiency and computational complexity.

For the fully-connected phase shifter network,we use the column-wised optimization and coordinate descent method,which is similar to [7].But our proposed algorithms differs in two aspects:

First,by artfully using the QR decomposition,we formulate the problem (33) for each column of FRF,where the orthogonal projection matricesandcan be calculated recursively as is shown in Appendix A.In contrast,the algorithm in[7]requires the inversion of an(NRF−1)×(NRF−1)matrix and several matrix multiplications for each column,which leads to a much higher computational complexity,as shown in Figure 9.

Second,for the case ofNRF> Ns,we establish Lemma 3,based on which we can solve the difficult problem of optimizing the function of a subset of eigenvalues of a matrix through the double maximization problem(63).In contrast,the algorithm in[7]can only solve an approximation to the original problem.This explains the superior performance of UHP-FC as shown in Figure 6.

V.CONCLUSION

In this paper,we study hybrid precoding for massive MIMO using different types of radio frequency networks,such as the phase-shifter network,the switch network,and the phase-shifter-plus-switch network for both fully-connected and partially-connected topologies.We propose a universal hybrid precoding (UHP) scheme to maximize the system spectral efficiency for both fully-and partially-connected networks,called the UHP-FC and the UHP-PC,respectively.These two algorithms optimize the analog precoder inNRFsteps,whereNRFis the number of the RF chains.Given the analog precoder,the digital precoder is obtained using a modified water-filling method.A simplified UHP (sUHP) scheme is also proposed to reduce the computational complexity with minor performance loss.Simulation results verify the superiority performance of the proposed schemes over the state-of-the-art methods in both spectral efficiency and computational complexity.

ACKNOWLEDGEMENT

The work in this paper of Y.Feng and Y.Jiang was supported by National Natural Science Foundation of China Grant No.61771005,and that of M.K.Varanasi in part by Qualcomm Gift 24868 and the 2017 and 2018 Qualcomm Faculty Awards.Partial material of this paper has been published in International Conference on Wireless Communications and Signal Processing(WCSP)2018 and 2019.

APPENDIX A: RECURSIVE UPDATES OF A AND B DEFINED IN(35)

We first recall the following lemma.

Lemma 4.GivenF∈CM×N (M ≥N),and its QR decomposition[34]F=QR,whereQ∈CM×N andR∈CN×N,then the projection matrix

and the complementary projection matrix

Proof.Based on the QR decomposition,we have that

And then(71)follows from(72)immediately.

Recall from (35) that A ≜P⊥FiHHa P⊥HaFiHaP⊥Fiand B ≜P⊥Fi,where

which is related to Fi−1=[fi,...,fNRF,f1,...,fi−2]via striking out the first column fiand appending fi−1to the end.The key to the computationally efficient calculation of A and B is to recursively updateP⊥FiandP⊥HaFiafter their initialization.

Initialization Of A And B

From(73),F1=[f2,...,fNRF].Denote the QR decompositions

According to Lemma 4,it is easy to see that

Denote an intermediate variable

then it follows from(35)that

Recursive Update Of A And B

Denote the QR decompositions

The update involves two steps.The first step is to strike the first column fiout from Fi−1.Now we have from(78a)that

Hence we can use(NRF−2)Givens rotations to zero out the boxed entries of ¯Ti−1in(79)to obtain

Following the same procedure in the above from(79)to(81),we can have from(78b)

Hence,it follows from Lemma 4 that

or

where

Similarly,

where

Combining(83)and(76),we see that corresponding to the operation of striking out the first column from Fi,C can be updated to be

Using(85),we can then update A by

before updating

The second step is to append fi−1to the last column ofto obtainProceeding from(81),we have the QR decomposition

where

Similarly,proceeding from (82),we have the QR decomposition

where

Then according to Lemma 4,

or

Combining (83) and (76),we see that corresponding to the operation of appending fito,C can be updated as

and A can then be updated as

before updating

In summary,the matrices A and B can be updated recursively according to (88)-(89) and (97)-(98) to greatly reduce the computational complexity.

APPENDIX B: ANALYSIS OF COMPUTATIONAL COMPLEXITY

In this appendix,we analyze the computational complexity of our proposed algorithms in terms of the number of complex multiplications.We first show the computational complexity for UHP-FC whenNs=NRF.The main complexity includes the following parts:

1.Calculating A and B in(75)and(77)at the first iteration:

Step 1: to calculate HaF1which takes

Step 2: to calculate HHaHawhich takes

Step 3: to calculate the QR decompositions in(74)which takes

Step 4: to calculate B and C in (75) and (76)which takes

Step 5: to calculate A in(77)which takes

The computational complexity for calculating A and B in(75)and(77)is

2.Updating A and B in(88)and(97):

Step 8: to update A and B in(88)which takes

Step 9:to calculate Pi,Ti,Qiand Riin(90)and(90)which takes

Step 10: to update A and B in(97)which takes

thus,the computational complexity for updating

A and B in(88)and(97)is

3.Calculating x in(38):

It can also update recursively as follows:

Step 12: updating (38) forx2usingand.we denote

then

thus,the computational complexity for updating

In summary,the computation complexity for the UHP-FC algorithm is

The computational complexity of the UHP-PC algorithm can be similarly analyzed,of which the details are omitted.The total numbers of complex multiplications of these algorithms are listed in Table 1 in Section 4.2.