Latent Variable Regression for Supervised Modeling and Monitoring

IEEE/CAA Journal of Automatica Sinica, 2020, Issue 3

Qinqin Zhu

Abstract—A latent variable regression algorithm with a regularization term (rLVR) is proposed in this paper to extract latent relations between process data X and quality data Y. In rLVR, the prediction error between X and Y is minimized, which is proved to be equivalent to maximizing the projection of quality variables in the latent space. The geometric properties and model relations of rLVR are analyzed, and the geometric and theoretical relations among rLVR, partial least squares, and canonical correlation analysis are also presented. The rLVR-based monitoring framework is developed to monitor process-relevant and quality-relevant variations simultaneously. The prediction and monitoring effectiveness of the rLVR algorithm is demonstrated through both numerical simulations and the Tennessee Eastman (TE) process.

I. Introduction

In industrial processes, timely process monitoring is of great importance: it helps detect potential hazards and enhance operation safety, thus contributing substantially to tomorrow's industry and imparting significant economic benefits. Traditionally, routine examinations by experienced personnel were the major approach to detecting anomalies, an approach that is prone to error and neither fully reliable nor comprehensive. With the advancement of technologies in data collection, transmission and storage, a new effective monitoring scheme based on multivariate analytical methods has emerged to track variations in the process in a timely and reliable fashion, and it is widely applied in chemical engineering, biology, pharmaceutical engineering, and management science [1]–[6]. Among these methods, principal component analysis (PCA), partial least squares (PLS) and canonical correlation analysis (CCA) are three popular and effective algorithms used in multivariate process monitoring.

PCA is a powerful tool to discover important patterns and reduce the dimension of process data. It decomposes the original process space into the principal component subspace (PCS), which captures large variances, and the residual subspace (RS), which mainly contains noise [7]. The monitoring scheme based on PCA is well defined to detect anomalies in the PCS with the T² statistic and those in the RS with the Q statistic [1], [8]. In industrial processes, product quality is of major concern. The PCA-based monitoring scheme, however, fails to build the connection between X and the quality variables Y, and the information in Y is not used in either the modeling or the monitoring stage, which makes it hard to identify whether faulty samples will affect product quality. Thus, supervised algorithms such as PLS and CCA are preferred.

PLS extracts the latent variables by maximizing the covariance between X and Y; thus quality information is successfully captured in the latent model. Since PLS accounts for the variances of both process and quality variables, the captured latent variables contain variations that are orthogonal or irrelevant to Y, and further decomposition is necessary for comprehensive modeling with PLS [9]–[11]. Another issue involved in PLS is that its outer and inner modeling objectives are different: the outer model maximizes the covariance between X and Y, while the inner model minimizes regression errors between process and quality scores. This discrepancy reduces the prediction efficiency of PLS.

Similar to PLS, CCA works as a supervised modeling algorithm to construct its latent space with the supervision of quality variables. The latent variables are extracted by maximizing the correlation between X and Y, and all the information contained in the latent variables is relevant to Y; thus CCA can get rid of the effect of process variances and achieves better prediction performance than PLS. The remaining information left in the process and quality variables may still be valuable to reveal abnormal variations in the data, which can contribute to the improvement of operation safety and economic efficiency; however, these variations remain unexploited in CCA, and concurrent CCA was proposed to conduct a full decomposition on X and Y [12].

The T² and Q statistics are also employed to monitor variations in the PCS and RS for PLS and CCA, and they obtain satisfactory performance [8], [13]–[15]. Distributed process monitoring frameworks have also been developed for plantwide processes [16]–[18].

In this paper, we propose a new regularized latent variable regression (rLVR) algorithm to build the relation between process and quality data, which is designed to have consistent outer and inner modeling objectives. Different from PLS and CCA, rLVR pays attention to both the correlation between X and Y and the variance of Y, which achieves better prediction power. The geometric properties and model relations of rLVR are analyzed, which reveals the orthogonality of the extracted latent variables. An rLVR-based monitoring framework is also developed to monitor process-relevant and quality-relevant variations.

The remainder of this paper is organized as follows. In Section II, the traditional latent variable methods, PLS and CCA, are reviewed. Section III presents the motivation and details of the regularized LVR algorithm. The geometric properties and model relations of rLVR and its equivalence with PLS and CCA are demonstrated in Section IV. The comprehensive monitoring scheme based on rLVR is developed in Section V. Section VI employs two case studies, a numerical simulation and the Tennessee Eastman process, to illustrate the effectiveness of rLVR over PLS and CCA from prediction and monitoring perspectives. The conclusions are drawn in the last section.

II. Latent Structured Methods

A. Projection to Latent Structures

Projection to latent structures (PLS), which is also referred to as partial least squares, is a typical supervised dimensionality reduction algorithm. It constructs a lower-dimensional space by maximizing the covariance between the projections of process data X ∈ R^{n×m} and quality data Y ∈ R^{n×p} in the latent space, where n is the number of training samples, and m and p are the numbers of process variables and quality variables, respectively. The objective of PLS is mathematically presented as

where the score vectors t ∈ R^n and u ∈ R^n are the projections of X and Y on the latent space, respectively, and the weighting vectors w ∈ R^m and q ∈ R^p are the projecting directions. Equation (1) is also called the outer modeling objective of PLS, and its solution can be derived iteratively with the aid of Lagrange multipliers.

In inner modeling, PLS builds a linear regression model between t and u,

where b is the regression coefficient, and ϵ is the modeling error. The first term in the above equation denotes the prediction of the quality score, and the regression coefficient b can be obtained by minimizing the modeling error between u and û.
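To make the outer and inner steps concrete, the following is a minimal Python sketch of one PLS component extracted with the NIPALS iteration; it is an illustration rather than the paper's implementation, the function name and convergence tolerance are assumptions, and X and Y are assumed to be centered and scaled.

```python
import numpy as np

def pls_nipals_component(X, Y, tol=1e-10, max_iter=500):
    """Extract one PLS component with NIPALS (illustrative sketch).

    X (n x m) and Y (n x p) are assumed zero-mean and unit-variance.
    Returns the weights w, q, the scores t, u, and the inner-model coefficient b.
    """
    u = Y[:, [0]]                                   # initialize quality score
    for _ in range(max_iter):
        w = X.T @ u                                 # outer model: X-weight
        w /= np.linalg.norm(w)
        t = X @ w                                   # process score
        q = Y.T @ t                                 # Y-weight
        q /= np.linalg.norm(q)
        u_new = Y @ q                               # quality score
        if np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new):
            u = u_new
            break
        u = u_new
    b = float((t.T @ u) / (t.T @ t))                # inner model: regress u on t
    return w, q, t, u, b
```

Deflation (subtracting t p^T from X and the fitted part from Y) would then typically follow before the next component is extracted.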

B. Canonical Correlation Analysis

Canonical correlation analysis (CCA) [21], also known as canonical variate analysis (CVA), works as a counterpart of PLS to extract latent structures by maximizing the correlation between X and Y, whose objective is formulated as

or equivalently,

In contrast to the iterative approach in PLS, the latent variables of CCA can be derived directly by singular value decomposition (SVD). The detailed procedure of CCA is summarized as follows.

1) Pre-process X and Y to make them zero-mean and unit-variance;

2) Perform SVD on the scaled X and Y;

3) Perform SVD to calculate the weighting matrices W = [w_1, w_2, ..., w_l] and Q = [q_1, q_2, ..., q_l].

The deflation of CCA can be obtained similarly to PLS by minimizing ||X − TP^T|| and ||Y − TC^T||, leading to

where T = XW, and P and C are the loading matrices for the process and quality variables in CCA.
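As a hedged sketch of the SVD route outlined above (not necessarily the paper's exact sequence of steps), the weighting matrices can be obtained by whitening the covariance matrices of X and Y and applying an SVD to the whitened cross-covariance; the small jitter eps is an assumption added for numerical stability.

```python
import numpy as np

def cca_weights(X, Y, n_components, eps=1e-8):
    """CCA weighting matrices W and Q via SVD (illustrative sketch).

    X (n x m) and Y (n x p) are assumed zero-mean and unit-variance.
    """
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1)
    Syy = Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse square root of a symmetric matrix via eigendecomposition.
        vals, vecs = np.linalg.eigh(S)
        return vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T

    Sxx_ih, Syy_ih = inv_sqrt(Sxx), inv_sqrt(Syy)
    U, s, Vt = np.linalg.svd(Sxx_ih @ Sxy @ Syy_ih)
    W = Sxx_ih @ U[:, :n_components]      # weights for X: T = X W
    Q = Syy_ih @ Vt.T[:, :n_components]   # weights for Y: U = Y Q
    return W, Q, s[:n_components]         # singular values = canonical correlations
```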

It is known that CCA is sensitive to noise in the presence of strong collinearity. Thus, regularized CCA (rCCA) was proposed to address this ill-conditioned behavior by introducing regularization terms on both the process and quality sides [12].

C. Several Notes for PLS and CCA

In (1), PLS pays attention to both the correlation between X and Y and their variances, and the extracted latent variables contain irrelevant or orthogonal information, which makes no contribution to predicting Y. As a consequence, PLS needs a superfluous number of latent variables. For instance, multiple latent variables may be required to predict even one-dimensional quality data Y. Thus, further decomposition is designed in subsequent works, such as total PLS [9] and concurrent PLS [10]. Another issue involved with PLS is the inconsistent objectives for outer and inner modeling, as observed in (1) and (2).

CCA achieves better prediction or modeling performance by focusing on the correlation between process and quality variables only. However, CCA attaches equal importance to process and quality variables, and the modeling performance can be further improved by incorporating the variances and signals of the quality variables Y.

Therefore, motivated by the aforementioned analysis, we propose the latent variable regression (LVR) method in the next section [22].

III. Latent Variable Regression

A. Outer Model

In LVR, in order to make inner and outer modeling consistent, the following outer relation is designed.

where the symbols have the same meanings as in (1). It is noted that the constraint in (5) is different from those for PLS, which is designed on purpose and will be explained in the following subsections.

The solution of (5) can be obtained with the Lagrange multiplier λ_q as follows:

Taking derivatives with respect to w and q and setting the results to zero yields

By re-arranging the above equations, we have

where w^T and q^T are pre-multiplied in (6) and (7), which leads to

Lemma 1: The least-squares objective for latent variable regression in (5) is equivalent to minimizing the Lagrange multiplier λ_q.

Proof: In (5), J_outer can be expanded as

Thus, minimizing the prediction error between the projection scores t and u in LVR is equivalent to finding the minimum value of the Lagrange multiplier λ_q.
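For reference, assuming the outer objective is the squared prediction error between the score vectors, as the text above indicates, and using t = Xw and u = Yq, the expansion can be written as

```latex
J_{\mathrm{outer}} = \lVert u - t \rVert^{2}
                   = \lVert Yq - Xw \rVert^{2}
                   = q^{\top} Y^{\top} Y q \;-\; 2\, q^{\top} Y^{\top} X w \;+\; w^{\top} X^{\top} X w .
```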

B. Inner Model

Equations (8) and (9) constitute the outer modeling of LVR. For inner modeling, the same objective is applied; that is, to minimize the least-squares error,

which leads to

Remark 1: The inner modeling is not needed in the latent variable regression method.

It is noted that Remark 1 is an expected result due to the same outer and inner objectives in (5) and (10).

C. Deflation of X and Y

Deflation of X and Y is performed to remove the effects of the extracted latent variables, which can be represented as

where p ∈ R^m and c ∈ R^p are the loading vectors for X and Y respectively, and they can be calculated by minimizing the regression errors ||X − tp^T||² and ||Y − tc^T||², leading to

Therefore, the procedure to extract latent variables in LVR is summarized as follows.

1) Scale the process and quality data X and Y to zero mean and unit variance.

2) Initialize u as the first column of Y, and repeat the following relations until convergence is achieved.

3) Deflate X and Y by

4) Conduct Steps 2 and 3 for the next round until l latent variables are extracted, where l is determined by cross-validation.

D. Regularized Latent Variable Regression

In LVR, the inversion of the covariance of X is involved in calculating the weighting vector w, and collinearity in X will lead to inconsistent results, as shown in Fig. 1. In Fig. 1(a), two columns of X are tightly correlated, and since the angle between x_1 and x_2 is not zero, Plane 1 is formed. In this case, the quality variable y is able to project onto Plane 1, and the projection is y′. However, in most cases the data are subject to noise, which makes x_1 deviate from its original direction. In Fig. 1(b), the new plane, Plane 2, defined by the perturbed x_1 and x_2, drastically diverges from Plane 1, and the new projection y* in Plane 2 is also different from y′. As concluded from Fig. 1, when the data are strongly collinear, the results of LVR are not reliable or consistent. Thus, it is necessary to address the collinearity issue, which can be achieved by constraining the norm of w with a regularization term.

Lemma 2: The LVR objective in (5) is equivalent to the following objective [23]:

Fig. 1. Ill-conditioned performance caused by collinearity. (a) Projection of y; (b) Large deviation of the projection of y caused by noise.

The proof details are given in Appendix A. The new objective in (13) for LVR is similar to the formulation for PLS in (1) or for CCA in (4); however, due to the different constraints, the derived solutions and their geometric properties are different. Since the objective in (13) is more conventional in process monitoring, it is adopted to develop the regularized LVR (rLVR) algorithm.

A regularization term is designed in rLVR as follows to address the strong collinearity [22].

where γ is the regularization parameter (denoted κ in Algorithm 1 and the case studies).

The detailed rLVR algorithm is summarized in Algorithm 1. There are two parameters involved in Algorithm 1, which are l and κ, and they can be determined jointly by cross-validation.

Algorithm 1 Regularized Latent Variable Regression
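The following is a minimal Python sketch of one possible realization of Algorithm 1; it is not a verbatim transcription of the listing. In particular, the ridge-type update for w (consistent with the covariance inversion and regularization discussed above, followed by normalization to satisfy ||w|| = 1), the unit-norm update for q, and the least-squares deflation with loadings p and c are assumptions.

```python
import numpy as np

def rlvr_fit(X, Y, n_components, kappa=1e-3, tol=1e-10, max_iter=500):
    """Illustrative rLVR sketch (not the paper's exact listing).

    X (n x m) and Y (n x p) are assumed zero-mean and unit-variance.
    Returns the weight matrices W, Q, the loading matrices P, C, and the scores T.
    """
    X, Y = X.copy(), Y.copy()
    m = X.shape[1]
    W, Q, P, C, T = [], [], [], [], []
    for _ in range(n_components):
        u = Y[:, [0]]                                   # initialize quality score
        for _ in range(max_iter):
            # Assumed update: ridge-regularized regression of u on X, then normalize.
            w = np.linalg.solve(X.T @ X + kappa * np.eye(m), X.T @ u)
            w /= np.linalg.norm(w)                      # constraint ||w|| = 1
            t = X @ w
            q = Y.T @ t
            q /= np.linalg.norm(q)                      # constraint ||q|| = 1
            u_new = Y @ q
            if np.linalg.norm(u_new - u) < tol * np.linalg.norm(u_new):
                u = u_new
                break
            u = u_new
        p = X.T @ t / float(t.T @ t)                    # least-squares loading for X
        c = Y.T @ t / float(t.T @ t)                    # least-squares loading for Y
        X -= t @ p.T                                    # deflate X
        Y -= t @ c.T                                    # deflate Y
        W.append(w); Q.append(q); P.append(p); C.append(c); T.append(t)
    return (np.hstack(W), np.hstack(Q), np.hstack(P), np.hstack(C), np.hstack(T))
```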

IV. Geometric Properties and Model Relations

A. rLVR Geometric Properties

PCA, PLS and CCA extract their latent variables by maximizing a statistical metric (variance, covariance or correlation), and their geometric properties are well studied [2], [13], [24]. The idea of the rLVR algorithm is different from PCA, PLS and CCA, and it is important to understand the structure of its latent space for further applications.

For ease of illustration, a subscript i is used to denote the iteration order. For instance, t_i denotes the ith latent score, and X_i and Y_i are the deflated process and quality datasets in the ith extraction round, and we have

Then, with this notation and the relations in Algorithm 1, we have the following lemma.

Lemma 3: We have the following orthogonality properties between residuals and model parameters in the regularized LVR algorithm:

The proof of Lemma 3 is given in Appendix B. With the orthogonal relations in Lemma 3, it is straightforward to derive the orthogonality among model scores, weights and loadings, which is summarized in the following theorem.

Theorem 1: The following orthogonal geometric properties hold for regularized LVR.

Proof: To prove Relations 1 and 2, the expression for p in (15) is utilized

For Relation 3, assuming that i > j, then

When i < j, the result follows similarly, and Relation 3 is proved.

Additionally, from the Lagrange relations of (14), we have

where λ_w and λ_q are the Lagrange multipliers for w_i and q_i, respectively. Thus, we have

where c is a normalization coefficient. With the relations in Lemma 3 and assuming i > j,

Similarly, the relation holds when i < j. Therefore, Relation 4 is proved.

Theorem 1 shows that the scores t_i are mutually orthogonal, and the deflation of the process and quality datasets can be represented as

Equation (20) implies that, in order to calculate w and q in (6) and (7), only one dataset needs to be deflated for further iterations.

B. rLVR Model Relations

After performing rLVR, the process and quality data can be predicted by

where P = [p_1, p_2, ..., p_l] and Q = [q_1, q_2, ..., q_l]. Both P and Q are available from the training stage, while T varies with the process data X. Thus, it is necessary to derive the explicit relation between X and T.

According to (39) in Appendix B, X_i can be re-arranged into

where N_{1:i−1} denotes the accumulated deflation matrix up to the (i−1)th iteration. Then each score vector t_i in T is calculated by

where r_i ≡ N_{1:i−1} w_i. With Relations 1 and 2 in Theorem 1, it is easy to show that

where R = [r_1, r_2, ..., r_l] and W = [w_1, w_2, ..., w_l]. Therefore, T and the predictions of X and Y can be calculated from the process data directly.
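A small sketch of how R, the scores, and the predictions might be assembled from the training-stage weights and loadings; the recursive form r_i = (I − w_1 p_1^T)···(I − w_{i−1} p_{i−1}^T) w_i is an assumption consistent with the deflation X_{i+1} = X_i − t_i p_i^T used above, and the prediction of Y with Q follows the text.

```python
import numpy as np

def scores_and_predictions(X, W, P, Q):
    """Compute R, the scores T = X R, and the predictions of X and Y (sketch).

    W, P are (m x l) weight/loading matrices and Q is (p x l); the recursion
    below assumes the deflation X_{i+1} = X_i - t_i p_i^T.
    """
    m, l = W.shape
    R = np.zeros((m, l))
    N = np.eye(m)                       # accumulated deflation operator
    for i in range(l):
        R[:, [i]] = N @ W[:, [i]]       # r_i = N_{1:i-1} w_i
        N = N @ (np.eye(m) - W[:, [i]] @ P[:, [i]].T)
    T = X @ R                           # latent scores from raw process data
    X_hat = T @ P.T                     # prediction of X
    Y_hat = T @ Q.T                     # prediction of Y (with Q, as in the text)
    return R, T, X_hat, Y_hat
```

With this construction, R^T P reduces to the identity matrix, consistent with the property used in Appendix C.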

The following properties hold for R and P.

Lemma 4: PR^T and I − PR^T are idempotent matrices. That is,

The proof of Lemma 4 is provided in Appendix C. Lemma 4 demonstrates that both PR^T and I − PR^T are orthogonal projection matrices.

In online prediction, the prediction of the quality data is calculated from the new sample x directly,

and the new sample x can be modeled as

Theorem 2: For the regularized latent variable regression algorithm,

Proof: From (27), we have

With Lemma 1, the first term in (29) is

Similarly, the second term is

Therefore

C. Relation Among PLS, CCA and LVR

A generalized formulation of PLS, CCA [21] and LVR can be derived as [25]

where 0 ≤ α_w, α_q ≤ 1.

PLS, CCA and LVR are three special cases of (30):

1) When α_w = 1 and α_q = 1, (30) reduces to PLS;

2) When α_w = 0 and α_q = 0, (30) reduces to CCA, and the constraints ||w|| = 1 and ||q|| = 1 are equivalent to adding a regularization term for X and Y, respectively;

3) When α_w = 0 and α_q = 1, (30) stands for LVR, and the regularization term is incorporated with the extra constraint ||w|| = 1.

Geometrically, for ease of comparison, the objectives of PLS, CCA and LVR are re-arranged as

where θ is the angle between u and t, and for simplicity, the regularization term is omitted in the discussion for LVR. The geometric relations among PLS, CCA and LVR are presented in Fig. 2.

Fig. 2. The geometric relations among PLS, CCA and LVR.

As discussed in Section II, in addition to the relation between the scores u and t, PLS also emphasizes the variances of X and Y, as shown in Fig. 2, and the extracted latent variables obtain less effective prediction power.

In contrast, CCA maximizes the correlation between the projections of X and Y in the latent space; thus it only focuses on the angle between u and t. CCA works well for prediction; however, since the variances of the process and quality spaces are not exploited, further decompositions are necessary for good monitoring performance [12].

The proposed LVR algorithm maximizes the projections of the quality scores on the latent space, and both the variance of Y and the angle between u and t are considered, leading to better prediction effectiveness.

V. Process Monitoring With rLVR

It is important to develop a monitoring system based on the extracted latent variables in rLVR to detect anomalies in both the principal component subspace (PCS) and the residual subspace (RS).

The variations in the PCS are relevant to the quality variables and contain large variances. Assuming that they are normally distributed, the T² index can be utilized to monitor quality-relevant variations in this subspace [1]. For a new sample x, its T² is calculated by

where t = R^T x, and Λ = T^T T/(n − 1) is a diagonal matrix. The threshold for T² can be defined as

where F_{l,n−l,α} denotes an F-distribution with l and n − l degrees of freedom, and α defines the confidence level.
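As a sketch (the exact control-limit expression used in the paper's Table I is not reproduced here), the T² index and a commonly used F-distribution-based limit can be computed as follows; the constant l(n² − 1)/(n(n − l)) is the standard choice and is an assumption rather than a quotation from the paper.

```python
import numpy as np
from scipy import stats

def t2_index(x_new, R, T_train, alpha=0.01):
    """T² index for a new sample and an F-distribution control limit (sketch)."""
    n, l = T_train.shape
    Lam = (T_train.T @ T_train) / (n - 1)          # (diagonal) score covariance
    t = R.T @ x_new                                # latent score of the new sample
    t2 = float(t.T @ np.linalg.inv(Lam) @ t)
    # Commonly used limit: l(n^2 - 1) / (n(n - l)) * F_{l, n-l, alpha}
    limit = l * (n**2 - 1) / (n * (n - l)) * stats.f.ppf(1 - alpha, l, n - l)
    return t2, limit
```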

The information contained in the residual space is not related to the quality variables, but it is still beneficial to monitor the variations in the RS for the sake of operation efficiency and safety. It is not appropriate to use the Q statistic directly as in PCA [1], [2], since the variances in the RS may still be very large. Thus, a subsequent PCA decomposition is applied to the residual to extract the latent structure in the RS.

where the resulting principal and residual parts can be monitored with the T² and Q indices, respectively.
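A sketch of the subsequent PCA on the process residual, assuming the residual is X̃ = X − TP^T from the training data; the number of residual principal components n_pc is a tuning choice.

```python
import numpy as np

def residual_pca(X_tilde, n_pc):
    """PCA on the process residual to split it into a principal part
    (monitored with a T²-type index) and a final residual (monitored with Q).
    Illustrative sketch; X_tilde is assumed to be X - T P^T from training."""
    n = X_tilde.shape[0]
    _, _, Vt = np.linalg.svd(X_tilde, full_matrices=False)
    P_r = Vt[:n_pc].T                        # residual-space loadings
    T_r = X_tilde @ P_r                      # residual principal scores
    E = X_tilde - T_r @ P_r.T                # remaining residual
    Lam_r = (T_r.T @ T_r) / (n - 1)          # covariance used in the residual T² index
    return P_r, Lam_r, E
```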

The detailed monitoring statistics and their corresponding thresholds are summarized in Table I, and the monitoring scheme is as follows.

1) If T² exceeds its control limit, a quality-relevant fault is detected with (1 − α) × 100% confidence.

TABLE I Monitoring Statistics and Control Limits

2) If T_r² is larger than its control limit, a process-relevant fault is detected with (1 − α) × 100% confidence, and the fault will not affect the quality variables.

3) If Q_r exceeds its threshold, the relation between the process and quality variables might be broken, which needs further investigation.

Remark 2: When the quality measurements are available, a similar decomposition can be further applied to the residual of the quality variables, and the corresponding monitoring scheme can be developed, which is referred to as concurrent monitoring, as developed for PLS [10] and CCA [12].

VI. Case Studies

A. Case Studies on Simulation Data

To verify the effectiveness and robustness of LVR, we generate two scenarios in this section, and collinearity is introduced in both scenarios. rLVR, regularized CCA (rCCA) [12] and PLS are performed on the first scenario, and their performance is compared in terms of the correlation coefficient and the proportion of variance explained of the process and quality variables. In Scenario 2, different noise levels are designed to show the robustness of rLVR.

1) Scenario 1: The following expressions are used to generate the data for the first scenario.

where

where e ∈ R^5 ~ N(0, 0.2²), v ∈ R^4 ~ N(0, 0.8²), and t ∈ R^5 ~ N(0, (3×i)²), i ∈ [1, 2, ..., 5]. It is noted that strong collinearity is introduced in both X and Y, where the 2nd and 4th columns and the 3rd and 5th columns of A, and the 2nd and 4th rows of C, are highly dependent.

800 samples are generated with (37); the first 600 are used as training data, while the remaining samples are used as test data. The model parameters are selected through cross-validation: for rLVR, l = 3 and κ = 0.001; for rCCA, l = 3, κ_x = 0.001, and κ_y = 0.059; and for PLS, l = 5.
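One way the joint cross-validated selection of l and κ could be carried out is sketched below; the grid values, fold count, and the helper functions rlvr_fit and scores_and_predictions (from the earlier sketches) are assumptions, not the paper's exact procedure.

```python
import numpy as np
from itertools import product

def select_l_kappa(X, Y, l_grid, kappa_grid, n_folds=5):
    """Grid search over (l, kappa) using k-fold cross-validated prediction MSE."""
    n = X.shape[0]
    folds = np.array_split(np.random.permutation(n), n_folds)
    best, best_mse = None, np.inf
    for l, kappa in product(l_grid, kappa_grid):
        mse = 0.0
        for k in range(n_folds):
            val = folds[k]
            trn = np.hstack([folds[j] for j in range(n_folds) if j != k])
            # Fit on the training folds and predict the validation fold.
            W, Q, P, C, T = rlvr_fit(X[trn], Y[trn], l, kappa)
            _, _, _, Y_hat = scores_and_predictions(X[val], W, P, Q)
            mse += float(np.mean((Y[val] - Y_hat) ** 2))
        if mse < best_mse:
            best, best_mse = (l, kappa), mse
    return best
```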

The correlation coefficient r and the proportions of variance explained of the process variables (PVEx) and quality variables (PVEy) for these models are shown in Figs. 3–5. As presented in Fig. 3, since rLVR pays attention to both the correlation between X and Y and the variance of Y, its correlation coefficient and PVEy are the highest for the first latent component, leaving less information in the residuals. rCCA focuses on maximizing the correlation between the process and quality data. Thus, its r for each latent variable is relatively high; however, its ability to exploit the process and quality variances is weak, which requires further processing. PLS in Fig. 5 tries to incorporate all three factors (correlation, process variance and quality variance), but the regression relation between X and Y in the PLS model is the weakest among the three models, and it requires five principal components to achieve good performance.

Fig. 3. Correlation coefficient and proportion of variance explained for rLVR in Scenario 1.

Fig. 4. Correlation coefficient and proportion of variance explained for rCCA in Scenario 1.

Fig. 5. Correlation coefficient and proportion of variance explained for PLS in Scenario 1.

2) Scenario 2: The same formulation in (37) is adopted in Scenario 2, and the data are generated with different levels of noise v as follows:

Through cross-validation, the model parameters are selected as l = 3 and κ = 0.001 for rLVR and l = 3 for LVR in both cases. To compare the robustness of the models, the following metric is defined [12], which denotes the angle between the weighting vectors r_i for different magnitudes of noise.

where i ∈ [1, 2, 3]. The results of LVR and rLVR are summarized in Table II. As observed from the table, when the noise level increases, the angles between the weighting vectors for rLVR remain small, and its constructed latent structure is consistent. In contrast, LVR is sensitive to noise, and the resulting angles diverge, as illustrated in Fig. 1.
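A small sketch of this robustness metric, assuming it is the angle between corresponding weighting vectors obtained under two noise levels (with the sign ambiguity of the vectors removed by taking the absolute cosine):

```python
import numpy as np

def weight_angle_deg(r_a, r_b):
    """Angle (in degrees) between two weighting vectors r_a and r_b."""
    # The sign of a weighting vector is arbitrary, so the absolute cosine is used.
    c = float(r_a.ravel() @ r_b.ravel()) / (np.linalg.norm(r_a) * np.linalg.norm(r_b))
    return float(np.degrees(np.arccos(np.clip(abs(c), 0.0, 1.0))))
```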

The regularization term in rLVR handles strongly collinear cases. However, the value of the regularization parameter κ should not be too large; otherwise, it will have a negative effect on the prediction performance. As shown in Fig. 6, with increasing values of κ, the mean squared errors (MSEs) of the quality variables increase as well, where Y_i (i ∈ {1, 2, 3, 4}) denotes the ith quality variable. Additionally, an appropriate number of latent variables l is also important for the effectiveness of rLVR: if l is too small, the extracted latent variables cannot exploit the process and quality spaces fully, leading to sub-optimal prediction and monitoring performance; on the other hand, if l is too large, the extra latent factors tend to introduce noise into the model, which may have negative effects on system modeling. Therefore, it is essential to employ cross-validation to determine the values of κ and l.

TABLE II Angles Between r for LVR and rLVR (°)

Fig. 6. MSEs of quality variables with increasing κ.

B. Case Study on the Tennessee Eastman Process

In this section, the Tennessee Eastman (TE) process [26] is utilized to further demonstrate the effectiveness of the proposed algorithm. The TE process was created by the Eastman Chemical Company for the purpose of developing and evaluating techniques proposed in the process systems engineering field. The process involves five main units: a reactor, a condenser, a stripper, a compressor and a separator. The reactions occurring in the reactor are

where the reactants A, C, D and E are gases, and the main products G and H and the byproduct F are liquids.

Two blocks of data are available in the TE process: process measurements XMEAS(1–41) and manipulated variables XMV(1–12). The detailed descriptions of these variables are summarized in [26]. In this case study, XMEAS(1–22) and XMV(1–11) are selected as process variables, and XMEAS(35–36) are chosen as quality variables. Cross-validation is applied to choose the model parameters: for rLVR, l = 1 and κ = 0.013; for rCCA, l = 1, κ_x = 0.1, and κ_y = 0.001; and for PLS, l = 4. The regularized LVR and regularized CCA are employed for the performance comparison to address the collinearity in the TE process data.

Downs and Vogel [26] simulated 20 disturbances for further analysis, and two typical ones, IDV(1) and IDV(4), are selected to compare the performance of rLVR, PLS and CCA in our work.

1) IDV(1) – a Step Disturbance in the A/C Feed Ratio: While keeping the composition of B constant, a step change is introduced in the A/C feed ratio in IDV(1). Figs. 7–9 show the prediction performance of rLVR, PLS and CCA, respectively. It is noted that the quality variables XMEAS(35) and XMEAS(36) are denoted as Y1 and Y2, respectively, in the figures. In PLS, a gap exists consistently for XMEAS(36) even after the process returns to normal under the effect of the controllers. In contrast, both rLVR and CCA work well to predict the variations or trends of the quality variables, with rLVR slightly better in terms of MSEs, as shown in Table III. It is noted that rLVR cannot fully follow the trend of the quality variables when the disturbance is introduced; this is caused by the dynamics in the process, which will be addressed with a dynamic extension of rLVR in future work.

Fig. 7. Prediction results of rLVR for IDV(1).

Fig. 8. Prediction results of PLS for IDV(1).

Fig. 9. Prediction results of CCA for IDV(1).

TABLE III MSEs of rLVR, CCA and PLS for IDV(1)

The monitoring performance of rLVR and PLS is shown in Figs. 10 and 11, and CCA's results are omitted due to their negligible differences from rLVR on these data. In Figs. 10 and 11, T², T_r² and Q_r denote the monitoring indices for the principal component subspace, process principal subspace, and process residual subspace, respectively, while the quality monitoring index is obtained by performing PCA on the quality variables directly. As observed from the figures, for both rLVR and PLS, T_r² and Q_r respond more quickly than the quality monitoring index, where the black vertical line denotes the timestamp at which the disturbance is introduced. Aligning with the prediction results, PLS tends to return to normal after the tuning of the controllers, but false alarms are still consistently raised after Sample 200. In contrast, rLVR follows the quality trends better than PLS, and only process-relevant faults are detected in T_r² and Q_r, which are of lower importance. Therefore, due to the emphasis on quality information in the modeling phase, rLVR-based prediction and monitoring perform better than PLS.

Fig. 10. Monitoring results of rLVR for IDV(1).

Fig. 11. Monitoring results of PLS for IDV(1).

2) IDV(4) – a Step Change in Reactor Cooling Water Inlet Temperature: In IDV(4), due to the correction of controllers, the quality variables are not affected, and their variations and the predictions of rLVR, PLS and CCA are presented in Figs. 12–14. In terms of prediction performance, rLVR, PLS and CCA achieve comparable results, as summarized in Table IV, with PLS performing the worst. As validated by the variations in Figs. 15 and 16, the disturbance in IDV(4) is quality-irrelevant. However, PLS raises many false alarms for quality-relevant faults with the T² statistic, which reduces the reliability of the fault detection system. The monitoring results for rLVR in Fig. 15 are more credible, indicating that the disturbance affects process variables only.

Fig. 12. Prediction results of rLVR for IDV(4).

Fig. 13. Prediction results of PLS for IDV(4).

Fig. 14. Prediction results of CCA for IDV(4).

TABLE IV MSEs of rLVR, CCA and PLS for IDV(4)

VII. Conclusions

In this paper, a new regularized latent variable regression (rLVR) method is proposed for multivariate modeling and process monitoring. rLVR aims to maximize the projection of the quality variables on the latent space, which is shown to be equivalent to minimizing the prediction error between the process and quality scores. The geometric properties and model relations are derived and summarized for rLVR, and the relations among rLVR, PLS and CCA are analyzed both theoretically and geometrically. A process monitoring framework based on rLVR is developed to detect anomalies in the principal component subspace, the process principal subspace and the process residual subspace. Two case studies, numerical simulations and the Tennessee Eastman process, are employed to demonstrate the effectiveness of rLVR over PLS and CCA in terms of prediction and monitoring.

Fig. 15. Monitoring results of rLVR for IDV(4).

Fig. 16. Monitoring results of PLS for IDV(4).

Appendix A Proof of Lemma 2

With the relation in (8), the objective of LVR in (5) can be re-arranged as

where θ is the angle between u and t, and its range is [0°, 180°]. Additionally, since the direction of t or u makes no difference to the minimum value of J, the range of θ can be further restricted to [0°, 90°]. Therefore, the objective is equivalent to

By substituting t = Xw and u = Yq, the equivalence between (5) and (13) is proved.
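For reference, assuming J is the squared score error, the cosine form used in the argument above follows from the standard expansion

```latex
J = \lVert u - t \rVert^{2}
  = \lVert u \rVert^{2} - 2\,u^{\top}t + \lVert t \rVert^{2}
  = \lVert u \rVert^{2} - 2\,\lVert u \rVert\,\lVert t \rVert\cos\theta + \lVert t \rVert^{2},
\qquad \theta = \angle(u, t).
```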

Appendix B Proof of Lemma 3

1) From Algorithm 1 and (16), we have

where

Then the first item in Lemma 3 can be proved by

2) Another way to represent X_i is

where

Additionally, proving the second item in Lemma 3 is equivalent to showing

It is noted that the last two items in Lemma 3 can be proved in a similar way to the first two items; thus, their proofs are omitted in the paper.

The loading matrix P can be expressed as

Appendix C Proof of Lemma 4

Then R^T P is proved to be an identity matrix as follows:

Thus,