Local Partial Least Squares Based Online Soft Sensing Method for Multi-output Processes with Adaptive Process States Division☆

Weiming Shao¹, Xuemin Tian¹,*, Ping Wang¹,²

¹College of Information and Control Engineering, China University of Petroleum (Huadong), Qingdao 266580, China

²State Key Laboratory of Heavy Oil Processing, China University of Petroleum (Huadong), Qingdao 266580, China


ARTICLE INFO

Article history:
Received 25 June 2013
Received in revised form 18 October 2013
Accepted 27 November 2013
Available online 24 June 2014

Keywords:
Local learning
Online soft sensing
Partial least squares
F-test
Multi-output process
Process state division

Local learning based soft sensing methods succeed in coping with the time-varying characteristics of processes as well as nonlinearities in industrial plants. In this paper, a local partial least squares based soft sensing method for multi-output processes is proposed to accomplish process states division and local model adaptation, which are two key steps in the development of local learning based soft sensors. An adaptive way of partitioning process states without redundancy is proposed based on the F-test, where unique local time regions are extracted. Subsequently, a novel anti-over-fitting criterion is proposed for online local model adaptation which simultaneously considers the relationship between process variables and the information in labeled and unlabeled samples. Case studies are carried out on two chemical processes and the simulation results illustrate the superiority of the proposed method from several aspects.

© 2014 Chemical Industry and Engineering Society of China, and Chemical Industry Press. All rights reserved.

1. Introduction

During the past two decades, data-driven soft sensors have been widely used for estimating hard-to-measure quality variables, such as product concentration in chemical reactors, due to their attractive properties, for instance, low cost, easy implementation and freedom from measurement delay [1,2]. Among a variety of algorithms, principal component analysis (PCA) [3], partial least squares (PLS) [4], artificial neural networks (ANN) [5] and support vector machines (SVM) [6] are the most commonly used for soft sensing. Unfortunately, the performance of soft sensors built with these methods usually deteriorates because of the time-varying nature of process characteristics and changes in the external environment [7]. To deal with this issue, recursive versions of the above methods have been developed [8-10]. Although these methods can adapt soft sensors to new process states by absorbing newly measured samples, they fail to cope with abrupt changes in process characteristics. Moreover, a single global model cannot perform well over a wide operating range when strong process nonlinearities exist.

Alternatively, local learning based soft sensors can address both issues simultaneously. In general, there are two main steps under the local learning framework. The first is to partition the process states into several sub-states, upon which local models are built. Clustering based methods, such as k-means [11] and fuzzy c-means [12], or expectation-maximization based methods [13], are commonly used to obtain sub-datasets. However, it is difficult to determine the number of clusters appropriately, and most of these methods are offline, failing to track newly emerged process dynamics. Recently, several approaches were developed to split process states into local time regions of consecutive samples. Fujiwara et al. proposed to partition the dataset with a fixed moving window [14], where every process state has the same length and a small window moving width yields too many local models. Ni et al. divided the process states repeatedly [15], where all local regions are still of the same length during each partition. Actually, the lengths of local regions should be determined by two aspects, namely the characteristics of the process states and the specific modeling function. Therefore, Kadlec and Gabrys proposed to divide process states considering both aspects [16]. Nevertheless, their method is limited to single-output processes, and no new local region is extracted when newly measured samples become available, which causes difficulty when these samples contain useful process information.

The other step is to obtain the final model output. The multi-model strategy is a well-known approach, combining the outputs of all local models with different weights [16-18]. However, model adaptation in such ensemble methods [16] is difficult and complex. Thus this paper focuses on the other way, where a single model is responsible for estimating the target variables, such as just-in-time (JIT) modeling [19-22]. In JIT methods, the local model for estimating the target variables is built directly upon historical samples around the query sample, which can cope with process nonlinearity and abrupt changes to a certain degree. However, high estimation performance is not always achieved, since the correlation between process variables is not considered. Thereby, Fujiwara et al. proposed to select local models according to a correlation based index constructed from a combination of the Q and T² statistics [14]. Although the performance is improved compared with the conventional JIT method, the mapping relationship between the target variable and the secondary variables is neglected, probably leading to inappropriate model adaptation. Besides, massive memory space is required for storing the loading matrices of the PCA models, which may not be available in some applications [23]. Ni et al. [15] proposed to select the local model that minimizes the prediction error for the newest sample. However, over-fitting is prone to occur.

In order to address these two issues, i.e., how to divide process states and how to adapt local models appropriately, this paper develops a local PLS-based soft sensing method for multi-output processes. An adaptive way is first proposed to identify local regions without redundancy by F-test, and in particular, new process states continue to be extracted online. For each local region, PLS is employed to build the model so as to deal with the co-linearity between process variables. At the online operation stage, when model adaptation is necessary, the quadratic form of the predicted error vector for the newest sample and the weighted sum of predicted errors for several samples around the query one are combined. The objective is to provide a robust way of adapting local models, which is expected to enhance the prediction accuracy and greatly reduce the memory cost simultaneously.

The remaining parts of this paper are organized as follows: in Section 2, the kernel algorithm for PLS is briefly introduced. In Section 3, the proposed method is described in detail, including adaptive process states division, online model adaptation and the overall procedure for developing the proposed soft sensor. In Section 4, two chemical industrial processes, namely a single-output debutanizer column and a multi-output sulfur recovery unit, are employed to demonstrate the feasibility and effectiveness of the proposed schemes. Finally, in Section 5, conclusions and future work are put forward.

2. Kernel Algorithm for PLS

In the soft sensing field, PLS remains popular because of its distinctive strong points [23], such as its handling of data co-linearity and its statistical interpretability. The nonlinear iterative PLS (NIPALS) algorithm [24] is commonly used to obtain the regression coefficients, but it becomes time-consuming when the data matrices contain a massive amount of data, because it needs to deflate both the input and output matrices. In the kernel algorithm for PLS [25,26], however, the regression coefficients are computed in another way and the computational load is independent of the size of the data matrices.

The internal relationship of PLS is expressed as

X = TP^T + E,  Y = TQ^T + F  (1)

where X ∈ R^(N×m) and Y ∈ R^(N×p) represent the m-dimensional input and p-dimensional output data matrices respectively, T = [t1, t2, …, tA] represents the score matrix, P = [p1, p2, …, pA] and Q = [q1, q2, …, qA] represent the loading matrices, E and F represent the residual terms of X and Y respectively, and A represents the number of latent variables.

The external relationship is

Y = XB_PLS + F  (2)

where B_PLS is the matrix of regression coefficients to be computed. In the kernel algorithm for PLS, B_PLS is obtained by deflating the variance matrix of X, Σ_X, and the covariance matrix of X and Y, Σ_XY, whose definitions follow [27],

where X̄ and Ȳ represent the mean values of the input and output variables respectively, s_xi and s_yj represent the variances of the ith input variable and the jth output variable respectively, S_X = diag(s_x1, s_x2, …, s_xm) and S_Y = diag(s_y1, s_y2, …, s_yp). A detailed description of the kernel algorithm for PLS and the corresponding MATLAB code can be found in Dayal and MacGregor's work [26].
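To make the idea concrete, the following sketch (in Python/NumPy rather than the MATLAB code referenced above; function and variable names are our own) implements a kernel PLS in the spirit of Dayal and MacGregor [26]: only the cross-product matrices XᵀX and XᵀY are formed and deflated, so the per-component cost does not grow with the number of samples N.

```python
import numpy as np

def kernel_pls(X, Y, A):
    """Kernel PLS sketch after Dayal and MacGregor [26]: only the
    m-by-m and m-by-p cross-product matrices are used and only X'Y
    is deflated, so each latent variable costs O(m^2) regardless of N."""
    XtX = X.T @ X                      # m x m, computed once
    XtY = X.T @ Y                      # m x p, deflated below
    m, p = XtY.shape
    P = np.zeros((m, A))               # X loadings
    Q = np.zeros((p, A))               # Y loadings
    R = np.zeros((m, A))               # weights mapping X directly to scores
    for a in range(A):
        if p == 1:
            w = XtY[:, 0].copy()
        else:
            # weight vector: dominant eigenvector of (X'Y)(X'Y)'
            vals, vecs = np.linalg.eigh(XtY @ XtY.T)
            w = vecs[:, np.argmax(vals)].copy()
        w /= np.linalg.norm(w)
        r = w.copy()
        for j in range(a):             # orthogonalize against earlier loadings
            r -= (P[:, j] @ w) * R[:, j]
        tt = r @ XtX @ r               # t'_a t_a without forming the scores
        P[:, a] = (XtX @ r) / tt
        Q[:, a] = (XtY.T @ r) / tt
        XtY = XtY - tt * np.outer(P[:, a], Q[:, a])   # deflate X'Y only
        R[:, a] = r
    return R @ Q.T                     # B_PLS, shape m x p
```

A convenient sanity check: with A equal to the input dimensionality and X of full column rank, B_PLS coincides with the ordinary least squares solution.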

3. Online Soft Sensing Method for Multi-Output Processes

In this section, the online local PLS-based soft sensing method (OLPLS) is presented in detail, including the adaptive way of partitioning process states, the online model adaptation criterion and the procedure of developing OLPLS.

3.1. Adaptive process states division based on F-test

A rational local region should be a period over which the modeling function maintains constant performance [7]. Based on this, the schematic diagram of the proposed F-test based adaptive way of partitioning process states for multi-output processes is shown in Fig. 1. Initially, a local model f_ini is built upon the dataset Z_ini = [X_ini, Y_ini] within the initial window. Then, the window is shifted one step ahead and the dataset within the shifted window is obtained as Z_sft = [X_sft, Y_sft]. The predicted residuals for X_ini and X_sft by f_ini are calculated by

R_ini = Y_ini − f_ini(X_ini),  R_sft = Y_sft − f_ini(X_sft)  (4)

Fig. 1. Schematic diagram of the adaptive way of dividing process states without redundancy.

When R_ini and R_sft are not significantly different, we consider that the performance of f_ini has not deteriorated, and the samples in Z_ini and Z_sft are thought to come from the same process state. Consequently, the window continues to be shifted and a new R_sft is calculated. Once R_sft deviates from R_ini significantly, the window is stopped and one local process state is identified, containing the samples from the first one of the initial window to the penultimate one of the newest shifted window. A hypothesis test is employed to examine whether R_sft differs from R_ini significantly. In the first place, we construct the statistic F as

F = [W(W − p)/(p(W − 1))] (R̄_sft − R̄_ini)^T S_sft^(−1) (R̄_sft − R̄_ini)  (5)

where W represents the initial window size, R̄_ini represents the mean value of the population from which R_ini comes, which is normally 0, and R̄_sft and S_sft represent the mean value vector and covariance matrix of R_sft. Under the assumption that both R_ini and R_sft follow normal distributions, when the hypothesis H_m: R̄_sft = R̄_ini is valid, F follows an F distribution with degrees of freedom p and W − p [28], that is, F ~ F(p, W − p). In this paper, the above hypothesis remains valid as long as Eq. (6) is satisfied, in which case R_sft is deemed not to differ significantly from R_ini. Thus Z_sft and Z_ini are thought to belong to the same process state and the window continues to be shifted.

λ1 < F < λ2  (6)

where λ1 and λ2 represent the threshold values corresponding to a given significance level α with P{λ1 < F < λ2} = 1 − α. In this paper, λ1 is set to zero.
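The test above can be sketched numerically as follows (a minimal sketch, assuming the one-sample Hotelling-type form of the statistic consistent with the stated F(p, W − p) distribution; function names are our own):

```python
import numpy as np
from scipy import stats

def f_statistic(R_sft, R_ini_mean=None):
    """F statistic of Eq. (5): tests whether the mean of the
    shifted-window residuals R_sft (W x p) deviates from the mean of
    the initial-window residuals (normally the zero vector)."""
    W, p = R_sft.shape
    mu = np.zeros(p) if R_ini_mean is None else R_ini_mean
    d = R_sft.mean(axis=0) - mu              # mean deviation vector
    S = np.cov(R_sft, rowvar=False)          # covariance of R_sft
    t2 = W * (d @ np.linalg.solve(S, d))     # Hotelling's T^2
    return (W - p) / (p * (W - 1)) * t2

def is_same_state(R_sft, alpha=0.05):
    """Eq. (6): the hypothesis holds while F stays below lambda_2
    (lambda_1 is set to zero in the paper)."""
    W, p = R_sft.shape
    lam2 = stats.f.ppf(1 - alpha, p, W - p)
    return f_statistic(R_sft) < lam2
```

Residuals with near-zero mean keep F close to zero, while a sustained mean shift inflates F far beyond the threshold, triggering the end of the local region.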

As the window moves forward, whenever the hypothesis H_m becomes invalid, one process state is defined. However, due to the reappearance of some process states, some newly extracted local models may be superfluous, and only one of them needs to be reserved. Accordingly, we also provide an F-test based strategy for identifying redundant local regions so as to reduce the online computational load.

Assume a new local region, denoted as Z_new = [X_new, Y_new], is defined. The predicted residual of the lth local model f_l in the stored local model set is calculated by

R_l = Y_new − f_l(X_new)  (7)

Fig. 2. Different ways of selecting local models. (a) JIT modeling; (b) CoJIT modeling; (c) LARPLS.

If f_l can describe Z_new well, the hypothesis H_m,l: R̄_l = 0 will be valid, as analyzed before, and the following statistic will follow an F distribution:

F′_l = [W_new(W_new − p)/(p(W_new − 1))] R̄_l^T S_l^(−1) R̄_l  (8)

where W_new and S_l represent the length of Z_new and the covariance matrix of R_l, respectively. If

min_l F′_l < λ′  (9)

where λ′ stands for the threshold value corresponding to the given significance level α′ with P{F′_l < λ′} = 1 − α′, it is considered that at least one of the stored local models can describe Z_new well and Z_new is classified as redundant. Consequently, this new local region does not need to be stored.

The procedure of adaptively dividing process states without redundancy is summarized as follows.

Step 1 Set the initial dataset Z_ini and build the local model f_ini by PLS.

Step 2 Shift the window one step ahead, get Z_sft, and calculate R_sft and R̄_sft using Eq. (4).

Step 3 Calculate the F statistic by Eq. (5) and determine whether the performance of f_ini deteriorates using Eq. (6). If Eq. (6) is valid, return to Step 2; otherwise go to the next step.

Step 4 Define a new local region Z_new = [X_new, Y_new] containing the samples from the first one of the initial window to the penultimate one of the shifted window.

Step 5 Compute F′_l using Eq. (8) and judge whether Z_new is redundant using Eq. (9). If Eq. (9) is satisfied, set Z_ini = Z_sft and return to Step 1; otherwise go to Step 6.

Step 6 Perform the kernel algorithm for PLS on Z_new, reserve the regression coefficients, set Z_ini = Z_sft and return to Step 1.

It is worth pointing out that new process states are extracted continuously at the online operation stage, which differs from the work of Kadlec and Gabrys [16]. In addition, in the single-output case, the F-test can be replaced by a t-test for simplicity.
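Steps 1-6 can be sketched as a single loop. This is an illustrative sketch only: ordinary least squares stands in for the PLS local model, the F statistic takes the Hotelling-type form assumed above, the redundancy check of Step 5 is omitted, and the window size, significance level and synthetic data are our own choices.

```python
import numpy as np
from scipy import stats

def f_stat(R):
    """Hotelling-type F statistic (assumed form of Eq. (5)) testing
    whether the residual mean of window R (W x p) deviates from zero."""
    W, p = R.shape
    d = R.mean(axis=0)
    S = np.cov(R, rowvar=False)
    return (W - p) / (p * (W - 1)) * W * (d @ np.linalg.solve(S, d))

def divide_states(X, Y, W=50, alpha=1e-8):
    """Sketch of Steps 1-4 and 6 of Section 3.1: shift a window of
    width W until the F test rejects, close the local region at the
    penultimate sample of the last shifted window, then restart from
    that shifted window. Returns (start, end) index pairs."""
    n, p = len(X), Y.shape[1]
    lam2 = stats.f.ppf(1 - alpha, p, W - p)   # lambda_2 of Eq. (6)
    regions, start = [], 0
    while start + W <= n:
        B, *_ = np.linalg.lstsq(X[start:start + W],
                                Y[start:start + W], rcond=None)  # f_ini
        s = start + 1
        while s + W <= n:
            R_sft = Y[s:s + W] - X[s:s + W] @ B   # Eq. (4)
            if f_stat(R_sft) > lam2:              # performance deteriorated
                break
            s += 1
        regions.append((start, s + W - 2))        # Steps 3-4
        start = s                                 # Z_ini <- Z_sft
    return regions
```

On data with an abrupt bias change, the first region closes shortly after the change enters the window, so region lengths adapt to the process rather than being fixed in advance.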

3.2. Online model adaptation

In the present work, a single local model is responsible for estimating the unknown samples, so a reasonable way of selecting local models is of great concern. Compared with the distance-based JIT modeling shown in Fig. 2(a), correlation-based JIT (CoJIT) [14] selects the local model with the minimal weighted sum of the Q and T² statistics in order to take the correlation between process variables into consideration. Q is the dominant factor, but it is not in accordance with the prediction accuracy, as shown in Fig. 2(b). For the current sample (x0, y0), Q1 is less than Q2, so model 1 will be selected when adaptation happens. However, e1, the actual prediction error of model 1 for (x0, y0), is much larger than that of model 2, e2. Ni et al. proposed the LARPLS method [15], which prefers the model minimizing the prediction error for the newest labeled sample (x0, y0). However, over-fitting is an even worse issue, as shown in Fig. 2(c), where model 2 will be selected for (x0, y0) after the adaptation, yet the estimation results of model 2 for the coming unknown samples, i.e., (x+i, ?), are very disappointing. Note that CoJIT may suffer from the same problem.

In this paper, we propose a novel robust criterion J* for adapting the local model for multi-output processes, which is shown as

where e0 = [e0,1, e0,2, …, e0,p] represents the predicted error vector for the newest labeled sample, ei = [ei,1, ei,2, …, ei,p] is the predicted error vector for the ith nearest sample around the query sample, Θ = diag{θ1, θ2, …, θp}, with all elements positive and real, stands for the importance placed on the different target variables, 0 < si < 1 represents the weight of the corresponding sample, which can be determined by the similarity between the query sample and its neighbors, and 0 < γ < 1 is the regularization parameter.

The first term on the right side of Eq. (10) provides the description ability for the current operating condition, while the second, the average predicted error for samples in the neighborhood of the query sample, guards against over-fitting. The trade-off between the two terms is adjusted by γ. Therefore, J* defined by Eq. (10) is a rational and robust criterion, through which an appropriate local model can be selected for the current process dynamics. For example, in the circumstance shown in Fig. 2(c), the local model will be adapted to model 1 instead of model 2, owing to the existence of the second term on the right side of Eq. (10).

Eq. (10) can be expressed in another form, i.e.,

Note that other strategies can be adopted to reduce the online adaptation frequency [29], but Eq. (12) puts more emphasis on the prediction accuracy than on reducing the model adaptation frequency.
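The selection logic of this subsection can be sketched as follows. The exact normalization of Eq. (10) is not reproduced here; we assume a convex combination, weighted by γ, of the quadratic error on the newest labeled sample and the similarity-weighted average neighborhood error, and the function names are our own.

```python
import numpy as np

def criterion_J(e0, E_nb, s, theta, gamma):
    """Assumed form of the anti-over-fitting criterion J*: the first
    term measures the fit to the newest labeled sample (x0, y0), the
    second the similarity-weighted average error over the K samples
    around the query sample; gamma trades one off against the other."""
    Theta = np.diag(theta)                     # output importance weights
    fit = e0 @ Theta @ e0
    neighborhood = sum(si * (ei @ Theta @ ei)
                       for si, ei in zip(s, E_nb)) / np.sum(s)
    return (1.0 - gamma) * fit + gamma * neighborhood

def select_local_model(e0_list, E_nb_list, s, theta, gamma):
    """Adapt to the stored local model with the smallest J*."""
    J = [criterion_J(e0, E, s, theta, gamma)
         for e0, E in zip(e0_list, E_nb_list)]
    return int(np.argmin(J))
```

In the Fig. 2(c) scenario, a model that fits the newest sample almost perfectly but predicts its neighborhood poorly loses to a model with balanced errors, whereas setting γ = 0 reduces the criterion to the LARPLS-style rule of Ni et al. [15].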

3.3. Procedure of implementing OLPLS

In summary, the procedure of implementing OLPLS consists of two stages: an offline stage and an online stage.

Offline stage:

Divide the process states without redundancy according to the adaptive way proposed in Section 3.1 and reserve the corresponding local models. In particular, the window is shifted continuously as the historical dataset is augmented with newly measured samples, so that new process states can be extracted. This operation can be implemented offline, imposing no computational burden on the online operation.

Online stage:

Step 1 When the estimation of a query sample x_q is necessary, use the newest labeled sample z0 = (x0, y0), the K nearest samples selected around x_q and the current local model f* to compute M_j by Eq. (11). Several ways can be used to obtain the similarity s_i. In this paper, a simple but effective approach is employed, i.e., s_i = exp(−||x_q − x_i||²).

Step 2 If Eq. (12) remains invalid, use f* to provide the prediction value for x_q and return to Step 1; otherwise go to the next step.

Step 3 By Eq. (10), calculate J*_l with the lth local model for z0 and the selected K nearest samples around x_q, where l = 1, 2, …, L and L stands for the number of local models reserved at the offline stage.

Step 4 Set f* = f_l*, where l* minimizes J*_l, and return to Step 1.

4. Case Study

In this section, the performance of the proposed method (OLPLS) is evaluated on two benchmark datasets from two industrial chemical processes, available from http://www.springer.com/engineering/control/book/978-1-84628-479-3. A debutanizer column is employed to illustrate the superiority of OLPLS over CoJIT [14]. A sulfur recovery unit is utilized to demonstrate the effectiveness of OLPLS for modeling a multi-output process, compared with distance-based just-in-time PLS (JITPLS) and recursive PLS (RPLS). The estimation accuracy is evaluated by the root mean squared error (RMSE), the relative RMSE (RE) and the maximum absolute error (MAE), defined as

where y_i, ŷ_i and N represent the real value, the predicted value and the number of test samples, respectively. The average online CPU time consumed (over ten simulation runs) is employed to evaluate the real-time performance. The configuration of the computer is as follows: OS: Windows XP; RAM: 2 GB; CPU: Pentium Dual E5800 (3.2 GHz × 2); MATLAB version: 7.1. Additionally, we assume that each element of a matrix occupies one byte of space, and the local model number and occupied memory space are employed to measure the requirement on storage devices.
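The three indices can be sketched as below. RMSE and the maximum absolute error follow directly from their names; the normalization of RE is our assumption (RMSE of the errors relative to the root mean square of the real values), since the paper's formula is not shown here.

```python
import numpy as np

def rmse(y, yhat):
    """Root mean squared error over the test samples."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return np.sqrt(np.mean((y - yhat) ** 2))

def relative_rmse(y, yhat):
    """RE: error energy normalized by the energy of the real values
    (one common definition; assumed here)."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return np.sqrt(np.sum((y - yhat) ** 2) / np.sum(y ** 2))

def max_abs_error(y, yhat):
    """MAE as used in this paper: the MAXIMUM absolute error,
    not the mean absolute error."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    return np.max(np.abs(y - yhat))
```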

Fig. 3. Block scheme of the debutanizer column.

4.1. Debutanizer distillation column

The effectiveness of the proposed model adaptation criterion and the adaptive way of dividing process states are tested here. The debutanizer distillation column is part of a desulfuring and naphtha splitter plant where propane and butane are removed as overheads from the naphtha stream, as shown in Fig. 3. One of the main tasks of the debutanizer column is to minimize the butane content at the bottom of the column, which is normally obtained by gas chromatography with a large measurement delay. Thus, a soft sensor for online estimation of the butane concentration is necessary. Several hardware sensors are installed in the debutanizer column for obtaining the secondary variables, indicated with gray circles in Fig. 3. The detailed description of these input variables is listed in Table 1.

The collected 2394 samples are partitioned into two parts: the first 1650 samples serve as the historical data and the rest are used as test data for evaluating the performance of the different soft sensors. The model structure is determined as follows by analysis of expert knowledge and consideration of the process dynamics [1].

To illustrate the effectiveness of each strategy proposed in Section 3, in the first stage, referred to as OLPLS-1, only the model adaptation method proposed in Section 3.2 is employed, while the way of partitioning process states remains the same as in CoJIT. Subsequently, both the strategy for defining local regions and that for selecting local models, discussed in Sections 3.1 and 3.2, are embedded, which is referred to as OLPLS-2.

Table 1 Input variables for soft sensing for the debutanizer column

Evidently, inappropriate parameters make the estimation performance considerably worse. In this work, aiming at minimizing the RMSE, the parameters of CoJIT and OLPLS-2 are optimized by the particle swarm optimization (PSO) technique. In CoJIT, the window moving width d = 1, the window size W = 215, the weight β = 10⁻⁶ for constructing the correlation index, and the latent variable numbers for PCA and PLS are LV_PCA = 11 and LV_PLS = 10. In OLPLS-1, W and d remain the same as in CoJIT, with γ = 0.15 and K = 5. In OLPLS-2, W = 103, LV_PLS = 10, the significance level α = 0.07, γ = 0.15 and K = 5. The prediction results of the three methods are plotted in Fig. 4. Note that for all three soft sensors, model adaptation occurs each time a query sample needs to be estimated.

As can be seen from Fig. 4, although all three soft sensors can broadly track the trend of the target variable, OLPLS-1 and OLPLS-2 perform better than CoJIT does. The same conclusion can be drawn from the scatter plot comparison in Fig. 5. The scattered points of both OLPLS-1 and OLPLS-2 lean much closer to the black diagonal line over the whole operating range of the target variable, indicating the advantages of OLPLS-1 and OLPLS-2 compared with CoJIT.

Furthermore, for deeper analysis of the performance of these soft sensors, we list the prediction results from several aspects in Table 2. By comparing the data in the first three columns, we can readily conclude that CoJIT performs worst among the three soft sensors and that OLPLS-2 further enhances the prediction accuracy over OLPLS-1. The last three columns show that both OLPLS-1 and OLPLS-2 are superior to CoJIT in terms of computational efficiency and memory cost. Actually, in CoJIT, the Q and T² statistics of all local models need to be calculated in each adaptation, and the loading matrices and eigenvalues have to be stored. In this single-output process, if each element of a matrix occupies one byte of space, a total of L × (m + 2) × LV_PCA bytes of memory is indispensable apart from storing the parameters of the local models, where L is the model number and m is the dimensionality of the input vector. In contrast, in OLPLS-1 and OLPLS-2, only L × (m + 1) bytes are necessary for storing the regression coefficients of the PLS models, and merely simple vector multiplications are performed for model adaptation. Especially, the adaptive way of dividing process states can significantly reduce the model number, which is more computationally efficient and memory-saving.
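As a quick numerical illustration of the memory comparison above (the model count L = 100 is a hypothetical round number; m = 7 is an assumed input dimensionality for the debutanizer case; LV_PCA = 11 is the value optimized in Section 4.1):

```python
# Memory needed beyond the local-model parameters, one byte per element.
L, m, LV_PCA = 100, 7, 11           # L and m are illustrative assumptions
cojit_bytes = L * (m + 2) * LV_PCA  # CoJIT: loading matrices and eigenvalues
olpls_bytes = L * (m + 1)           # OLPLS: one PLS coefficient vector per model
print(cojit_bytes, olpls_bytes)
```

Under these assumptions, CoJIT needs 9900 bytes against 800 for OLPLS, i.e., more than twelve times the storage, before any reduction of L by the adaptive division.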

Fig. 4. Predicted results of CoJIT (a), OLPLS-1 (b) and OLPLS-2 (c).

Fig. 5. Scatter plot comparison between CoJIT, OLPLS-1 and OLPLS-2.

Table 2 Quantitative performance of the three soft sensors

In OLPLS-2, when γ is set to zero, i.e., when the model adaptation criterion of Ni et al. [15] is employed, RMSE and MAE rise to 0.0118 and 0.0936, respectively. The prediction performance with γ = 0.15 and γ = 0 under different window sizes is compared in Fig. 6. Evidently, both RMSE and MAE with γ = 0 are quite sensitive to the window size. In particular, when the window size is set to 105 or 145, the estimation performance is considerably worse, clearly indicating that over-fitting occurs. Thus, the second term on the right-hand side of Eq. (10) is necessary.

Thus far, it can readily be concluded that OLPLS-2 outperforms OLPLS-1, which in turn outperforms CoJIT. Note that CoJIT and OLPLS-1 share the same parameters, and the only difference between them is the criterion for selecting local models. Likewise, OLPLS-2 differs from OLPLS-1 merely in the way of dividing process states. Therefore, these simulation results illustrate the reasonableness of the criterion for model selection in Section 3.2 and the effectiveness of the scheme for adaptive process states partition in Section 3.1. The lengths of the local time regions are not fixed but adaptively defined; the maximal and minimal lengths of the local regions are 203 and 105, respectively. New local regions continue to be added during the online operation phase, as shown in Fig. 7, which is not available in the work of Kadlec and Gabrys [16].

Fig. 6. Performance comparison between γ = 0.15 and γ = 0 under different window sizes.

Fig. 7. Length of each local region in OLPLS-2.

4.2. Sulfur Recovery Unit (SRU)

The superiority of the two schemes proposed in Sections 3.1 and 3.2 is further illustrated, and meanwhile the strategies of reducing the model adaptation frequency by Eq. (12) and eliminating superfluous local regions by Eq. (9) are investigated, through the MIMO SRU process. An SRU is normally utilized to remove environmental pollutants, which are harmful to the atmosphere and the human body, from acid gas streams. In this case study, two kinds of acid gas are taken as the input of the SRU, namely the MEA gas that is rich in H2S and the SWS gas that is rich in H2S and NH3. In the SRU, H2S is transformed into pure sulfur and SO2 is formed. The tail gas from the SRU contains residual H2S and SO2, whose concentrations need to be monitored before being released to the atmosphere. However, these two acid gases damage hardware sensors by corrosion, and consequently the hardware instruments are frequently removed for maintenance. Thus soft sensors are required to estimate the concentrations of H2S and SO2. A simplified block scheme of this SRU process is shown in Fig. 8; a detailed description can be found in [1].

Table 3 Description of input and output variables for SRU

The five variables described in Table 3 and the concentrations of H2S and SO2 are considered as the inputs and outputs of the required soft sensors, respectively.

The SRU dataset contains 10,081 samples, among which the first 8000 serve as the historical dataset and the rest as the test dataset. As analyzed in [1], the model structure is determined as

In the proposed method (OLPLS), the two target variables are of the same importance, that is, Θ in Eq. (10) is the identity matrix. Meanwhile, δ is set to 0 initially. The other parameters are optimized by PSO as follows: W = 435, α = 0.15, LV_PLS = 6, K = 4 and γ = 0.137. The parameters of JITPLS and RPLS are also determined by PSO.

The prediction results of the three soft sensors are plotted in Fig. 9. The proposed method outperforms the other two methods, especially in some areas such as the period marked with the green rectangle, where abrupt changes take place; this period is enlarged and shown in Fig. 10. The quantitative results are shown in Table 4, indicating that the estimation performance of OLPLS for both y1 and y2 is significantly improved compared with those of JITPLS and RPLS, which is consistent with the conclusion from the previous two figures. These results support the view that the proposed method can account for each of the target variables successfully in the context of multi-output processes.

The above simulation results are obtained with δ = 0, and model adaptation occurs 2071 times. However, when the plant operator is resistant to frequent model adaptation, δ can be set to non-zero values so as to reduce the adaptation frequency, as shown in Table 5. Apparently, there is a trade-off between prediction accuracy and adaptation frequency. Generally, the higher δ is, the lower the adaptation frequency becomes and the smaller the computational load is, resulting in poorer estimation performance, and vice versa. However, the proposed method can greatly reduce the adaptation times at the cost of a slight loss of prediction accuracy. For example, with δ = [0.015, 0.01], the adaptation times and computational load can be reduced by 44.4% and 37.2%, while the RMSE of y1 and y2 deteriorates by only 3.4% and 2.2%, respectively. It is interesting to notice that when δ is set to [0.005, 0.005], the RMSE for y1 remains nearly invariant while the RMSE for y2 is reduced, because the influence of noise can be restrained to some extent through the sparse online learning strategy [10].

Fig. 8. The block scheme of the SRU process.

Fig. 9. Time trend comparison for prediction of JITPLS (a), RPLS (b) and OLPLS (c).

Fig. 10. Enlargement of the period marked within the green rectangle for y1 (a) and y2 (b).

In this case study, 272 local models are constructed and stored. When there is not enough memory space, as in some cases [23], the stored local model number can be reduced by applying Eq. (9) to avoid superfluous local models. The influence of α′ on the estimation accuracy and the scale of the local model set is listed in Table 6. There is a trade-off between the estimation performance and the stored local model number. However, when α′ is set to 0.2, the local model number can be reduced by 46% while the RMSE of y1 and y2 rises by only 3.7% and 4.1%, respectively, indicating the effectiveness of the proposed scheme for redundant model detection by Eq. (9). Here, α′ can be understood as a parameter that provides the threshold value.

Table 4 Predicted errors for y1 and y2

Table 5 Influence of δ on the estimation accuracy, computational load and adaptation frequency

Table 6 Performance under different α′ for SRU

5. Conclusions

In this paper, we have proposed a novel local PLS-based method for online soft sensing (referred to as OLPLS) for multi-output processes, motivated by two issues in local learning; adaptive schemes for process states division and anti-over-fitting online model adaptation are provided. The application results on two chemical processes indicate that the proposed method outperforms CoJIT, distance-based JIT and RPLS from the perspectives of prediction accuracy, memory cost and computational load. However, the harm caused by outliers is still considerable. Even though numerous ways of detecting outliers have been reported, how to distinguish samples of normal but new process states from real outliers with high accuracy is still a challenging issue, which will be our future work.

Nomenclature

B_PLS regression coefficients of the PLS model

E, F residual terms of X and Y

e predicted error vector

F F statistic

f_ini local model constructed from Z_ini

J* criterion for model adaptation

m, p dimensionality of the input and output vectors

P, Q loading matrices of X and Y

R_ini predicted residual of f_ini for Z_ini

R̄_ini, R̄_sft mean value vectors of R_ini and R_sft

R_sft predicted residual of f_ini for Z_sft

S_sft covariance matrix of R_sft

s sample similarity

s_xi, s_yj variances of the ith input variable and the jth output variable

T score matrix

T² T² statistic

W initial window size

X input matrix

X̄, Ȳ mean value vectors of X and Y

Y output matrix

Z_ini input and output pairs in the initial window

Z_sft input and output pairs in the shifted window

α, α′ significance levels

γ regularization parameter

δ pre-defined threshold value

Θ importance weight

λ, λ′ threshold values corresponding to α, α′

Σ_X variance matrix of X

Σ_XY covariance matrix of X and Y

[1] L. Fortuna, S. Graziani, A. Rizzo, M.G. Xibilia, Soft Sensors for Monitoring and Control of Industrial Processes, Springer-Verlag, London, 2007.

[2] P. Kadlec, B. Gabrys, S. Strandt, Data-driven soft sensors in the process industry, Comput. Chem. Eng. 33 (4) (2009) 795-814.

[3] Z.Q. Ge, Z.H. Song, Semisupervised Bayesian method for soft sensor modeling with unlabeled data samples, AIChE J. 57 (8) (2011) 2109-2118.

[4] H.J. Galicia, Q.P. He, J. Wang, A reduced order soft sensor approach and its application to a continuous digester, J. Process Control 21 (4) (2011) 489-500.

[5] J.C.B. Gonzaga, L.A.C. Meleiro, C. Kiang, R. Maciel Filho, ANN-based soft-sensor for real-time process monitoring and control of an industrial polymerization process, Comput. Chem. Eng. 33 (1) (2009) 43-49.

[6] J. Yu, A Bayesian inference based two-stage support vector regression framework for soft sensor development in batch bioprocesses, Comput. Chem. Eng. 41 (1) (2012) 134-144.

[7] P. Kadlec, R. Grbić, Review of adaptation mechanisms for data-driven soft sensors, Comput. Chem. Eng. 35 (1) (2011) 1-24.

[8] J. Tang, W. Yu, T.Y. Chai, L.J. Zhao, On-line principal component analysis with application to process modeling, Neurocomputing 87 (4) (2012) 167-178.

[9] S.J. Qin, Recursive PLS algorithms for adaptive data modeling, Comput. Chem. Eng. 22 (4-5) (1998) 503-514.

[10] P. Wang, H.G. Tian, X.M. Tian, D.X. Huang, A new approach for online adaptive modeling using incremental support vector regression, CIESC J. 61 (8) (2010) 2040-2045.

[11] C. Ordonez, E. Omiecinski, Efficient disk-based k-means clustering for relational databases, IEEE Trans. Knowl. Data Eng. 16 (8) (2004) 909-921.

[12] Y.F. Fu, H.Y. Su, Y. Zhang, J. Chu, Adaptive soft-sensor modeling algorithm based on FCMISVM and its application in PX adsorption separation process, Chin. J. Chem. Eng. 16 (5) (2008) 746-751.

[13] J. Yu, Online quality prediction of nonlinear and non-Gaussian chemical processes with shifting dynamics using finite mixture model based Gaussian process regression approach, Chem. Eng. Sci. 82 (2012) 22-30.

[14] K. Fujiwara, M. Kano, S. Hasebe, A. Takinami, Soft-sensor development using correlation-based just-in-time modeling, AIChE J. 55 (7) (2009) 1754-1764.

[15] W.D. Ni, S.K. Tan, W.J. Ng, S.D. Brown, Localized, adaptive recursive partial least squares regression for dynamic system modeling, Ind. Eng. Chem. Res. 51 (8) (2012) 8025-8039.

[16] P. Kadlec, B. Gabrys, Local learning-based adaptive soft sensor for catalyst activation prediction, AIChE J. 57 (5) (2011) 1288-1301.

[17] S. Khatibisepehr, B. Huang, F.W. Xu, A. Espejo, A Bayesian approach to design of adaptive multi-model inferential soft sensors with application in oil sand industry, J. Process Control 22 (10) (2012) 1913-1929.

[18] S.N. Zhang, F.L. Wang, D.K. He, R.D. Jia, Real-time product quality control for batch processes based on stacked least-squares support vector regression models, Comput. Chem. Eng. 36 (10) (2012) 217-226.

[19] C. Cheng, M.S. Chiu, A new data-based methodology for nonlinear process modeling, Chem. Eng. Sci. 59 (13) (2004) 2801-2810.

[20] K. Chen, J. Ji, H.Q. Wang, Z.H. Song, Adaptive local kernel-based learning for soft sensor modeling of nonlinear processes, Chem. Eng. Res. Des. 89 (10) (2011) 2117-2124.

[21] Y. Liu, Z.L. Gao, P. Li, H.Q. Wang, Just-in-time kernel learning with adaptive parameter selection for soft sensor modeling of batch processes, Ind. Eng. Chem. Res. 51 (11) (2012) 4313-4327.

[22] Y.Q. Liu, D.P. Huang, Y. Li, Development of interval soft sensors using enhanced just-in-time learning and inductive confidence predictor, Ind. Eng. Chem. Res. 51 (8) (2012) 3356-3367.

[23] P. Kadlec, B. Gabrys, Adaptive on-line prediction soft sensing without historical data, Proceedings of the 2010 International Joint Conference on Neural Networks, Barcelona, 2010.

[24] P. Geladi, B.R. Kowalski, Partial least-squares regression: a tutorial, Anal. Chim. Acta 185 (1986) 1-17.

[25] F. Lindgren, P. Geladi, S. Wold, The kernel algorithm for PLS, J. Chemom. 7 (1) (1993) 45-59.

[26] B.S. Dayal, J.F. MacGregor, Improved PLS algorithms, J. Chemom. 11 (1) (1997) 73-85.

[27] J.L. Liu, Development of self-validating soft sensors using fast moving window partial least squares, Ind. Eng. Chem. Res. 49 (22) (2010) 11530-11546.

[28] X.Q. He, Multivariate Statistical Analysis, China Renmin University Press, Beijing, 2004.

[29] Y. Liu, H.Q. Wang, J. Yu, P. Li, Selective recursive kernel learning for on-line identification of nonlinear systems with NARX form, J. Process Control 20 (2) (2010) 181-194.

☆ Supported by the National Natural Science Foundation of China (61273160) and the Fundamental Research Funds for the Central Universities (14CX06067A, 13CX05021A).

*Corresponding author.

E-mail address: tianxm@upc.edu.cn (X. Tian).