Study and Application of Fault Prediction Methods with Improved Reservoir Neural Networks☆

Qunxiong Zhu, Yiwen Jia, Di Peng, Yuan Xu*

College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China

A R T I C L E  I N F O

Article history:
Received 15 May 2013
Received in revised form 12 October 2013
Accepted 24 December 2013
Available online 23 June 2014

Keywords:
Fault prediction
Time series
Reservoir neural networks
Tennessee Eastman process

A B S T R A C T

Time-series prediction is one of the major methodologies used for fault prediction. Methods based on recurrent neural networks have been widely used in time-series prediction for their remarkable non-linear mapping ability. As a new type of recurrent neural network, the reservoir neural network can process time-series prediction effectively. However, the ill-posedness problem of reservoir neural networks has seriously restricted their generalization performance. In this paper, a fault prediction algorithm based on time series is proposed using improved reservoir neural networks. The basic idea is to take the structural risk into consideration, that is, the cost function involves not only the empirical risk factor but also the structural risk factor. A regulating coefficient is thus introduced into the calculation of the output weight of the reservoir neural network. As a result, the amplitude of the output weight is effectively controlled and the ill-posedness problem is solved. Because the training of ordinary reservoir networks is naturally fast, the improved reservoir networks are good in both speed and generalization ability for time-series prediction. Experiments on Mackey-Glass and sunspot time-series prediction prove the effectiveness of the algorithm. The proposed algorithm is then applied to TE process fault prediction: we first forecast several time series obtained from the TE process and then predict the fault type using static reservoirs fed with the predicted data. The final prediction correct rate reaches 81%.

© 2014 Chemical Industry and Engineering Society of China, and Chemical Industry Press. All rights reserved.

1. Introduction

Fault detection and diagnosis have been studied for almost four decades and have become a significant part of control theory. As the requirements for system reliability and safety keep increasing, it is crucial to obtain failure information before a fault occurs. As a result, fault prediction has attracted much attention.

The key to fault prediction is forecasting the future state of a system, so fault prediction can be transformed into time-series prediction. The existing methods of time-series prediction can be classified into three categories. The first is based on classical time-series analysis, consisting of the ARMA and ARIMA models [1]. The second is based on the gray model [2,3], and the last is based on neural networks [4-7]. Among these methods, the neural network method has been studied deeply and applied to time series extensively for its remarkable non-linear mapping ability. In the neural network and machine learning communities, several types of neural network model have been applied to time-series prediction, such as standard multilayer perceptrons [8], radial basis function neural networks [9-11], and generalized regression neural networks [12]. In addition, recurrent neural networks [13], including the nonlinear autoregressive network [14], extreme learning machine networks [15], and recurrent predictor neural networks [16], have also been studied for nonlinear time-series prediction.

There are some limitations when applying neural networks in practice. For example, the performance is poor when a feedforward neural network is applied to time-series prediction. Although recurrent neural networks can solve problems related to time series, they have many disadvantages, such as heavy computation, slow convergence, and difficulty in determining the number of hidden neurons. Moreover, there are fading memory effects, which may cause the error gradient to vanish or become distorted.

To solve these problems, Jaeger and Maass proposed echo state networks (ESNs) [17] and the liquid state machine [18], respectively. Although the two methods take different angles, both are in essence improvements of traditional recurrent neural networks. Verstraeten et al. demonstrated that the two methods are the same in essence and named the framework reservoir computing [19]. Since the report on reservoir computing in Science in 2004 [20], it has drawn the attention of many researchers around the world. Besides time-series prediction [21,22], reservoir computing has been extended to pattern classification [23], voice recognition [24], image processing [25] and so on.

However, there are some problems in reservoir networks. In many situations, the coefficient matrix used for calculating the output weight is ill-conditioned. To be specific, its singular values are distributed continuously with no obvious jump, and the maximum singular value differs from the minimum one by many orders of magnitude. Consequently, the output weight becomes extraordinarily large, especially in high-dimensional reservoir networks. The conventional way to control the output weight is to choose a reservoir with a dimension as low as possible, but low-dimensional reservoir networks cannot provide good generalization ability.

In this paper, we first study the traditional structure of reservoir networks and analyze its ill-posedness problem. Based on this analysis, the structural risk is taken into consideration, and a formula for the output weight is obtained by minimizing the loss function. The method involves a regulating coefficient that controls the amplitude of the output weight, which solves the ill-posedness problem. Experiments on two benchmark problems verify the effectiveness of the improved method. A fault prediction algorithm based on the improved reservoir neural networks is then proposed and applied to the TE process. Six time series, consisting of 2 variables from 3 faults, are predicted. In the classification stage, we take advantage of static reservoir networks to predict the fault type.

2. Reservoir Computing

2.1. Structure of reservoir network

The architecture of traditional reservoirs [17] is shown in Fig. 1. Some terminology must be fixed first. We consider discrete-time neural networks with K input units, N internal network units and L output units. Activations of the input units at time step n are u(n) = (u1(n), …, uK(n)), those of the internal units are χ(n) = (χ1(n), …, χN(n)), and those of the output units are y(n) = (y1(n), …, yL(n)). Real-valued connection weights are collected in an N×K matrix Win = (wij^in) for the input weights, in an N×N matrix W = (wij) for the internal connections, in an L×(K+N+L) matrix Wout = (wij^out) for the connections to the output units, and in an N×L matrix Wback = (wij^back) for the connections that project back from the output to the internal units.

2.2. Mathematical model

In most cases the output has little effect on the internal units, so we will not study the parameter Wback. The equations of the reservoir networks [17] can then be written as

χ(n+1) = f(Win u(n+1) + W χ(n) + bχ)
y(n+1) = Wout χ(n+1) + b    (1)

Fig. 1. The basic structure of reservoir networks.

Considering Eq. (1), we assume that the internal state variables χ have N dimensions, the input variables u have K dimensions, and the output variables y have L dimensions. To simplify the expressions, we treat the bias variables as connection weights from a unit of fixed value 1, so bχ and b can be merged into the matrices Win and Wout. The activation function f can be built from spiking neurons, threshold logic neurons, sigmoid neurons, linear neurons and so on; in this paper, the sigmoid function is taken. We first initialize the network: W and Win are generated randomly and remain unchanged during the calculation, and the original state of the internal units is zero, that is, χ(0) = 0. The input and output of the training samples are u(k) and y(k), respectively. Thus we can calculate Wout with Eq. (1).

Some important points on reservoir networks are presented as follows.

First, the dimension of the internal state units χ is very high, up to hundreds or even thousands, while it is relatively low in traditional recurrent neural networks.

Second, the weight matrices Win and W are randomly generated and remain unchanged during the whole training process.

Last, as one of the measures for maintaining the dynamic characteristics of reservoirs, the connection weight matrix of the internal state is sparse, with a connectivity of only 2%-5%, different from most traditional recurrent neural networks, which always keep dense connections.
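To make these points concrete, the following is a minimal NumPy sketch (not the authors' code) of reservoir initialization and state collection. The reservoir size N = 400, the weight ranges, and the use of tanh as the sigmoid-type activation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_reservoir(N=400, K=1, sparsity=0.03, spectral_radius=0.9):
    """Random, fixed W_in (N x K) and sparse internal W (N x N)."""
    W_in = rng.uniform(-0.5, 0.5, size=(N, K))     # stays unchanged after creation
    W = rng.uniform(-0.5, 0.5, size=(N, N))
    W[rng.random((N, N)) >= sparsity] = 0.0        # keep only ~3% of connections
    rho = np.max(np.abs(np.linalg.eigvals(W)))     # current spectral radius
    W *= spectral_radius / rho                     # rescale to the desired radius
    return W_in, W

def collect_states(W_in, W, inputs):
    """First formula of Eq. (1): chi(n+1) = f(W_in u(n+1) + W chi(n))."""
    chi = np.zeros(W.shape[0])                     # chi(0) = 0, as in the text
    states = []
    for u in inputs:                               # inputs has shape (T, K)
        chi = np.tanh(W_in @ u + W @ chi)          # tanh as the sigmoid-type f
        states.append(chi.copy())
    return np.asarray(states)                      # rows form the state matrix A
```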

3. Improved Method

In this section, the ill-posedness problem of reservoir networks is discussed and an improved method that solves it is given.

3.1. Training

To facilitate the study, we redraw the network structure in Fig. 2, which is essentially the same as the structure in Fig. 1, with the same reservoir neurons shown at different moments. Win is the connection weight between the input layer and the reservoir, W is the internal connection weight, and Wout is the connection weight between the reservoir and the output layer.

Training the reservoir network can be summarized as determining the connection weight matrix Wout between the output layer and the dynamic reservoir layer. The detailed steps of building and training a reservoir network are as follows.

Step 1 Set the parameters of the reservoir network: the number of internal units of the reservoir (N), the sparsity, and the spectral radius of the connection weight matrix of the internal state, and initialize the network. The spectral radius is usually between 0 and 1, but this is not a necessary condition; sometimes a spectral radius greater than 1 gives better prediction performance.

Step 2 Calculate the internal state. Normalize the input sample and stimulate the internal state of the reservoir with the normalized input sample. The variables of the internal state at every moment should be recorded.

Fig. 2. The simplified model of the reservoir.

Step 3 Calculate the connection weight matrix of the output layer (Wout). According to the linear regression relationship between the reservoir state variables and the output variables, Wout can be obtained.

There are many methods for calculating Wout; the pseudo-inverse method is taken in this paper. The reservoir state matrix A and the corresponding target vector yd are defined as

A = [χ(Ω+1), χ(Ω+2), …, χ(K)]ᵀ,  yd = [yd(Ω+1), yd(Ω+2), …, yd(K)]ᵀ    (2)

where Ω is the length of the initial transient process and K is the number of samples. To obtain better accuracy, the initial transient process is always abandoned. Thus A and yd satisfy the relation

A Wout ≈ yd.

As only Wout needs to be adjusted, the target function is

E(Wout) = ‖A Wout − yd‖²    (3)

and the conventional training algorithm is given by

Wout = A⁺ yd    (4)

where A⁺ is the pseudo-inverse matrix of A, also called the generalized inverse matrix. There are orthogonal matrices U ∈ R^(K×K) and V ∈ R^(N×N) such that the singular value decomposition of matrix A can be expressed as

A = U Σ Vᵀ    (5)

where Σ is a K×N diagonal matrix

Σ = [diag(σ1, σ2, …, σN); 0].    (6)

In this matrix,

σ1 ≥ σ2 ≥ … ≥ σN ≥ 0    (7)

where σi is the i-th singular value of A. Using the singular value decomposition, the pseudo-inverse of A can be written as

A⁺ = V Σ⁺ Uᵀ.    (8)

The matrix Σ⁺ is formed from the reciprocals of all non-zero singular values, with zeros placed where the corresponding singular values are zero.
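A small sketch of this conventional pseudo-inverse training, under the assumption that A stacks the recorded states row-wise (as returned by collect_states above) and yd holds the corresponding targets:

```python
import numpy as np

def train_pinv(A, y_d, tol=1e-12):
    """Eqs. (4)-(8): W_out = V Sigma+ U^T y_d via the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > tol, 1.0 / s, 0.0)   # invert only the non-zero singular values
    return Vt.T @ np.diag(s_inv) @ (U.T @ y_d)

# np.linalg.pinv(A) @ y_d computes the same quantity in one call.
```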

3.2. The ill-posedness of reservoir networks

Here we discuss the solution of the linear system A W = yd with a disturbance in the system:

A(W + δW) = yd + δyd.    (9)

Notice that

‖δW‖/‖W‖ ≤ ‖A‖·‖A⁻¹‖·‖δyd‖/‖yd‖.    (10)

Calculating ‖A‖ using the singular value decomposition gives

‖A‖ = σmax.

In the same way, we have

‖A⁻¹‖ = 1/σmin.

Therefore, the condition number is

cond(A) = ‖A‖·‖A⁻¹‖ = σmax/σmin,    (11)

which is the ratio of the maximum singular value to the minimum singular value.
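The following snippet checks this quantity directly for a collected state matrix; in high-dimensional reservoirs the ratio is typically enormous, which is the numerical signature of the ill-posedness discussed next.

```python
import numpy as np

def condition_number(A):
    s = np.linalg.svd(A, compute_uv=False)   # singular values in descending order
    return s[0] / s[-1]                      # sigma_max / sigma_min
```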

The auto-correlation matrix of the state signal can be approximately expressed as

R ≈ AᵀA/(K − Ω).

In existing reservoir network methods, the eigenvalues of this auto-correlation matrix have a large dispersion degree, which causes a large dispersion of the singular values and a large condition number of matrix A, resulting in the ill-posedness of reservoir networks. With Eqs. (4) and (8), the method for calculating Wout is given by

Wout = V Σ⁺ Uᵀ yd.    (12)

Using the singular value decomposition, the part of matrix A with zero singular values is removed, and the solution for the output weight matrix is obtained. However, in applications it is almost impossible for exactly zero singular values to appear; they only approach zero, and then a serious ill-posedness problem occurs when seeking the pseudo-inverse. The solution expressed by Eq. (12) can be rewritten as

Wout = Σᵢ₌₁ʳ (uᵢᵀ yd / σᵢ) vᵢ    (13)

where uᵢ and vᵢ are the left and right singular vectors, respectively, and r = rank(A). If the minimum singular value is too small, the output weight Wout will have an extraordinarily large amplitude.

In the reservoir network method, the amplitude distribution of the singular values of matrix A is continuous, and the smallest singular value is close to zero. On the other hand, the pseudo-inverse solution of reservoirs relies heavily on the amplitude of the minimum singular value in Eq. (12). When the minimum singular value approaches the representation accuracy of computer floating-point numbers, the computed pseudo-inverse of matrix A becomes unreliable. Consequently, the solution of reservoirs is influenced significantly by the computer accuracy.

3.3. Calculation

We introduce structural risk minimization theory into reservoir networks to overcome the extraordinarily large amplitude of Wout mentioned in the last section. According to statistical learning theory, the real risk consists of the empirical risk and the structural risk, and a model that balances the two well is considered a good model. Thus we introduce a regulating parameter C to adjust the empirical risk and the structural risk. The cost function given by Eq. (3) can be rewritten as

E(Wout, e) = ½‖Wout‖² + (C/2) Σⱼ₌₁ᴷ ‖e(j)‖²,  subject to χ(j)ᵀ Wout = yd(j) − e(j), j = 1, 2, …, K,    (14)

where ‖Wout‖² represents the structural risk, ‖e‖² stands for the empirical risk, K is the number of samples, and yd is the desirable output. This function originates from statistical learning theory. To obtain the proper output weight Wout, we minimize the cost function E(Wout, e).

As this formula is a conditional extremum problem, we transform it into an unconditional extremum problem through the Lagrange function:

L(Wout, e, λ) = ½‖Wout‖² + (C/2) Σⱼ₌₁ᴷ ‖e(j)‖² − Σⱼ₌₁ᴷ λⱼ (χ(j)ᵀ Wout − yd(j) + e(j)),    (15)

where λ = [λ1, λ2, …, λK], λⱼ (j = 1, 2, …, K) are the Lagrange multipliers.

Setting the gradients of the Lagrange function to zero gives

∂L/∂Wout = 0 ⇒ Wout = Σⱼ λⱼ χ(j) = Aᵀλᵀ,
∂L/∂e(j) = 0 ⇒ λⱼ = C e(j),
∂L/∂λⱼ = 0 ⇒ χ(j)ᵀ Wout = yd(j) − e(j).    (16)

With Eq. (16), we obtain λ = −C(A Wout − yd)ᵀ, and then

Wout = (AᵀA + I/C)⁻¹ Aᵀ yd.    (17)

As Eq. (17) only needs some simple linear calculations and one matrix inversion, Wout can be computed very quickly.
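A minimal sketch of the improved readout of Eq. (17); the value of the regulating parameter, C = 1e6 here, is only an illustrative assumption and should be tuned per task:

```python
import numpy as np

def train_regularized(A, y_d, C=1e6):
    """Eq. (17): W_out = (A^T A + I/C)^{-1} A^T y_d."""
    N = A.shape[1]
    # Solving the linear system is cheaper and better conditioned than
    # forming the inverse explicitly.
    return np.linalg.solve(A.T @ A + np.eye(N) / C, A.T @ y_d)
```

Compared with train_pinv, the added I/C term bounds the effective inverse of each singular value (1/σi becomes σi/(σi² + 1/C)), which is exactly how the amplitude of Wout is controlled.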

3.4. Experiments

In this section, we give two examples to show the performance of the improved method. The first example is the Mackey-Glass benchmark test, and the second is the prediction of monthly sunspot numbers.

Table 1 Relevant parameters of reservoirs for Mackey-Glass time series prediction

Table 2 The performance (NRMSE) of the proposed algorithm and other methods

3.4.1. Prediction of Mackey-Glass time series

The Mackey-Glass system is a time-delay differential system of the form

dχ(t)/dt = α χ(t − δ)/(1 + χ¹⁰(t − δ)) − β χ(t),    (18)

where χ(t) is the value of the time series at time t. The system is chaotic for δ > 16.8. The parameter values are chosen as β = 0.1, α = 0.2, and δ = 17. The dataset is constructed using the second-order Runge-Kutta method with a step size of 0.1, which yields the time-series data. We use 10 consecutive data points as the input of the network and the 95th data point as the desired output, so each window forms one sample, and many samples can be obtained from the time-series data in this way. We choose 1200 samples as training samples and another 660 samples as testing samples. The internal state matrix A can be calculated from the training samples and the first formula of Eq. (1), and then Wout is obtained according to Eq. (17).

In this example, the prediction performance is evaluated by the root mean-squared error on the test sequence normalized by the variance of the original time series (NRMSE):

NRMSE = sqrt( (1/(n σ²)) Σᵢ₌₁ⁿ (yd(i) − y(i))² ),    (19)

where yd(i) denotes the target value, y(i) is the corresponding prediction output, n is the number of test examples, and σ² is the variance of the original time series. The relevant parameters are listed in Table 1.
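The measure translates directly into code:

```python
import numpy as np

def nrmse(y_d, y, sigma2):
    """Root mean squared error normalized by the variance of the original series."""
    return np.sqrt(np.mean((np.asarray(y_d) - np.asarray(y)) ** 2) / sigma2)
```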

Table 2 compares the improved method with other methods. The normalized root mean squared error of the proposed algorithm is smaller than that of the other algorithms, among which CESN and D&S ESN are two other improved reservoir-network methods [26,27]. Fig. 3 shows the prediction curve and prediction error, with the absolute error distributed evenly around zero.

3.4.2. Prediction of monthly sunspot numbers

The sunspot numbers used in this paper are 1327 monthly mean sunspot numbers from January 1901 to July 2011, obtained from the Solar Influences Data Analysis Center (http://sidc.oma.be). A time series is constructed from these numbers. We use 10 consecutive data points as the input of the network and the next, the 11th, data point as the desired output. In this way, 1317 samples are formed. The first 878 samples are used as training data and the remaining 439 samples as testing data. The dimension of the reservoir is set as 600×600, with the sparsity maintained at 2% and the spectral radius around 0.91. Table 3 lists the details. Table 4 presents the normalized root mean squared errors of the proposed method and others; the performance of the proposed method is the best. Fig. 4 shows the prediction performance for the monthly mean sunspot numbers with the improved method, in which the absolute errors are distributed evenly around zero.

Table 3 Relevant parameters of reservoirs for prediction of sunspot time series

Table 4 The performance (NRMSE) of the proposed algorithm and other methods

Fig. 3. The performance of the Mackey-Glass time series prediction.

4. Simulation

4.1. TE process

On the basis of a large number of engineering practices, the U.S. Eastman Chemical Company developed a simulation model of a process industry, called the TE chemical process model. As a realistic simulation of a production plant of the Tennessee Eastman Chemical Company, the TE process is a typical complex multi-variable nonlinear process [29], with four starting reactants A, C, D and E, products G and H, and by-product F. The four main reactions are as follows:

A(g) + C(g) + D(g) → G(liq),
A(g) + C(g) + E(g) → H(liq),
A(g) + E(g) → F(liq),
3D(g) → 2F(liq),

which are all irreversible exothermic reactions following the Arrhenius equation. Since product G is sensitive to temperature, the reaction temperature of the system must be controlled precisely. Part of the liquid products G and H leaves the reactor as vapor while the rest stays in the reactor as liquid. The whole TE process consists of five processing units: reactor, condenser, recycle compressor, gas-liquid separator and stripper. The process flow chart is shown in Fig. 5.

4.2. Experiment and analysis

Among the 20 typical faults in the TE process, we take into account the three listed in Table 5. As the ultimate purpose is to predict the specific fault type, the step after time-series prediction is fault classification. There are 52 variables in the TE process, consisting of 11 control variables and 41 measured variables; these variables can be considered as characteristics describing the faults.

However, different characteristics make different contributions to the prediction of a certain fault. Some characteristics not only fail to provide useful information for fault prediction, but also introduce noise that increases the prediction error. Therefore, it is necessary to select the characteristics that provide a large amount of information for the prediction of certain faults.

According to literature [30], characteristics 51 and 9 have very high mutual information with the output fault type (faults 4, 9, 11) and can provide more useful information for fault prediction. In the TE process, characteristics 9 and 51 are the reactor temperature and the reactor cooling water valve opening, respectively. As faults 4 and 11 are changes in the inlet temperature of the reactor cooling water, they are directly related to characteristics 9 and 51. Although fault 9 (the feed temperature of material D) has no obvious direct relation with these two characteristics, according to literature [30] they can also distinguish fault 9.

Fig. 4. The prediction performance for monthly mean sunspot numbers.

Fig. 5. TE process flow chart.

For time-series prediction, the training and testing data are generated from 3 simulations corresponding to the 3 faults listed in Table 5. Each simulation run lasts 72 h with a sampling interval of 3 min, so each run contains 1440 samples and each sample has 52 variables. In this paper, only variables 9 and 51 are needed. By taking out variables 9 and 51 of these 3 simulations, 6 time series containing 1440 data points are obtained, of which the first 1002 data points are used. For each time series, we use 10 consecutive data points as the input of the network and the 13th data point as the desired output, so the prediction is 3-step ahead (as sketched below).
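Under the same windowing sketch used for the Mackey-Glass data, the 3-step-ahead set-up corresponds to horizon = 3; the series name here is hypothetical.

```python
# 10 consecutive points in, the 13th point out, i.e. 3 steps ahead;
# with 1002 points this yields the 990 samples used below.
X, y = make_samples(te_series_var9_fault4, n_in=10, horizon=3)
```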

In this case, 990 samples are formed for each series. The first 660 samples are used as training data and the remaining 330 samples as testing data. The parameters of the reservoir networks are listed in Table 6. Fig. 6 shows the prediction performance of the 6 time series; the fitting curves and error curves indicate that the prediction accuracy is acceptable. Table 7 shows the prediction mean square error of the six time series. We then use the predicted data for fault prediction with a static reservoir, in which the internal connection weight is zero. In this situation the reservoir becomes an extreme learning machine, so it is unnecessary to construct a large-scale reservoir. Experimentally, when the number of hidden layer nodes equals 9, the correct rate of prediction always reaches 81%, which is relatively high in TE fault prediction. Theoretically, fault prediction is more difficult than fault diagnosis, and in TE fault diagnosis a method is considered effective when the rate of correct diagnosis is greater than 80%. The correct rate thus indicates that our method is an effective one.
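A sketch of this classification stage: with the internal connection weight W set to zero, the reservoir degenerates into a single-pass, extreme-learning-machine-style network. The 9 hidden nodes follow the text; the one-hot fault coding, the input weight range, and the reuse of the Eq. (17) readout are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_static_reservoir(X, labels, n_hidden=9, C=1e6):
    """X: (n_samples, n_features) predicted variables; labels: integer fault types."""
    W_in = rng.uniform(-1.0, 1.0, size=(n_hidden, X.shape[1]))
    H = np.tanh(X @ W_in.T)                      # static mapping, no recurrence
    Y = np.eye(labels.max() + 1)[labels]         # one-hot fault targets
    W_out = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ Y)  # Eq. (17)
    return W_in, W_out

def predict_fault(W_in, W_out, X):
    return np.argmax(np.tanh(X @ W_in.T) @ W_out, axis=1)
```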

Table 5 Three certain faults of the TE process

5. Conclusions

In this paper, we analyze the ill-posedness problem of reservoir neural networks and propose a fault prediction method. Two benchmark problems, Mackey-Glass time series prediction and monthly mean sunspot time series prediction, prove that the proposed algorithm improves the performance and quality of reservoirs, especially their generalization ability. In the simulation section, we apply the proposed algorithm to predict certain faults of the TE process. The correct rate of classification reaches 81.31% with an appropriate static reservoir neural network structure. The fault prediction of the TE process is three steps ahead with a 3-min sampling interval, which gives enough time to react to the possible fault. Future work should focus on the optimization of the parameters of reservoir neural networks.

Table 6 The relevant parameters of reservoirs for the time series of variables 9 and 51

Fig. 6. The performance of the six time-series predictions.

Table 7 The prediction mean square error of the six time series

References

[1] S.L. Ho, M. Xie, The use of ARIMA models for reliability forecasting and analysis, Comput. Ind. Eng. 35 (1998) 213-216.

[2] C.X. Sun, W.M. Bi, Q. Zhou, R.J. Liao, W.G. Chen, New gray prediction parameter model and its application in electrical insulation fault prediction, Control Theory Appl. 20 (5) (2003) 797-901.

[3] W. Gao, Y.G. Zheng, The discussion on prediction model of nonlinear time series, J. Tsinghua Univ. (Sci. Technol.) 40 (s2) (2000) 6-10.

[4] B. Liu, D.P. Hu, Studies on applying artificial neural networks to some forecasting problems, J. Syst. Eng. 14 (4) (1999) 338-344.

[5] J. Zhang, A.J. Morris, E.B. Martin, Long-term prediction models based on mixed order locally recurrent neural networks, Comput. Chem. Eng. 22 (7) (1998) 1051-1063.

[6] H.D. Xue, Q.X. Zhu, Time series prediction algorithm based on structured analogy, Comput. Eng. 36 (1) (2010) 231-235.

[7] X.P. Lai, H.X. Zhou, C.Q. Yun, Application of hybrid-model neural networks to short-term electric load forecasting, Control Theory Appl. 17 (1) (2000) 69-72.

[8] A. Lapedes, R. Farber, How neural nets work, Proc. Adv. Neural Inf. Process. Syst. (1987) 442-456.

[9] D.Q. Zhang, Y.X. Ning, X.N. Liu, On-line prediction of nonlinear time series using RBF neural networks, Control Theory Appl. 26 (2) (2009) 153-157.

[10] S. Haykin, J. Principe, Making sense of a complex world, IEEE Signal Process. Mag. 15 (3) (1998) 66-68.

[11] M.R. Cowper, B. Mulgrew, C.P. Unsworth, Nonlinear prediction of chaotic signals using a normalized radial basis function network, Signal Process. 82 (5) (2002) 775-789.

[12] Z.P. Feng, X.G. Song, D.X. Xue, A.P. Zheng, Y.M. Sun, Time series prediction based on general regression neural network, J. Vib. Meas. Diagn. 23 (2) (2003) 105-109.

[13] F. Sun, Q.X. Zhu, Study and application on recurrent neural networks controller, J. Beijing Univ. Chem. Technol. 27 (3) (2000) 88-90.

[14] J.C. Principe, J.M. Kuo, Dynamic modeling of chaotic time series with neural networks, in: Advances in Neural Information Processing Systems 7, MIT Press, Cambridge, MA, 1995, pp. 311-318.

[15] J. Zhang, K.S. Tang, K.F. Man, Recurrent NN model for chaotic time series prediction, Proc. 23rd Annu. Int. Conf. Ind. Electron., Control, Instrum. (IECON), 3, 1997, pp. 9-14.

[16] M. Han, J.H. Xi, S.G. Xu, F.L. Yin, Prediction of chaotic time series based on the recurrent predictor neural network, IEEE Trans. Signal Process. 52 (12) (2004) 3409-3416.

[17] H. Jaeger, The Echo State Approach to Analysing and Training Recurrent Networks, GMD Report 148, GMD - German National Research Institute for Computer Science, Bremen, 2001.

[18] W. Maass, T. Natschläger, H. Markram, Real-time computing without stable states: a new framework for neural computation based on perturbations, Neural Comput. 14 (11) (2002) 2531-2560.

[19] D. Verstraeten, B. Schrauwen, M. D'Haene, D. Stroobandt, An experimental unification of reservoir computing methods, Neural Netw. 20 (3) (2007) 391-403.

[20] H. Jaeger, H. Haas, Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication, Science 304 (5667) (2004) 78-80.

[21] Y. Peng, J.M. Wang, X.Y. Peng, Researches on time series prediction with echo state networks, Acta Electron. Sin. 38 (2A) (2010) 148-154.

[22] H. Jaeger, M. Lukoševičius, D. Popovici, U. Siewert, Optimization and applications of echo state networks with leaky integrator neurons, Neural Netw. 20 (3) (2007) 335-352.

[23] M.D. Skowronski, J.G. Harris, Automatic speech recognition using a predictive echo state network classifier, Neural Netw. 20 (3) (2007) 414-423.

[24] D. Verstraeten, B. Schrauwen, D. Stroobandt, J. Van Campenhout, Isolated word recognition with the liquid state machine: a case study, Inf. Process. Lett. 95 (6) (2005) 521-528.

[25] C.D. Pei, Echo state networks and its applications in image edge detection, Comput. Eng. Appl. 44 (19) (2008) 172-174.

[26] Q.S. Song, Z.R. Feng, A new method to construct complex echo state networks, J. Xi'an Jiaotong Univ. 43 (4) (2009) 1-4.

[27] G. Holzmann, H. Hauser, Echo state networks with filter neurons and delay&sum readout, Neural Netw. 23 (2) (2010) 244-256.

[28] H.W. Zhang, Sunspot Number Prediction Based on Wavelet Analysis and BP Neural Network, 2009.

[29] L.T. Antelo, J.R. Banga, A.A. Alonso, Hierarchical design of decentralized control structures for the Tennessee Eastman process, Comput. Chem. Eng. 32 (2008) 1995-2015.

[30] N. Lv, X.Y. Yu, Fault diagnosis of TE process based on second-order mutual information feature selection, J. Chem. Ind. Eng. 60 (9) (2009) 2252-2258.

☆ Supported by the National Natural Science Foundation of China (61074153).

* Corresponding author.

E-mail address: xuyuan@mail.buct.edu.cn (Y. Xu).