Skew t Distribution-Based Nonlinear Filter with Asymmetric Measurement Noise Using Variational Bayesian Inference

2022-04-12 08:27ChenXuYawenMaoHongtianChenHongfengTaoandFeiLiu

Chen Xu,Yawen Mao,Hongtian Chen,Hongfeng Tao and Fei Liu

1Key Laboratory of Advanced Process Control for Light Industry(Ministry of Education),Jiangnan University,Wuxi,214122,China

2School of Science,Jiangnan University,Wuxi,214122,China

3Department of Chemical and Materials Engineering,University of Alberta,Edmonton,AB T6G 2G6,Canada

ABSTRACT This paper is focused on the state estimation problem for nonlinear systems with unknown statistics of measurement noise.Based on the cubature Kalman filter,we propose a new nonlinear filtering algorithm that employs a skew t distribution to characterize the asymmetry of the measurement noise.The system states and the statistics of skew t noise distribution,including the shape matrix,the scale matrix,and the degree of freedom(DOF)are estimated jointly by employing variational Bayesian(VB)inference.The proposed method is validated in a target tracking example.Results of the simulation indicate that the proposed nonlinear filter can perform satisfactorily in the presence of unknown statistics of measurement noise and outperform than the existing state-of-the-art nonlinear filters.

KEYWORDS Nonlinear filter; asymmetric measurement noise; skew t distribution; unknown noise statistics; variational Bayesian inference

1 Introduction

State estimation serves as an important role in various fields, such as control, signal processing, fault detection and diagnosis, and many more [1-7].Due to its effectiveness and optimality,the Kalman filter (KF)is the state estimation method of the most widespread used for linear systems with Gaussian noise distribution [8-10].Limited by the assumption of linear, many nonlinear filters have been presented [11-13], the most famous of which is the extended Kalman filter(EKF)[14].To solve the error caused by linearization in EKF, the cubature Kalman filter (CKF)and unscented Kalman filter (UKF)were developed by using sigma points to approximate the posterior distribution [15,16].Among them, CKF is proven to have better estimation performance in high-dimensional nonlinear estimation.However, the above state estimation methods assume Gaussian noise distribution and their noise statistics are completely known, which is not available in practice.

To deal with the unknown noise statistics, various adaptive and robust filters were designed for joint state estimation [17-19].For example, a recursive state estimation method was presented with unknown Gaussian noise covariance for linear systems [20].Further, an adaptive variational Bayesian (VB)-based filter was designed for estimating the covariance of process noise and measurement noise by selecting inverse Wishart priors [21].Combining with maximum correntropy criterion, an adaptive and robust filter was developed by estimating the Gaussian measurement noise covariance [22].However, the above Gaussian-based estimation methods are unsuitable for the heavy-tailed noise which caused by outliers or impulse interferences.In the case where both the measurement noise and the process noise are Student’s t distributed noise, the Student’s t filter was first proposed in [23].By minimizing the Kullback-Leibler divergence, an adaptive t-filter was developed to estimate the scale matrix of Student’s t distribution [24].For nonlinear system, a recursive outlier-robust nonlinear filter was proposed for Student’s t distributed noise in [25] and a robust Gaussian approximate filter was presented with unknown statistics of Student’s t noise distribution in [26].

Due to the complex environment, not only the noise distribution with heavy-tailed characteristics but also the asymmetry of noise distribution should be considered.As shown in Fig.1,skew t distribution obtains better fitting performance than the Gaussian distribution and Student’s t distribution, which are symmetric distributions.Thus, several estimation methods were presented for the skew t distribution, which has both skewness and heavy-tails [27-29].For example, a skew t variational Beyasian filter was designed for measurement noise with heavy-tails and skewness in [30] and the estimation accuracy was improved by covariance matrix approximation in [31].In [32], a robust filter was developed to estimate the skew t distribution, consisting of the scale matrix and degree of freedom (DOF).Moreover, some other filtering algorithms that can describe asymmetric noise distribution also have been proposed in [33,34].Unfortunately, the above skew t distribution-based methods are all in linear systems and cannot be applied to nonlinear systems.

Figure 1:Skew t distribution has better fitting than symmetric distributions

In this paper, a new skew t cubature Kalman filter (STCKF)is proposed for nonlinear system with heavy-tailed and skewed measurement noise.The skew t distribution is adopted to describe the measurement noise and the prior distributions of the shape matrix, scale matrix and DOF are chosen as Gaussian, inverse Wishart and Gamma distributions, respectively.The unknown statistics including shape matrix, scale matrix and DOF are inferred with the VB approach and the posterior of states is also simultaneously obtained.The results of simulation demonstrate that the proposed STCKF has better estimation accuracy as compared with the CKF and Student’s t distribution-based CKF.

The paper is structured as follows:Section 2 describes the problem studied in this paper.Section 3 proposes a skew t cubature Kalman filter based on VB inference.In Section 4, an example of target tracking is presented to verify the estimation performance of the proposed STCKF.The conclusions of this paper are given in Section 5.

Notations:RKdenotes theK-dimensional Euclidean space, E[·] and tr(·)represent the expectation operator and the trace operator, I is the identity matrix with appropriate dimension,ATis the transpose of matrixA, diag(·)is the diagonal matrix, N(x,P)is a Gaussian distribution with mean vectorxand covariance matrixP, N+(μ,Σ)represents the truncated Gaussian distribution in positive orthant, and its location is μ and scale matrix is Σ.

2 Problem Formulation

Consider the nonlinear state-space model

wherenis the discrete time index,f(·)andh(·)are the nonlinear state function and measurement function,zn∈Rdis the measurement vector,xn∈Rmis the state vector.The process noisewnis Gaussian white noise andwn~N(0,Qn).The measurement noisevnis the heavy-tailed and asymmetric noise.The initial state vectorx0is assumed to have a Gaussian distribution,

The skew t distribution is used to describe the heavy-tailed and asymmetric of noise, therefore,the measurement noisevn:

where ST(vn;0,Δn,Rn,νn)denotes the skew t distribution with location 0, shape matrix Δn, scale matrixRnand DOF νn.Specifically, the definition of skew t distribution can be seen in [30,32].

Fig.2 shows the different Δ of distributions ST(x;0,1,Δ,3).As shown, with the decreasing of Δ, the skew t distribution will deteriorate to the Student’s t distribution.Therefore, the skew t distribution has both heavy-tails and asymmetric properties with suitable values.

Figure 2:Different Δ for ST(x;0,1,Δ,3)

In the following, based on the CKF, we will design a nonlinear filter under nonlinear model(1)-(2)with the measurement noise followed by skew t distribution.Specifically, the statistics of skew t distribution including the shape matrix, the scale matrix and the DOF are unknown and need to be estimated together with the system states by using VB inference.

3 Proposed Skew t Cubature Kalman Filter Using VB Inference

3.1 Prior Distributions Update

Similar to CKF, the predicted distribution of system statexnis

wherexn|n−1andPn|n−1can be approximated by the CKF here.More specifically, the cubature points are obtained by

wherei=1,···,2m,and [1]idenotes thei-th element of the following set:

The predicted cubature points are

Hence, the predicted state and the corresponding error covariance are given by

In Bayesian theory, the conjugate prior needs ensure posterior distribution have the same functional form with prior distribution.Therefore, to infer shape matrix Δn, scale matrixRn, and DOF νn, the conjugate priors of Δn,Rn, and νnneed to be selected.The prior distribution of Δnis selected as Gaussian distribution [33]:

where μn|n−1,jis the predicted mean ofj-th dimension and σn|n−1,jis the corresponding variance,respectively.The prior distribution ofRnis selected as inverse Wishart distribution [21]:

whereDn|n−1andcn|n−1are the inverse scale matrix and DOF, respectively.The prior distribution of νnis selected as Gamma distribution [26]:

wherean|n−1andbn|n−1are the shape parameter and the rate parameter, respectively.

To obtain (10)-(12), the dynamic modelp(Δn|Δn−1),p(Rn|Rn−1)andp(νn|νn−1)need to be specified.In practice, the variation of the measurement noise parameters is slow, and we can use a forgetting factor ρ ∈(0 1] to describe the predicted distribution in this paper [21]:

Because of the skewed t distribution does not have a strictly closed form, the state posterior distribution will be difficult to obtain.With the introduction of two hidden variablesunand Λn,the likelihoodp(zn|xn)can be rewritten by the following hierarchical Gaussian model [30]:

3.2 Posterior Estimation

In order to estimatexnfrom (5), (10)-(12), and (16)-(18), the joint posteriorp(xn,Δn,un,Λn,Rn,νn|z1:n)needs to be computed.According to Bayes’ theorem,

where H={xn,Δn,un,Λn,Rn,νn}.Due to inter-coupled parameters, it is infeasible to infer the posteriorp(H|z1:n)analytically.From (19), the logarithmic marginal likelihood logp(zn|z1:n−1)can be derived as [35]

where KLD(·)and L are the Kullback-Liebler divergence and the lower bound of logp(zn|z1:n−1),respectively.Due to the non-negativity of the KLD, we can obtain the true posterior by minimizing the KLD betweenq(H)andp(H|z1:n)[35,36].Thus, the fix-point iteration of VB inference is utilized to approximatep(xn,Δn,un,Λn,Rn,νn|z1:n)by means of the product of some individual distributions [21,37], i.e.,

whereq(·)represents the approximate posterior ofp(·).An analytical solution ofq(xn),q(Δn),q(un),q(Λn),q(Rn), andq(νn)can be obtained by [21]

where φ is an element of H, H(−φ)is all elements in H except for φ, andcφis the constant on the variable φ.

Based on Bayesian theory, we can obtain the joint posterior distribution as follows:

Substituting (5), (10)-(12), and (16)-(18)into (23)results in

When φ=xn, the posterior distributionq(xn)is calculated as:

wherexn|nandPn|nare the estimate of state and the corresponding covariance respectively, and can be obtained by

When φ=Δn, the posterior distributionq(Δn)is calculated as

where the mean μn|n,jand variance σn|n,jare given by

where εn=E[zn−h(xn)].The derivation of (33)-(36)can be seen in Appendix B.

When φ=un, the posterior distributionq(un)is calculated as

where the locationun|nand covarianceUn|nare obtained by

The derivation of (36)-(40)can be seen in Appendix B.

When φ=Λn, the posterior distributionq(Λn)is calculated as

where the shape parameter αnand the rate parameter βnare given by

where ψnis an auxiliary parameter and the derivation of (41)-(43)can be seen in Appendix C.

When φ=Rn, the posterior distributionq(Rn)is calculated as

where the DOFcn|nand inverse scale matrixDn|nare obtained by

where Υnis an auxiliary parameter and the derivation of (44)-(46)can be seen in Appendix C.

When φ=νn, the posterior distributionq(νn)is calculated as

where the parametersanandbnare given by

The derivation of (47)-(49)can be seen in Appendix D.

Using (25), (33), (36), (41), (44), and (47), the following expectations are required:

where ψ(·)denotes the digamma function.The computations of E[un] and E[unuTn] can be found in [38].

After taking N fix-point iteration steps, the approximations of posterior distributions are updated as

Combining prediction steps (5)and (10)-(12)with measurement updates (25), (33), (36), (41),(44)and (47), the proposed STCKF can be realized recursively.To implement the proposed filter,the initial shape matrix Δ0, the initial scale matrixR0, the initial DOF ν0, and the forgetting factor ρ need to be determined.Generally, Δ0,R0, and ν0can be approximately achieved from prior knowledge.The forgetting factor ρ can determine how much information from the previous estimation.That is, choosing a small ρ means forgetting more information and vice versa.In the proposed STCKF, the selection of N is a trade-off.Increasing N will obtain better estimation accuracy but more time-consuming.

4 Target Tracking Simulation

To validate the estimation performance of the proposed STCKF, a target tracking simulation is introduced to perform an evaluation of the results obtained.The STCKF is compared with the CKF [16], the Student’s t distribution based-CKF (T-CKF)[25], and the robust Student’s t distribution-based CKF (RT-CKF)[26].

In this paper, a typical air traffic control scenario is considered, in which the aircraft performs maneuvering turns on the horizontal plane at a constant but unknown turning rate Ω.

The kinematics of rotational motion can be described by the following nonlinear state-space model [16]:

where the statethe position and velocity of target in x and y directions are(ξ,η) andTis the time interval and Ω is the tuning rate. The process noise covariance isQn=diag (q1M,q1M,q2T), where

The associated parameters are set as:q1=0.1 m2s−3,q2=1.75×10−4s−3, Ω=−3°s−1,T=1 s.The true initial state and the corresponding covariance arex0=(1000,300,1000,0,−3)TandP0=diag (100, 10, 100, 10, 100), respectively.

In this simulation, we consider three cases for measurement noise:

Case 1:Gaussian distribution, that is,vn~N(0,Rn)with noise covarianceRn=diag(100,10),

Case 2:Contaminated Gaussian distribution (mixture Gaussian distribution).The mixture Gaussian distributed noise is generated according to [33]

whereRn=diag(100,10).Eq.(63)means thatvnis drawn from N(0,Rn)with probability 1 −pcand N(0,100Rn)with probabilitypc.In this paper,pc=0.1.

Case 3:Contaminated skew t distribution (mixture skew t distribution).

According to [32], the measurement noisevnis generated by

whereRn=diag(100,10), Δn=diag(5,5), and νn=4.

In this paper, the root mean square error (RMSE)is adopted to test the filtering performance,and its formula is

whereandxjnare the estimated and true values, respectively, at thej-th Monte-Carlo run.The simulated inputs area0=4,b0=1,c0=3,D0=diag(1,1), μ0=diag(1,1), σ0=diag(1,1),ρ=1 −exp(−5)and N=5.

Figs.3-5 show that the RMSEs of position, velocity, and tuning rate based on 20 Monte-Carlo runs for three cases.From Fig.3, the CKF, T-CKF, RT-CKF and the proposed STCKF almost have the same estimation accuracy under Gaussian distribution noise.However, the methods based on the non-Gaussian distribution outperform the methods based on the Gaussian distribution when the measurement noise no longer satisfies the Gaussian distribution.As shown in Fig.4, the non-Gaussian filters (T-CKF, RT-CKF and STCKF)perform better than the CKF.From Fig.5, the proposed STCKF obtains the best accuracy in the case of asymmetric noise distribution and unknown noise statistics.As also observed in Table 1, the filter based on the skewed t distribution and the filters based on the Student’s t distribution perform better than the filter based on the Gaussian distribution in Cases 2 and 3, and the proposed STCKF is significantly better than other filters for the asymmetric noise distribution.

Table 1:Average RMSEs by different filters

Figure 3:RMSEs of the position, velocity, and tuning rate by different filters under Case 1

Figure 4:RMSEs of the position, velocity, and tuning rate by different filters under Case 2

Figure 5:RMSEs of the position, velocity, and tuning rate by different filters under Case 3

5 Conclusion

In this work, we consider the joint estimation problem of system states and unknown noise statistics for nonlinear discrete-time systems.Combining with the properties of skew t distribution, a hierarchical nonlinear Gaussian model is developed.Based on this model, a skew t cubature Kalman filter is proposed, in which the states, shape matrix, scale matrix and DOF are simultaneously estimated by using VB approach.The results of simulation show that the proposed filter in this paper has better estimation accuracy than the conventional CKF and the Student’s t distribution-based CKF under heavy-tailed and skewed measurement noise.It should be noted that the proposed method in this paper can only realize state estimation of asymmetric measurement noise.How to extend the proposed method to handle asymmetric process and measurement noise is still an open problem.

Funding Statement:This work was supported in part by National Natural Science Foundation of China under Grants 62103167 and 61833007, and in part by the Natural Science Foundation of Jiangsu Province under Grant BK20210451.

Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.

Appendix A

Substituting φ=xninto (22)yields

Defining the modified likelihood distributionp(zn|xn)as

and using (5)and (67)in (66), we have

According to (5)and (66)-(68), (25)-(32)can be obtained.

Appendix B

Substituting φ=Δninto (22)yields

where the auxiliary parameter εnis given by

Substituting φ=uninto (22), we have

Similar to (66)-(68), (33)-(40)can be obtained.

Appendix C

Substituting φ=Λninto (22), we have

where

Substituting φ=Rninto (22)yields

where

According to (72)and (74), (41)-(46)can be obtained.

Appendix D

Substituting φ=νninto (22)yields

Using Stirling’s approximation:[26], logq(νn)can be rewritten as

According to (77), (41)-(43)can be obtained.