A manifold extended t-process regression

2021-12-04
Journal of University of Science and Technology of China, 2021, Issue 5

Guo Shiwei, Wang Zhanfeng, Wu Yaohua

Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China

Abstract: A manifold extended t-process regression (meTPR) model is developed to fit functional data with a complicated input space. A manifold method is used to transform covariate data from the input space into a feature space, and an extended t-process regression is then used to map features from the feature space into the observation space. An estimation procedure is constructed to estimate the parameters in the model. Numerical studies are conducted with both synthetic data and real data, and the results show that the proposed meTPR model performs well.

Keywords: Gaussian process regression; extended t-process regression; manifold; robustness

1 Introduction

A nonparametric regression method, Gaussian process regression (GPR), proposed by Williams and Rasmussen [1] in 1996, is widely used to fit functional data. Rasmussen [2] discussed detailed algorithms for using the Gaussian process (GP) in supervised learning for regression and classification, where various covariance functions were proposed and their characteristics discussed. Shi and Choi [3] introduced methods using the Gaussian process in functional data spaces. Sun et al. [4] used GPR to predict short-term wind speed, and Liu et al. [5] applied GPR to the prediction of short-term deformation of tunnel surrounding rock. Many researchers have extended and improved the Gaussian process from different perspectives. With regard to computational space complexity, Smola and Bartlett [6] used a sparse greedy technique to approximate the maximum posterior estimate of the Gaussian process, which performs well when the dataset is large. Seiferth et al. [7] proposed the Meta-GP algorithm, applying GPR to non-Gaussian likelihood data, which is suitable for data-stream processing with low computational complexity. Banerjee et al. [8] proposed a method to solve data storage and processing issues on large datasets by substituting the dataset with a random projection onto a low-dimensional subspace. Since the GP is susceptible to outliers in the data, many robust processes have been proposed to fit functional data. For example, Wauthier and Jordan [9] observed that GPR tends to overfit in sparse areas of the data and used heavy-tailed stochastic processes to improve the robustness of the estimation. Yu et al. [10] showed that the t-process can improve the robustness of the model. Shah et al. [11] showed that the t-process can reduce the overfitting problem while maintaining the excellent properties of the GP. Jylänki et al. [12] utilized a Student-t observation model in GPR, with estimation by expectation propagation, to improve the robustness of prediction and overcome problems of the t-process model.
Wang et al. [13] proposed a nonparametric regression method with more robustness than GPR by combining a Gaussian process with an inverse gamma distribution, which is called extended t-process regression (eTPR).

GPR and other robust process models are powerful nonparametric regression methods. However, traditional GPR does not perform well when the dataset does not lie in a vector space, such as manifold data. For non-smooth data, such as step functions, numerical studies show that both GPR and eTPR perform poorly. This paper introduces manifold models to devise flexible covariance functions that improve the performance of prediction. Manifolds are now widely used in data processing to change the dimension of the data. When the dimension of the data is large, manifolds can map the data to a low-dimensional space to reduce the computational complexity and increase the speed of calculation. Lin and Yao [14] proposed a functional regression method on manifolds. By means of functional local linear manifold smoothing, the estimator converges at a polynomial rate and also performs well on noisy data. Sober et al. [15] used moving least squares to estimate functions on manifolds with linear time complexity, which avoids a nonlinear dimensionality-reduction step and the attendant loss of information. Zhan and Zhou [16] proposed ManiMIL (manifold-based multi-instance learning) and used the collapse phenomenon originating from the MIL (multi-instance learning) algorithm to make predictions, which reduced the calculation time and addressed the collapse issue of LLE (locally linear embedding). To enhance data diversity when reducing the data dimension, Gao and Liu [17] reported a method to reconstruct the data with a newly defined manifold distance, which improved the recognition rate significantly. Fan and Chen [18] proposed ManiNLR by combining a manifold model with nonlinear regression; they used the manifold model to map a high-dimensional space to a low-dimensional space, which improved the classification speed.
Recently, Calandra et al. [19] combined the manifold method with GPR and created manifold Gaussian process regression (mGPR) by mapping input data to a feature space, which improved the accuracy of prediction, especially at discontinuous points. Mallasto and Feragen [20] extended GPR to non-vector spaces by defining wrapped Gaussian processes (WGPs) on Riemannian manifolds.

GPR methods with manifolds, however, are not robust to outliers in the data. To the best of the authors' knowledge, no robust-process manifold regression model has been reported in the literature. In this paper, we combine the t-process and manifold methods to create a robust manifold regression model for functional data, called the manifold extended t-process regression (meTPR) model. We use a manifold model to map the input space into a feature space; the eTPR method is then applied to the data in the feature space to capture the nonlinear structure of the data. Compared to the GPR and eTPR models, the proposed method can fit data from a complicated input space, such as non-smooth data. The manifold model significantly improves the accuracy of prediction. In addition, meTPR is more robust than GPR-based manifold methods.

The remainder of the paper is organized as follows. In Section 2, we present the manifold extended t-process regression and the estimation procedure. Numerical studies and real examples are given in Section 3. Robustness and information consistency properties are shown in Section 4. We conclude in Section 5. Additional technical details and all proofs are presented in the Appendix.

2 Manifold extended t-process regression

Consider a functional regression model

y = F(x) + ε

(1)

where x is the covariate from the input space X ⊆ ℝ^D, and y ∈ Y ⊆ ℝ is the observation. We focus on the task of learning a regression function F: X → Y. To simplify the input space, which is usually complicated, and to improve the accuracy of prediction for non-smooth data, we introduce a manifold model mapping the data space X to a feature space H. We then use an eTPR model to describe the relationship between the feature space H and the output space Y.

The manifold transformation used is a nested mapping, as follows,

F=f∘M

(2)

where M: X → H is the manifold transformation from the input space X to the feature space H ⊆ ℝ^Q, and f: H → Y is a function from the feature space H to the output space Y ⊆ ℝ. Let z = M(x) ∈ H denote the features. Then f(z) ∈ Y.

2.1 Manifold transformation

A continuous transformation M(x) = (T_l ∘ … ∘ T_1)(x) was used by Calandra et al. [19], where l is the number of layers and x is the input data. Inspired by Calandra et al. [19], we use this transformation in this article. Each T_i can be written as the following transformation,

T_i(x_i) = t(W_i x_i + B_i)

(3)

where x_i is the input of the i-th layer, x_1 = x, t is a transformation function, such as t(x) = 1/(1 + e^{-x}), and W_i and B_i are the weights and biases of each transformation, respectively. For the manifold transformation M, the vector θ_M comprises the weight and bias parameters of each layer, i.e. θ_M = [W_1, B_1, …, W_l, B_l]^T. This transformation can be regarded as one or more of the widely used coordinate transformations and sigmoid transformations, where the sigmoid transformation is symmetric and robust against outliers.
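As a sketch of the layered mapping in (3), the nested transformation can be written as follows (the two-layer setup, layer sizes, and random weights below are illustrative assumptions, not values from the paper):

```python
import numpy as np

def sigmoid(x):
    """The transformation t(x) = 1/(1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def manifold_transform(x, weights, biases):
    """Nested mapping M(x) = (T_l o ... o T_1)(x) with T_i(x_i) = t(W_i x_i + B_i)."""
    z = x
    for W, B in zip(weights, biases):
        z = sigmoid(W @ z + B)
    return z

# Illustrative two-layer map from a 1-D input to a 3-D feature space.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 1)), rng.normal(size=(3, 3))]
biases = [rng.normal(size=(3, 1)), rng.normal(size=(3, 1))]
z = manifold_transform(np.array([[0.5]]), weights, biases)
print(z.shape)  # (3, 1)
```

Because each layer ends in a sigmoid, every feature coordinate lies in (0, 1), which is the bounded, outlier-damping behavior described above.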

2.2 t-process regression

We now briefly introduce the extended t-process (ETP) and the extended multivariate t-distribution (EMTD). Wang et al. [13] extended a Gaussian process to a t-process using the idea in Reference [21]:

f | r ~ GP(h, rk),  r ~ IG(v, ω)

(4)

where GP(h, rk) stands for a GP with mean function h and covariance function rk, and IG(v, ω) stands for an inverse gamma distribution. Then f follows f ~ ETP(v, ω, h, k), implying that for any collection of points x = (x_1, …, x_n)^T, we have

f_n = f(x) = (f(x_1), …, f(x_n))^T ~ EMTD(v, ω, h_n, K_n)

(5)

meaning that f_n has an extended multivariate t-distribution (EMTD) with density function

p(f_n) = [Γ(v + n/2) ω^v / (Γ(v) (2π)^{n/2} |K_n|^{1/2})] (ω + (f_n − h_n)^T K_n^{−1} (f_n − h_n)/2)^{−(v + n/2)}

(6)

where h_n = (h(x_1), …, h(x_n))^T, K_n = (k_ij)_{n×n} and k_ij = k(x_i, x_j) for some mean function h(·): X → ℝ and covariance kernel k(·, ·): X × X → ℝ.
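The hierarchy in (4) gives a direct way to simulate an ETP path: draw r from the inverse gamma distribution, then draw a Gaussian path with covariance rK_n. A minimal sketch (the zero mean and squared-exponential kernel are stand-ins, not the paper's choices):

```python
import numpy as np
from scipy.stats import invgamma

def rbf_kernel(x, ell=1.0):
    """Squared-exponential kernel matrix on a 1-D grid (illustrative choice of k)."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def sample_etp(x, v=2.0, omega=2.0, n_draws=1, rng=None):
    """Draw f_n ~ EMTD(v, omega, 0, K_n) via f | r ~ GP(0, r k), r ~ IG(v, omega)."""
    rng = rng or np.random.default_rng(0)
    K = rbf_kernel(x) + 1e-8 * np.eye(len(x))  # jitter for numerical stability
    draws = []
    for _ in range(n_draws):
        r = invgamma.rvs(a=v, scale=omega, random_state=rng)  # scale mixture weight
        draws.append(rng.multivariate_normal(np.zeros(len(x)), r * K))
    return np.array(draws)

x = np.linspace(-5, 5, 50)
f = sample_etp(x, n_draws=3)
print(f.shape)  # (3, 50)
```

The random scale r is what gives the ETP its heavy tails relative to a GP: occasional large draws of r produce paths with much larger amplitude.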

After mapping the input space to the feature space, we consider the eTPR model

y(z) = f(z) + ε(z)

(7)

where z is the feature in the feature space H.

We assume that f and ε follow a joint extended t-process (ETP),

(f, ε)^T ~ ETP(v, ω, (h, 0)^T, diag(k, k_ε))

(8)

where h and k are the mean function and kernel function of f, respectively. The covariance function of ε is k_ε(u, v) = φ I(u = v), where I(·) is an indicator function. We can express the ETP hierarchically as

f | r ~ GP(h, rk),  ε | r ~ GP(0, r k_ε), independently given r,

(9)

and r ~ IG(v, ω)

(10)

where IG(v, ω) is the inverse gamma distribution with parameters v and ω. It follows that y ~ ETP(v, ω, h, k + k_ε) is the joint of f + ε | r ~ GP(h, r(k + k_ε)) and r ~ IG(v, ω), which gives the extended t-process regression (eTPR) model.

2.3 Estimation

2.3.1 Estimation procedure

Denote the covariates by x = (x_1, …, x_n). The kernel function of the eTPR model f is

(11)

Let the input data be D_n and a new data point be u; then the model can be written as

y(u) | D_n ~

(12)

(13)

(14)

where

(15)

and

(16)

2.3.2 Computation

Consider the nested mapping F = f ∘ M. The log marginal likelihood of meTPR is

(17)

(18)

According to the chain rule, we can obtain the gradient-based estimate of θ_M as follows,

(19)

Step 1. Set initial values of the parameters.
Step 2. Compute the log marginal likelihood and its gradient with respect to the parameters, including the gradient of θ_M in (19).

Step 3. Update the parameters by a gradient-based method.

Step 4. Repeat Steps 2 and 3 until convergence.
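The likelihood step above can be sketched in code. The sketch below writes the marginal likelihood directly from the EMTD density implied by (4) and (5), assuming y_n ~ EMTD(v, ω, 0, K + φI) with a squared-exponential kernel; the zero mean, the kernel choice, fixed (v, ω), and the parameter names (ell, eta, phi) are simplifying assumptions for illustration:

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize

def neg_log_marglik(log_params, x, y, v=2.0, omega=2.0):
    """Negative log marginal likelihood of y ~ EMTD(v, omega, 0, K + phi*I)."""
    ell, eta, phi = np.exp(log_params)  # log-parameterization keeps parameters positive
    n = len(x)
    d = x[:, None] - x[None, :]
    K = eta * np.exp(-0.5 * (d / ell) ** 2) + phi * np.eye(n)
    try:
        L = np.linalg.cholesky(K)
    except np.linalg.LinAlgError:
        return 1e10  # penalize numerically singular covariances
    quad = y @ np.linalg.solve(L.T, np.linalg.solve(L, y))
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -(gammaln(v + n / 2) - gammaln(v) + v * np.log(omega)
             - 0.5 * n * np.log(2 * np.pi) - 0.5 * logdet
             - (v + n / 2) * np.log(omega + quad / 2))

rng = np.random.default_rng(1)
x = np.linspace(-5, 5, 40)
y = np.tanh(x) + 0.2 * rng.normal(size=40)
res = minimize(neg_log_marglik, x0=np.zeros(3), args=(x, y), method="Nelder-Mead")
print(np.exp(res.x))  # fitted (ell, eta, phi)
```

A derivative-free optimizer is used here for brevity; the paper's procedure uses the chain-rule gradient (19) so that the manifold parameters θ_M can be updated jointly with the kernel parameters.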

3 Numerical study

This section includes two scenarios of simulation study, i.e., the step function model and the smooth function model, and compares the performance of the proposed method with those of existing methods. We use GPR, eTPR, mGPR, and meTPR to fit the training data, and obtain predictions on the testing data.

The MSE (mean squared error),

MSE = (1/N) Σ_{i=1}^N (F̂(x_i*) − F(x_i*))^2,

and the PE (prediction error),

PE = (1/N) Σ_{i=1}^N (ŷ(x_i*) − y_i*)^2,

are computed for each method, where {(x_i*, y_i*): i = 1, …, N} are the test data. All simulation results are based on 100 replications.
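In code, assuming the conventional definitions of the two criteria (squared error of the fitted function against the truth for MSE, and against the held-out responses for PE), the computation is:

```python
import numpy as np

def mse(F_hat, F_true):
    """Mean squared error of the fitted function against the true function."""
    return np.mean((F_hat - F_true) ** 2)

def pe(y_hat, y_test):
    """Prediction error of the predictions against the observed test responses."""
    return np.mean((y_hat - y_test) ** 2)

# Tiny illustrative check.
F_true = np.array([0.0, 1.0, 1.0])
F_hat = np.array([0.1, 0.9, 1.2])
print(round(mse(F_hat, F_true), 6))  # 0.02
```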

3.1 Simulation

3.1.1 Step function

In the first scenario, we consider the following step function model:

y = F(x) + w,  w ~ N(0, φ_0^2)

(20)

where F(x) is a step function.

Training data points with sample size n are evenly sampled in [−5, 4]. We take sample sizes n = 20, 40, and 80, and φ_0 = 0.2 and 0.4. For testing data, 500 points are generated at equal intervals from [−5, 5]. An outlier is set at (4, 1.5). Let

M(x) = T(x) = t(Wx + B)

be the manifold transformation, where t(x) = 1/(1 + e^{-x}). Let the dimension of the feature space be 3, W be a 3×1 matrix, and B be a 3×n matrix. A Matérn exponential kernel is used,

(21)

where η_1 > 0, K_α(·) is a modified Bessel function of order α, and

(22)

and

(23)
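A sketch of a standard Matérn kernel of the kind described above, with the d = 0 limit k(0) = η_1 handled explicitly (the smoothness α and length scale ℓ below are illustrative, not the paper's fitted values):

```python
import numpy as np
from scipy.special import gamma, kv

def matern(d, eta1=1.0, alpha=1.5, ell=1.0):
    """Matern kernel k(d) = eta1 * 2^(1-alpha)/Gamma(alpha) * s^alpha * K_alpha(s),
    with s = sqrt(2*alpha) * d / ell and the limiting value k(0) = eta1."""
    d = np.atleast_1d(np.asarray(d, dtype=float))
    s = np.sqrt(2.0 * alpha) * d / ell
    out = np.full_like(d, eta1)  # k(0) = eta1 (the s -> 0 limit)
    nz = s > 0
    out[nz] = eta1 * (2.0 ** (1 - alpha) / gamma(alpha)) * s[nz] ** alpha * kv(alpha, s[nz])
    return out

print(matern([0.0, 0.5, 1.0, 2.0]))  # decreases monotonically from 1.0
```

For α = 3/2 this reduces to the closed form η_1 (1 + √3 d/ℓ) exp(−√3 d/ℓ), which is a convenient correctness check.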

Figure 1 shows prediction curves from GPR, eTPR, mGPR, and meTPR based on one simulated dataset. The meTPR prediction curve fits the indicator function better than those of GPR and eTPR, which ignore the manifold structure. mGPR and GPR are sensitive to outliers, while meTPR shows robustness against the outlier. This is reasonable, since meTPR accounts for both the manifold structure and robustness against outliers.

Figure 1. Prediction curves from GPR, eTPR, mGPR, and meTPR based on one simulated dataset.

Table 1 shows the mean and standard deviation of the MSE and PE of the prediction results. meTPR has the smallest MSEs and PEs among the four methods, mGPR is better than GPR and eTPR, and eTPR has smaller MSEs and PEs than GPR. As the sample size increases, the MSEs and PEs decrease. It follows that for this non-smooth data, the accuracy of prediction can be improved by the manifold model mapping the input space to the feature space.

3.1.2 Smooth function with outliers

In the second scenario, we consider a smooth function F(x),

F(x) = 1/(1 + e^{-3x})

(24)

Table 2. MSE, PE and their standard deviations (in parentheses) of GPR, eTPR, mGPR and meTPR in Scenario 2.

The other steps are the same as those for the step function. Table 2 shows the mean and standard deviation of the MSE and PE of the prediction results. We obtain conclusions similar to those for the step function.

3.2 Real data

The proposed meTPR model is applied to a dataset from a study of children with hemiplegic cerebral palsy, including 84 girls and 57 boys in primary and secondary schools. The students are divided into two groups (m = 2): a group playing video games (56%) and a group not playing video games (44%). The average correct rate of Big/Little Circle (BLC) and the average correct rate of Choice Reaction Time (CRT) are measured. More details are given in Reference [22]. Before applying the proposed methods, we take the logarithm of the BLC and CRT mean correct latencies. For GPR, eTPR, mGPR and meTPR, a von Mises-inspired kernel is taken,

(25)

where η_0 > 0, η_1 > 0.

We randomly select 80% of the data as the training set and the remaining 20% as the testing set to calculate the prediction errors under the various models. The process is repeated 100 times.
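The repeated random-split evaluation can be sketched as follows (the constant-mean predictor is a placeholder for any of the four fitted models; `fit_predict` is a hypothetical interface, not from the paper):

```python
import numpy as np

def repeated_split_pe(x, y, fit_predict, n_rep=100, train_frac=0.8, seed=0):
    """Mean and standard deviation of the prediction error over repeated
    random train/test splits (80%/20% by default)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    pes = []
    for _ in range(n_rep):
        idx = rng.permutation(n)
        k = int(train_frac * n)
        tr, te = idx[:k], idx[k:]
        y_hat = fit_predict(x[tr], y[tr], x[te])
        pes.append(np.mean((y_hat - y[te]) ** 2))
    return np.mean(pes), np.std(pes)

# Toy check with a constant-mean predictor standing in for a fitted model.
x = np.linspace(0, 1, 50)
y = 2 * x + 0.1 * np.random.default_rng(1).normal(size=50)
mean_pe, sd_pe = repeated_split_pe(x, y, lambda xt, yt, xs: np.full(len(xs), yt.mean()))
print(round(mean_pe, 3))
```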

Table 3 shows the mean and standard deviation of the prediction errors. GPR has the largest average prediction error and meTPR has the smallest. It follows that the meTPR model performs well in improving the accuracy of prediction.

Table 3. Mean and standard deviation of prediction errors using the GPR, eTPR, mGPR, and meTPR methods.

4 Robustness and information consistency

4.1 Robustness

Let T(F_n) = T_n(y_1, …, y_n) be an estimator of θ, where F_n is the empirical distribution of {y_1, …, y_n} and T is a functional on distributions. The influence function of T at F is defined as

IF(y; T, F) = lim_{ε→0} [T((1 − ε)F + ε δ_y) − T(F)] / ε

(26)

where δ_y is the distribution placing point mass 1 at y and 0 elsewhere. The influence function measures the change in the estimator after adding a small perturbation to the dataset, and thus reflects the robustness of the estimation method. For the meTPR model, we have the following proposition.
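The definition can be illustrated empirically with a finite-sample version of (26): append one contaminating point and rescale by its weight. The sample mean versus the sample median is used here purely as an illustration of the robust/non-robust contrast, not as the paper's estimator:

```python
import numpy as np

def empirical_influence(T, y, y_new):
    """Finite-sample influence of a contaminating point y_new on the statistic T:
    the change in T from appending y_new, scaled by the contamination weight 1/(n+1)."""
    n = len(y)
    contaminated = np.append(y, y_new)
    return (T(contaminated) - T(y)) / (1.0 / (n + 1))

rng = np.random.default_rng(0)
y = rng.normal(size=200)
print(empirical_influence(np.mean, y, 10.0))    # equals 10 - y.mean(): unbounded in y_new
print(empirical_influence(np.median, y, 10.0))  # much smaller: the median is robust
```

For the mean, the empirical influence grows without bound as the contaminating point moves away, while for the median it stays bounded; this is the behavior the heavy-tailed meTPR model aims for.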

4.2 Information consistency

Let p_{φ_0}(y | F_0, x) be the density function generating the data y given x under the true model y = F_0(x) + ε, where F_0 is the true F. Let p_θ(F) be a measure of the random process F on the space F = {F(·): X → ℝ}. Let

D[p_1, p_2] = ∫ p_1(y) log (p_1(y)/p_2(y)) dy

(27)

be the Kullback-Leibler distance between two densities p_1 and p_2. According to Seeger et al. [23], if

then we call the meTPR model information consistent, as presented in the following proposition.

Before presenting the information consistency of the meTPR, we briefly introduce the reproducing kernel Hilbert space [24]. Assume F is a Hilbert space of functions F: X → ℝ with an inner product 〈·, ·〉_F. We call F a reproducing kernel Hilbert space associated with a kernel function k(·, ·): X × X → ℝ if it satisfies

① ∀x ∈ X, k(·, x) ∈ F;

② ∀x ∈ X, ∀F ∈ F,

〈F, k(·, x)〉_F = F(x)

(28)

5 Conclusions

To address the difficulty of fitting data with outliers and with a complicated covariate space, we proposed a manifold extended t-process regression (meTPR) model. We proposed a parameter estimation method and studied the theoretical properties of the model. The proposed model is robust to outliers and performs well for non-smooth data and complicated covariate spaces. Although Y is one-dimensional in this article, the model can be extended to multi-dimensional dependent functional data.