A Gaussian process regression-based sea surface temperature interpolation algorithm

Journal of Oceanology and Limnology, 2021, No. 4

Yongshun ZHANG¹, Miao FENG¹, Weimin ZHANG¹,², Huizan WANG¹, Pinqiang WANG¹

1 College of Meteorology and Oceanography, National University of Defense Technology, Changsha 410073, China

2 Key Laboratory of Software Engineering for Complex Systems, National University of Defense Technology, Changsha 410073, China

Abstract The resolution of ocean reanalysis datasets is generally low because of the limited resolution of their associated numerical models. Low-resolution ocean reanalysis datasets are therefore usually interpolated to provide an initial or boundary field for higher-resolution regional ocean models. However, traditional interpolation methods (nearest neighbor interpolation, bilinear interpolation, and bicubic interpolation) lack physical constraints and can generate significant errors at land-sea boundaries and around islands. In this paper, a machine learning method is used to design an interpolation algorithm based on Gaussian process regression. The method uses a multiscale kernel function to process two-dimensional meteorological and oceanic processes and introduces multiscale physical feature information (sea surface wind stress, sea surface heat flux, and ocean current velocity). This greatly improves the spatial resolution of ocean features and the interpolation accuracy. The effectiveness of the algorithm was validated through interpolation experiments relating to sea surface temperature (SST). The root mean square error (RMSE) of the interpolation algorithm was 38.9%, 43.7%, and 62.4% lower than that of bilinear interpolation, bicubic interpolation, and nearest neighbor interpolation, respectively. The interpolation accuracy was also significantly better in offshore areas and around islands. The algorithm has an acceptable runtime cost and good temporal and spatial generalizability.

Keywords: Gaussian process regression; sea surface temperature (SST); machine learning; kernel function; spatial interpolation

1 INTRODUCTION

Ocean reanalysis datasets can improve the accuracy of ocean studies by assimilating observational data, so they are widely used in studies of ocean thermal and dynamic processes and their spatiotemporal variability (Du and Qu, 2010; Kumar and Hu, 2012; Balmaseda et al., 2013). They can also provide initial conditions and side boundary conditions for model simulation and prediction. Research regarding the development of high-resolution ocean reanalysis datasets and associated applications is therefore very important. However, increasing the resolution of numerical models to improve ocean reanalysis datasets requires huge quantities of computational resources, especially for global datasets. Low-resolution ocean reanalysis datasets are therefore usually interpolated onto the grids of higher-resolution regional ocean models using nearest neighbor interpolation, linear interpolation, or bicubic interpolation (Nardelli et al., 2016). This has the advantage of a lower calculation cost and can be completed relatively quickly. Other common interpolation methods for data assimilation, such as optimal interpolation and the successive corrections method, are also often used to interpolate ocean reanalysis datasets, but these methods require background field information (Wang et al., 2008, 2012). However, although traditional interpolation methods can improve the resolution of ocean reanalysis datasets, they cannot introduce multiscale physical feature information, because they do not take into account spatiotemporal changes and non-linear processes; they interpolate using only some combination of adjacent values. As a result, the products obtained through interpolation often contain errors (Li and Heap, 2008), especially in coastal waters and the areas around islands (Sokolov and Rintoul, 1999).

In recent years, machine learning methods have been increasingly applied to the interpolation of geographic data. Li et al. (2011) used 23 methods, including random forest and support vector machines, to interpolate mud content samples from the southwestern edge of Australia and compared the different results. The machine learning results were found to be the most accurate. Jia and Ma (2017) applied machine learning methods to the interpolation of seismic data, which can significantly reduce costs in engineering applications. In the field of meteorology, Antonić et al. (2001) used neural networks to interpolate meteorological data. Research has shown that neural networks can accurately simulate complex non-linear functions (Bryan and Adams, 2002), making them suitable for the processing of certain kinds of complex non-linear relationships in data. Appelhans et al. (2015) used multiple machine learning methods to interpolate monthly average temperatures and, in a quantitative evaluation, found the results to be better than the Kriging method. Based on a 10-fold cross-validation testing design, regression trees generally performed better than linear and nonlinear regression models.

Out of all the different marine elements, temperature, especially sea surface temperature (SST), has the greatest impact on air-sea interactions (Thompson et al., 2017). In order to reduce the obvious errors of traditional interpolation methods in offshore areas and around islands, this paper uses machine learning methods to interpolate SST. There are numerous different machine learning methods. The kernel method can transform linear learning into non-linear learning by introducing a kernel function. This can map linearly inseparable problems in an original sample space to a higher-dimensional feature space where the linearly inseparable problem can be solved (Hofmann et al., 2008). Common machine learning methods for classification or regression that apply kernel functions include support vector machines (SVM), support vector regression (SVR), and Gaussian process regression (GPR). In the field of meteorology, kernel methods have also performed well. Wang and Zhang (2005) used weighted least squares support vector machines (WLS-SVMs) to estimate wind speed and found this could accurately track wind speed trends and produce highly accurate estimations. Wang and Chaib-draa (2017) proposed a novel online Bayesian filtering framework for large-scale GPR and applied it to global surface temperature analysis. The results showed that this approach was an efficient and accurate expert system for global surface temperature analysis. Paniagua-Tineo et al. (2011) used SVR to accurately predict the maximum temperature over a period of 24 hours by introducing predictors such as temperature, precipitation, relative humidity, and barometric pressure. Based on statistical tests, the SVR performed better than a multi-layer perceptron and an extreme learning machine in this prediction problem.

GPR has a rigorous theoretical basis in statistical learning and is adaptable to a range of complex problems, including high dimensionality, small sample size, and nonlinearity. It is also highly generalizable. In comparison to neural networks and support vector approaches, this method has the advantages of easy implementation, adaptive acquisition of hyperparameters, and the probabilistic significance of its outputs (He et al., 2013). It is therefore widely used in the field of image super-resolution (He and Siu, 2011). Improving the spatial resolution of SST through interpolation can in fact be compared to image super-resolution: both obtain higher-resolution data from low-resolution data. The Gaussian process defines a joint Gaussian distribution for any finite number of samples, which can be used to simulate a random sample distribution of sea surface temperatures.

Various factors can affect the SST (Katsaros et al., 2005), such as the sea surface wind stress, sea surface heat flux, and ocean current velocity. These physical factors have different influence radii and intensities. A single kernel function cannot capture this multi-scale information. A combined kernel function therefore needs to be constructed to extract their different influence radii and intensities. Huang et al. (2014) used SVM models to predict short-term wind speeds and introduced different climate variables as input features to produce ideal prediction results. Grover et al. (2015) used wind direction, spatial distance, pressure, and temperature to infer long-term spatial dependencies through GPR.

Taking all of the above into account, this paper presents the design of an SST interpolation algorithm based on GPR. By constructing a combined kernel function, latitude and longitude, sea surface wind stress, sea surface heat flux, and ocean current velocity are used as input features to establish the regression relationship between these non-linear features and SST, which can effectively reduce the interpolation error in offshore areas and around islands.

The remaining sections of the paper are organized as follows: Section 2 introduces the basic principles of GPR, commonly used kernel functions, the multiscale kernel function used to process meteorological and oceanic processes, and the GPR-based SST interpolation algorithm. Section 3 presents the results of some SST interpolation experiments. The last section gives our overall conclusions.

2 GAUSSIAN PROCESS REGRESSION AND KERNEL FUNCTION

2.1 Gaussian process regression

A Gaussian process is a collection of random variables, any finite number of which obey a joint Gaussian distribution (Rasmussen and Williams, 2006). It can be fully represented by its mean function and covariance function, i.e.:

$$f(x) \sim GP\left(m(x), k(x, x')\right),$$

where GP is an abbreviation for Gaussian process, x and x′ are arbitrary random variables, and

$$m(x) = E\left[f(x)\right],$$

$$k(x, x') = E\left[\left(f(x) - m(x)\right)\left(f(x') - m(x')\right)\right].$$

For the purposes of simplification, the mean function is typically taken to be 0 (Rasmussen and Williams, 2006).

Let us suppose there is a training set $D = \{(x_i, y_i)\,|\,i = 1, 2, \cdots, n\} = (X, y)$, where $x_i \in R^d$ is a d-dimensional input vector and $X = \{x_1, x_2, \cdots, x_n\}$ is a $d \times n$-dimensional input matrix. $y_i \in R$ is the corresponding output scalar and $y$ is the output vector. $R$ is the real number field, and $R^d$ is the d-dimensional real number space.

Assuming the training set is noisy, the following model can be used:

$$y = f(x) + \varepsilon,$$
where $\varepsilon$ is additive Gaussian observation noise with variance $\sigma_n^2$, i.e., $\varepsilon \sim N(0, \sigma_n^2)$.

Assuming that y* is the predicted value corresponding to the test point x*, the joint Gaussian distribution of y and y* can be obtained as follows:
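$$\begin{bmatrix} y \\ y_* \end{bmatrix} \sim N\left(0,\ \begin{bmatrix} K(X, X) + \sigma_n^2 I_n & K(X, x_*) \\ K(x_*, X) & k(x_*, x_*) \end{bmatrix}\right).$$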

If there are n training points and n* test points, then $K(X, x_*)$ represents the $n \times n_*$ covariance matrix used to measure the correlation between X and x*. The same is true for $K(X, X)$, $K(x_*, X)$, and $k(x_*, x_*)$. $I_n$ is an n-dimensional identity matrix, x is the input vector, and N is the Gaussian distribution.

GPR is a non-parametric algorithm based on the Bayesian framework. The Bayesian posterior distribution of y* is:
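$$p(y_* \mid X, y, x_*) = N\left(\bar{y}_*,\ \operatorname{cov}(y_*)\right).$$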

The mean and variance of the predicted value corresponding to the test point x* are:
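$$\bar{y}_* = K(x_*, X)\left[K(X, X) + \sigma_n^2 I_n\right]^{-1} y,$$
$$\operatorname{cov}(y_*) = k(x_*, x_*) - K(x_*, X)\left[K(X, X) + \sigma_n^2 I_n\right]^{-1} K(X, x_*).$$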

GPR obtains the entire distribution at the point to be predicted. The mean is then usually chosen as the best regression value.

2.2 Kernel function

The key feature of Gaussian process regression is that the covariance matrix, K, of the joint Gaussian distribution has to be a symmetric positive semi-definite matrix. In the kernel method, the kernel functions are all symmetric and positive semi-definite, so, in theory, the kernel functions used in machine learning can be used as covariance functions. The squared exponential (SE) covariance function is the most commonly used kernel function. It can be expressed as follows:
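$$k_{\mathrm{SE}}(r) = \exp\left(-\frac{r^2}{2l^2}\right),$$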

where l is the length scale, r = x − x′, and x and x′ are input vectors from the test set and training set, respectively.

The SE covariance function is infinitely differentiable, so a Gaussian process with this covariance function has mean square derivatives of all orders, making it very smooth and only able to handle single features. This means it is not suitable for modeling complex physical processes (Rasmussen and Williams, 2006).

The rational quadratic (RQ) kernel function differs from the SE covariance function in its ability to handle features with different length scales. It can deliver very fine simulations of various abnormal sea surface temperatures along coasts and near islands. The function can be expressed as follows:
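$$k_{\mathrm{RQ}}(r) = \left(1 + \frac{r^2}{2\alpha l^2}\right)^{-\alpha}.$$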

When the hyperparameters l and α are greater than 0, it can be regarded as an infinite scale mixture of SE covariance functions with different length-scale features (Rasmussen and Williams, 2006). Thus, it amounts to the sum of multiple kernel functions.

Unlike the smooth SE covariance function, the Matérn class kernel function is rough in character. Its general expression is as follows:
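$$k_{\mathrm{Matern}}(r) = \frac{2^{1-v}}{\Gamma(v)}\left(\frac{\sqrt{2v}\, r}{l}\right)^{v} K_v\left(\frac{\sqrt{2v}\, r}{l}\right),$$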

where l is a length-scale hyperparameter, K_v is a modified Bessel function, and Γ is the gamma function. v = p + 1/2 (p is a non-negative integer, i.e., p = 0, 1, 2, etc.). The smaller the value of v, the better the function can handle complex nonlinear relationships, which makes it more suitable for handling meteorological and oceanic processes. Commonly, v = 1/2, 3/2, or 5/2.

If v = 1/2, the kernel function can be simplified as follows:
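$$k_{v=1/2}(r) = \exp\left(-\frac{r}{l}\right).$$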

It has been proven that the sum, product, and scaling of existing kernel functions are still kernel functions (Rasmussen and Williams, 2006). Multi-scale information can be extracted by combining kernel functions, with appropriate kernel functions being selected and combined according to a combination rule to extract the characteristic information of different physical processes in a targeted fashion. The length-scale hyperparameter l relates to the influence radius of each feature.

2.3 GPR-based interpolation algorithm

SST is the result of a combination of thermal and dynamic oceanic processes and air-sea interactions. Due to the complexity and randomness of its influencing factors, the latitude-longitude distribution of SST is uneven. As a result of the influence of the sea surface wind and sea surface heat flux, SST may also be subject to abnormal local phenomena in some areas. Furthermore, it can also be affected by coastal runoff in offshore areas and around islands. At the same time, SST is not entirely chaotic in time. Generally speaking, changes in SST adhere to a significant annual cycle and are seasonally affected (Du et al., 2003).

As samples at similar spatial locations have similar distributions, the geographic location (longitude, latitude) can be used as an initial input feature. In view of the physical causes and influencing factors of SST, zonal sea surface wind stress, meridional sea surface wind stress, sea surface heat flux, zonal current velocity, and meridional current velocity can be selected as additional input features.

The Matérn class kernel function is especially well suited to extracting the rough feature distribution of SST; the rougher the function, the more detailed the changes that can be extracted. As noted above, the RQ kernel function can be regarded as the sum of numerous SE covariance functions with different length-scale features. This function can simultaneously fit a wide range of uniform temperature distributions across the ocean surface and abnormal local temperature changes, including in coastal areas and around islands.

In view of the various factors that can influence SST, a combined kernel function, k_s, was constructed to describe the distribution characteristics of SST:

$$k_s(\cdot, \cdot) = k_m(\cdot, \cdot) + k_r(\cdot, \cdot),$$

where k_m(·,·) represents a Matérn class kernel function with v = 1/2 and k_r(·,·) represents an RQ kernel function. The Matérn class kernel function can describe the complex non-linear features that affect SST, while the RQ kernel function can describe the distribution characteristics of SST at different scales, from the shore to the open sea.

Specifically,
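combining the two forms given in Section 2.2, with separate length scales l_1 and l_2 for the Matérn and RQ components,
$$k_s(r) = \exp\left(-\frac{r}{l_1}\right) + \left(1 + \frac{r^2}{2\alpha l_2^2}\right)^{-\alpha}.$$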

The hyperparameters α and l in the kernel function are both unknown; they can be initialized to 0 and then estimated by maximizing the log marginal likelihood.

The training and prediction process of the interpolation algorithm is shown in Fig.1.
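To make this procedure concrete, the kernel construction and the training/prediction steps of Fig.1 can be sketched in a few lines of Python. This is an illustrative example only, not the implementation used in this study: it assumes scikit-learn, whose Matern (nu = 0.5) and RationalQuadratic kernels correspond to the forms in Section 2.2, and the feature ordering, scaling, and initial hyperparameter values are arbitrary choices.

```python
# Illustrative sketch (not the authors' code): GPR interpolation with the
# combined kernel k_s = Matern(nu=0.5) + RationalQuadratic using scikit-learn.
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, RationalQuadratic, WhiteKernel
from sklearn.preprocessing import StandardScaler

def train_gpr_interpolator(X_train, y_train):
    """X_train: (n, 7) features [lon, lat, taux, tauy, heat_flux, u5m, v5m];
    y_train: (n,) SST values on the coarse 1-degree grid (land points removed)."""
    scaler = StandardScaler().fit(X_train)               # put features on comparable scales
    kernel = (Matern(length_scale=1.0, nu=0.5)           # rough, small-scale SST structure
              + RationalQuadratic(length_scale=1.0, alpha=1.0)  # mixture of length scales
              + WhiteKernel(noise_level=1e-2))           # observation noise sigma_n^2
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                                   n_restarts_optimizer=3)  # maximizes the log marginal likelihood
    gpr.fit(scaler.transform(X_train), y_train)
    return gpr, scaler

def interpolate_sst(gpr, scaler, X_fine):
    """Return the posterior mean (interpolated SST) and standard deviation
    at the fine-grid target points."""
    return gpr.predict(scaler.transform(X_fine), return_std=True)
```

In such a setup, the fitted length scales of the two kernel components play the role of the influence radii discussed in Section 2.2.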

3 A CASE STUDY OF SST

Fig.1 Algorithm process

Fig.2 Experimental process

To assess the validity of the proposed approach, we used the Global Ocean Reanalysis Dataset SODA (Simple Ocean Data Assimilation) developed by the University of Maryland and Texas A&M University. The spatial resolution of this product is 0.5°×0.5°; the latitudinal and longitudinal range is 0.25°E-180°-0.25°W, 74.75°S-89.75°N; and the layers are unequally spaced in the vertical, with a total of 50 layers. In this paper, the temperature at a depth of 5 m was selected as the SST.

The experiment used monthly data from 2014 to 2015. The study area was 0°-66°N, 100°E-180°. The training set had a resolution of 1°×1° and was sampled from the original dataset for the 12 months of 2014. The remaining data were used as the validation set. This design was chosen so that hyperparameters could be trained and obtained from the data of a single month, and so that the effect of monthly and seasonal changes in sea surface temperature on the interpolation could be explored. Out of these 12 sets of training and validation experiments, the best set of hyperparameters was selected. The data from the South China Sea in May 2015 were then selected as a test set. The experimental process is shown in Fig.2.

It should also be noted that, because the focus was the sea surface temperature, invalid values in the land area had to be eliminated. The input variables were the longitude, latitude, zonal sea surface wind stress, meridional sea surface wind stress, sea surface heat flux, zonal current velocity at 5 m, and meridional current velocity at 5 m. The control experiments were bilinear interpolation, bicubic interpolation, nearest neighbor interpolation, support vector regression, and principal component regression (hereinafter referred to as Bilinear, Cubic, Nearest, SVR, and PCR).
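As an illustration of this preprocessing step only (not the exact procedure used here), the input feature matrix can be assembled along the following lines; the SODA variable names in the sketch ("temp", "taux", "tauy", "net_heat_flux", "u", "v") are hypothetical placeholders that must be matched to the names in the actual file.

```python
# Illustrative sketch of assembling training features from a SODA-like NetCDF
# file for one month. Variable names are hypothetical placeholders.
import numpy as np
import xarray as xr

def build_feature_matrix(path, depth=5.0):
    ds = xr.open_dataset(path)
    sst = ds["temp"].sel(depth=depth, method="nearest")          # 5-m temperature used as SST
    lon2d, lat2d = np.meshgrid(ds["lon"].values, ds["lat"].values)
    feats = [lon2d, lat2d,
             ds["taux"].values, ds["tauy"].values,                # zonal/meridional wind stress
             ds["net_heat_flux"].values,                          # sea surface heat flux
             ds["u"].sel(depth=depth, method="nearest").values,   # zonal current at 5 m
             ds["v"].sel(depth=depth, method="nearest").values]   # meridional current at 5 m
    X = np.stack([f.ravel() for f in feats], axis=1)
    y = sst.values.ravel()
    ocean = np.isfinite(y)                                        # drop invalid (land) points
    return X[ocean], y[ocean]
```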

The root mean square error (RMSE) was used to evaluate the accuracy of the interpolation, which can be defined as follows:
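$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{predicted}_i - \mathrm{original}_i\right)^2},$$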

in which original_i is the i-th SST in the original data, predicted_i is the i-th predicted value, and n is the number of interpolation points. The RMSE represents the average deviation between the predicted and original values.

As noted above, the SST interpolation results can also easily be described in image form. The structural similarity index (SSIM) is a common indicator for measuring the similarity of two images, so the SSIM can also be used to evaluate the interpolation results. Assuming the two input images are x and y, the SSIM can be defined as:
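$$\mathrm{SSIM}(x, y) = \left[l(x, y)\right]^{\alpha}\left[c(x, y)\right]^{\beta}\left[s(x, y)\right]^{\gamma},$$
$$l(x, y) = \frac{2\mu_x\mu_y + c_1}{\mu_x^2 + \mu_y^2 + c_1},\quad c(x, y) = \frac{2\sigma_x\sigma_y + c_2}{\sigma_x^2 + \sigma_y^2 + c_2},\quad s(x, y) = \frac{\sigma_{xy} + c_3}{\sigma_x\sigma_y + c_3},$$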

where l(x, y) is the brightness comparison, c(x, y) is the contrast comparison, and s(x, y) is the structural comparison. μ_x and μ_y represent the means of x and y, σ_x and σ_y represent the standard deviations, and σ_xy represents the covariance of x and y. c_1, c_2, and c_3 are constants used to avoid system errors caused by a denominator of 0. Generally, we set α = β = γ = 1, c_1 = 6.502 5, c_2 = 58.522 5, and c_3 = c_2/2 (Wang et al., 2004).

The SSIM range is 0 to 1. The more similar the two images, the greater its value. When the two images are exactly the same, the SSIM value is equal to 1.
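For reference, both metrics can be computed in a few lines of code; this is a minimal sketch assuming NumPy and scikit-image, independent of the specific difference-map comparison described in Section 3.1.

```python
# Minimal sketch of the two evaluation metrics, assuming NumPy and scikit-image.
import numpy as np
from skimage.metrics import structural_similarity

def rmse(original, predicted):
    original, predicted = np.asarray(original), np.asarray(predicted)
    return float(np.sqrt(np.mean((predicted - original) ** 2)))

def ssim_index(original_map, predicted_map):
    # Both inputs are 2-D fields on the same grid; data_range sets the dynamic
    # range used by the stabilising constants c1 and c2.
    rng = float(np.nanmax(original_map) - np.nanmin(original_map))
    return structural_similarity(original_map, predicted_map, data_range=rng)
```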

3.1 Results of the single test set

Fig.3 Difference between the original SST and the results of the different interpolation methods

Through the experiments on the validation set, based on the results of the RMSE and SSIM evaluations, it was established that the parameters generated from the training set for September 2014 provided the optimal model; the parameters generated by machine learning include the hyperparameters of the mean function, covariance function, and likelihood function. This model was therefore used to predict the test set. Excluding the land area, the test set had 1 397 effective interpolation points. The test area was the South China Sea, as specified above, which will hereinafter be referred to as “Region 1”.

It can be immediately seen in Fig.3 that the interpolation effect using GPR was better than the Bilinear, Cubic, Nearest, and PCR interpolation methods in most areas, especially at the land-sea boundaries. As a kernel machine learning method, the SVR interpolation method had similar advantages to GPR, but its results were still slightly worse than those of GPR. The comparative RMSE results in Table 1 show that the RMSE obtained by the GPR interpolation was 62.4% lower than Nearest interpolation, 43.7% lower than Cubic interpolation, and 38.9% lower than Bilinear interpolation.

It should also be noted that, if the result of the interpolation is exactly the same as the true value, a map can be drawn on which all the differences are 0 (i.e., green). If this figure is used as a reference, the SSIM values between the difference maps generated by the above six methods and the reference map can be calculated. The difference between the results obtained by the six interpolation methods and the original image can then be measured, as is also shown in Table 1. It was found that the value of the GPR interpolation was still the closest to 1. This shows that, even if the super-resolution accuracy is measured from an image perspective, the GPR interpolation is still better than the other types of interpolation.

Table 1 The RMSE and SSIM results of the Nearest, Cubic, GPR, PCR, Bilinear, and SVR interpolation methods for Region 1

3.2 The temporal and spatial generalizability of the interpolation algorithm

The generalizability of a model learned through machine learning refers to the extent to which it can be applied to new samples. For an interpolation algorithm applied to spatiotemporally continuous geographic data such as SST, generalizability needs to be considered in both temporal and spatial terms.

Fig.4 The SSTs interpolated by different methods for 2015 in Region 1

As the hyperparameters of the model were generated by training on a single month of data, it is necessary to examine the algorithm's temporal generalizability. It can be seen from Fig.4 that the results for the Nearest interpolation method were the worst across all 12 months. The GPR interpolation method produced significantly better interpolation results than the other five methods from March to October, with the interpolation results in September and October being the best. This may be because the hyperparameters of the best model selected in the validation set were for September 2014 and SST has a regular interannual change. The interpolation effect of GPR for each month was consistently better than that of the other five methods, indicating that the algorithm's temporal generalizability is more reliable than that of the other methods.

The above experiments were all performed in Region 1. We will now examine the algorithm's spatial generalizability. The test time was unified to May 2015 and two other regions were selected to compare the interpolation results. The selected areas were 0°-30°N, 125°E-150°E and 30°N-65°N, 115°E-150°E (hereinafter referred to as Region 2 and Region 3). Region 2 was open ocean only. Region 3 contained sea areas around both land and an island.

Fig.5 The difference between the original SST and those of different interpolations for 2015 in Region 2

Fig.6 The difference between the original SST and those of different interpolations for 2015 in Region 3

Table 2 The RMSE and SSIM results of the Nearest, Cubic, GPR, PCR, Bilinear, and SVR interpolation methods for Region 2

Combining the results shown in Fig.5 and Table 2, it can be seen that, for the ocean far from land, the GPR interpolation effect is very good, with the RMSE being nearly 70.6% lower than Bilinear interpolation, 76.0% lower than Cubic interpolation, and 81.9% lower than Nearest interpolation. It can also be seen in Fig.5 that the GPR method was significantly better than the traditional methods in the northwestern part of Region 2. This is because this area lies to the south of an island, and the GPR method interpolates better near islands. Similarly, the machine learning methods, including the SVR and PCR interpolation methods, also had better interpolation effects in this area than the traditional methods.

Combining the results in Fig.6 and Table 3, it can be seen that, for the ocean near the island, the GPR interpolation effect is still good, with the RMSE being nearly 34.4% lower than Bilinear interpolation, 42.4% lower than Cubic interpolation, and 62.7% lower than Nearest interpolation. However, it can also be seen that, apart from the two kernel machine learning methods, the other interpolation methods are less effective near the island.

Table 3 The RMSE and SSIM results of the Nearest, Cubic, GPR, PCR, Bilinear, and SVR interpolation methods for Region 3

In summary, the following conclusions can be drawn:

1. The SST interpolation results for the ocean near land and islands are not as good as those for areas far from land.

2. Overall, GPR interpolation performs better than traditional interpolation (Bilinear, Cubic, and Nearest). The GPR interpolation results are very good, whether the ocean is far from land and islands or at their boundaries.

3. The temporal and spatial generalizability of the GPR interpolation algorithm is reliable.

Fig.7 SSTs for 2015 in Region 1 interpolated using the original GPR interpolation method and the GPR interpolation method with kernel functions generated in similar months

3.3 The effect of seasonal changes on the algorithm

As the best set of hyperparameters and the corresponding kernel functions selected by the validation set were generated from the training set for September 2014, the temporal generalizability of the interpolation algorithm was found to be best for the test sets for September and October 2015. As the SST is subject to significant change over a 12-month period, we decided to explore the effect of using kernel functions generated by training on similar months.

The experimental setup was as follows. The GPR models generated from February, May, August, and November 2014 (hereinafter referred to as GPRn) were selected to test the data from January to March, April to June, July to September, and October to December 2015, respectively. The selected test area was still the South China Sea. The test results were compared with those of the original interpolation algorithm, the Bilinear interpolation, and the Cubic interpolation.

As can be seen in Fig.7, the GPRn interpolation performed worse than the original GPR interpolation from March to October, with the interpolation results for the other four months being slightly better. It is therefore worth considering using GPR for interpolation from March to October and GPRn for the remaining four months. Overall, both interpolation schemes outperformed the Bilinear interpolation and the Cubic interpolation. However, the results of these tests suggest that the effect of seasonal changes on the GPR interpolation algorithm was not obvious.

Fig.8 Comparison of the time consumption of the six different interpolation methods for the 12 months of 2015 in Region 1

3.4 Discussion of the algorithm runtime

It can be seen from Fig.8 that the Bilinear interpolation method, the Cubic interpolation method, and the Nearest interpolation method consume almost the same amount of time. When there were 1 397 effective interpolation points, this was about 20-21 s, with the GPR interpolation method, PCR interpolation method, and SVR interpolation method taking about 22-23 s. Thus, the GPR interpolation method has an additional time cost of about 2 s (10%). Given the improvement in interpolation accuracy, this cost is completely acceptable.

4 CONCLUSION

Improving the spatial resolution of ocean reanalysis datasets is very important for the study of meso-scale and small-scale ocean processes and sea-air interactions. It can also provide initial conditions and side boundary conditions for high-resolution regional models. To reduce the errors introduced by traditional interpolation methods, which only use local neighborhood sample points, this paper has sought to introduce physical factors such as sea surface wind stress, sea surface heat flux, and ocean current velocity into the design of an SST interpolation algorithm based on GPR. Here, the GPR interpolation method has focused on the SST in SODA reanalysis products. The results show that this method can significantly reduce the interpolation error. Compared with the nearest neighbor interpolation, bicubic interpolation, and bilinear interpolation methods, it had an RMSE that was 62.4%, 43.7%, and 38.9% lower, respectively. The improvement in the interpolation accuracy was especially obvious for coastal waters and the areas around islands. The algorithm showed good temporal and spatial generalizability. The hyperparameters obtained from the training set data for September 2014 were the best, and this set of parameters was able to generate the best interpolation results for other months as well. The model obtained in the training area can also be applied to other regions in the Western Pacific. The South China Sea is affected by monsoons and its circulation is subject to seasonal changes. We therefore also tested the effect of seasonal changes on the algorithm. The results showed that the effect of seasonal changes on the algorithm is not significant.

The study reported here has several shortcomings. There are a number of other factors that affect SST, not all of which we have introduced into the kernel functions. Only the SST in ocean reanalysis datasets was considered. In the future, the interpolation model needs to be extended, and interpolation algorithms for sea surface salinity and sea surface height will need to be developed to improve the spatial resolution of ocean reanalysis datasets in a more comprehensive way. The interpolation algorithm can also be extended below the sea surface to interpolate ocean elements such as temperature and salinity at various depths, thus enabling high-resolution display of three-dimensional oceanic processes. This would facilitate a more accurate description of dynamic ocean processes and the development of more accurate models to ascertain initial field and boundary conditions.

5 DATA AVAILABILITY STATEMENT

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

6 ACKNOWLEDGMENT

We are grateful to the University of Maryland and Texas A & M University for developing, maintaining and making available (https://www.atmos.umd.edu/~ocean/) the SODA datasets that were used in this study.