Investigation of feature contribution to shield tunneling-induced settlement using Shapley additive explanations method


K.K. Pabodha M. Kannangara, Wan-Huan Zhou*, Zhi Ding, Zhenhao Hong

a State Key Laboratory of Internet of Things for Smart City and Department of Civil and Environmental Engineering, University of Macau, Macau, China

b Department of Civil and Environmental Engineering, School of Engineering, Zhejiang University City College, Hangzhou, China

Keywords: Feature selection, Shield operational parameters, Pearson correlation method, Boruta algorithm, Shapley additive explanations (SHAP) analysis

ABSTRACT Accurate prediction of shield tunneling-induced settlement is a complex problem that requires consideration of many influential parameters. Recent studies reveal that machine learning (ML) algorithms can predict the settlement caused by tunneling. However, well-performing ML models are usually less interpretable. Irrelevant input features decrease the performance and interpretability of an ML model. Nonetheless, feature selection, a critical step in the ML pipeline, is usually ignored in most studies that focus on predicting tunneling-induced settlement. This study applies four techniques, i.e. the Pearson correlation method, sequential forward selection (SFS), sequential backward selection (SBS) and the Boruta algorithm, to investigate the effect of feature selection on the model's performance when predicting the tunneling-induced maximum surface settlement (Smax). The data set used in this study was compiled from two metro tunnel projects excavated in Hangzhou, China using earth pressure balance (EPB) shields and consists of 14 input features and a single output (i.e. Smax). The ML model trained on the features selected by the Boruta algorithm demonstrates the best performance in both the training and testing phases. The relevant features chosen by the Boruta algorithm further indicate that tunneling-induced settlement is affected by parameters related to tunnel geometry, geological conditions and shield operation. The recently proposed Shapley additive explanations (SHAP) method explores how the input features contribute to the output of a complex ML model. It is observed that larger settlements are induced during shield tunneling in silty clay. Moreover, the SHAP analysis reveals that low magnitudes of face pressure at the top of the shield increase the model's output.

1. Introduction

Underground metro systems are constructed to ease metropolitan area traffic demands (Zhang et al., 2020a). These systems are usually built as twin tunnels and excavated through shallow soft soils using a shield tunneling method (e.g. the earth pressure balance (EPB) shield). Although shield tunneling is an efficient and rapid tunnel construction technique (Wang et al., 2018), ground settlements are inevitable regardless of the tunneling method (Neaupane and Adhikari, 2006). Excessive ground settlements can pose a significant risk to existing structures (e.g. buildings, foundations and tunnels) and service lines. Therefore, it is crucial to identify and understand the key factors that contribute to tunneling-induced ground settlement. Such information will assist engineers in making appropriate adjustments to control the surface settlement induced during tunneling.

Conventionally, surface settlement due to tunneling has been analyzed using analytical, empirical, experimental or numerical methods. Analytical studies usually adopt assumptions and complex mathematical theories when analyzing tunneling-induced settlements (e.g. Loganathan and Poulos, 1998; Bobet, 2001; Chou and Bobet, 2002; Park, 2005; Zhang et al., 2020b). Due to their simplicity, empirical models (e.g. Peck, 1969; Attewell and Farmer, 1974; Atkinson and Potts, 1977; O'Reilly and New, 1982; Attewell et al., 1986; Mair et al., 1993) are more popular. However, analytical or empirical models often ignore the shield operational parameters, which are used to control the settlement during shield tunneling. Alternatively, experimental models can be designed to simulate actual physical conditions during tunneling (e.g. Ng et al., 2013; Fang et al., 2019; Hu et al., 2019). However, the implementation of an experimental model can be time-consuming and costly. On the other hand, models based on numerical methods can be applied to simulate more complex scenarios and are able to produce better visual output (e.g. Addenbrooke and Potts, 2001; Chakeri et al., 2011; Ercelebi et al., 2011; Lambrughi et al., 2012; Gong et al., 2014; Zhang et al., 2020a). Nevertheless, the accuracy of a numerical model will greatly depend on the choice of the constitutive soil model and its input parameters, which can be difficult to obtain. Further, it is challenging to model the entire mechanized tunneling process with multiple operational parameters, even for an experienced user who possesses the necessary theoretical knowledge and software skills.

With the development of computer technology, machine learning (ML) has become a handy method to solve problems that require mapping multiple inputs to a desired output. Recent studies have shown great potential in applying ML methods to analyze complex problems, such as landslide-induced deformations (e.g. Zhang et al., 2020c, 2021a) and underground soil-structure interaction problems due to tunneling (e.g. Zhang et al., 2020d; Jong et al., 2021). In early investigations, artificial neural networks (ANNs) were employed to predict tunneling-induced settlement over tunnels excavated using the new Austrian tunneling method (NATM) (e.g. Shi et al., 1998; Kim et al., 2001). Later, Suwansawat and Einstein (2006) utilized ANNs to relate shield operational parameters to surface settlement data monitored during the excavation of twin tunnels in Bangkok, Thailand, using EPB shields. The authors considered 10 input features, which were grouped into three categories, i.e. tunnel geometry, geological conditions and shield operational parameters. Their analysis provided critical evidence of the importance of considering the shield operational parameters and shield model type when predicting surface settlement due to shield tunneling. Since then, several other studies have applied the ANN algorithm to predict settlement induced during tunneling (e.g. Santos Jr. and Celestino, 2008; Boubou et al., 2010; Darabi et al., 2012; Koukoutas and Sofianos, 2015; Mohammadi et al., 2015; Chen et al., 2019a). Recently, Zhang et al. (2020e) proposed a hybrid model by coupling an ANN with a differential evolution algorithm to predict ground settlement induced by shield tunneling. However, the authors did not consider the influence of tunnel geometry (e.g. cover depth and tunnel diameter) in their analysis. Nevertheless, one of the major challenges in employing an ANN is determining the optimal network architecture (Goh et al., 2018). Moreover, due to its complexity, the output of an ANN model is usually uninterpretable; therefore, a complex ML model like an ANN is commonly referred to as a "black-box" model.

Prediction of tunneling-induced settlement has also been achieved by various ML algorithms apart from ANNs. For example, Goh et al. (2018) proposed an ML model based on a multivariate adaptive regression spline (MARS) algorithm to predict the maximum surface settlement (Smax) due to EPB tunneling. The authors used 148 observations extracted from three mass rapid transit tunnel projects carried out in Singapore to train and test the MARS model. Zhang et al. (2021b) utilized the same data set to compare the performance of four ML models built on ANN, MARS, support vector machine (SVM) and extreme gradient boosting (XGBoost) algorithms to predict tunneling-induced settlement caused by EPB shield tunneling. According to their comparison, the XGBoost model outperformed the other three models in predicting Smax. The random forest (RF) is another popular ML algorithm, which has been successfully applied in several recent studies to predict maximum ground settlement due to tunnel excavations (e.g. Kohestani et al., 2017; Zhou et al., 2017; Chen et al., 2019b; Zhang et al., 2019, 2020f). Moreover, a critical evaluation conducted by Zhang et al. (2020g) revealed that the RF algorithm is better suited than the long short-term memory (LSTM) algorithm, which is a deep learning approach, to predict Smax during tunneling. However, most complex ML models offer only the final output, with no information on the relationship between input and output. An interpretable ML model, which can provide valuable insights into the relationship between input and output parameters, will aid the shield operator in making appropriate decisions to keep the settlement within the desired limit throughout the tunneling process.

Feature selection is a key step when developing ML models because the performance of ML models can be directly affected by unnecessary features (Zhang, 2019). Also, the interpretability of an ML model can be enhanced when fewer features are involved. Nevertheless, most ML-based studies conducted to predict tunneling-induced settlement have skipped this step by choosing the features either based on experience or on existing studies. The influential features may vary from project to project; therefore, applying a feature selection method will help to identify the important features for a given project. Methods to conduct feature selection can be divided mainly into two groups: filter and wrapper. The filter methods often rely on statistical techniques and can be employed independently without requiring an ML model. For regression problems, the correlation coefficient can be used as a filter to select the necessary features (Jović et al., 2015). For instance, the Pearson correlation method has been employed to identify the essential parameters affecting surface settlement due to tunneling (e.g. Zhang, 2019; Zhang et al., 2021b). Although it is a simple method to execute, it can only consider the linear relationship between two parameters and ignores the effect of feature interaction. Besides, this method does not have a mechanism to select or reject the features, which can be problematic, especially if no strong correlations between the parameters exist. Unlike the filter methods, algorithms based on wrapper methods (e.g. sequential forward selection (SFS), sequential backward selection (SBS) and Boruta) aim to optimize the ML model's performance by finding an optimal feature set. The Boruta algorithm (Kursa and Rudnicki, 2010) has recently been used in various studies to conduct feature selection and has demonstrated promising outcomes (e.g. Al-Qerem, 2020; Sharmeen et al., 2020; Effrosynidis and Arampatzis, 2021; Gholami et al., 2021). The Boruta algorithm considers the feature interactions and allocates a rank to each feature to easily distinguish all-relevant features for analysis.

Because the tunneling-induced settlement is a complex and nonlinear problem, simple models (e.g. multiple linear regression) often do not work well (e.g. Mahmoodzadeh et al., 2020). In contrast, complex ML models (e.g. ANNs, XGBoost and RF) can give higher accuracy while losing interpretability. A novel field, known as explainable artificial intelligence (XAI), has emerged to allow humans to understand complex ML models' output (Linardatos et al., 2021). The Shapley additive explanations (SHAP), as proposed by Lundberg and Lee (2017), is one such XAI-based algorithm. The SHAP algorithm can be used to understand how each feature contributes toward a specific output from an ML model and has recently been adopted in many areas, including structural engineering (e.g. Mangalathu et al., 2020, 2021; Parsa et al., 2020; Degtyarev, 2021). However, there is limited knowledge on how the input features contribute to the output of an ML-based model considered in a geotechnical analysis.

Initially, this study applies four feature selection methods (i.e. Pearson correlation, SFS, SBS and the Boruta algorithm) on a data set compiled from monitored data during two EPB tunneling projects completed in Hangzhou, China. Then, four RF models will be built and trained by inputting the features obtained from the respective feature selection method. Subsequently, the prediction performance of the four RF models will be evaluated to select the best-performing model. Finally, the SHAP algorithm will be implemented to understand the contribution of the input features based on the outcome of the best-performing RF model.

2. Methodology

2.1. RF

The RF is an ensemble learning method proposed by Breiman (2001). The base estimator of the RF is a decision tree (DT), and a forest is created by combining multiple DTs, as shown in Fig. 1. It is based on two robust procedures, i.e. bagging (bootstrap aggregating) (Breiman, 1996) and the random subspace method (Ho, 1998). Presume that the training data set consists of N observations (number of rows in the data set) and M features (number of columns in the data set), and that the RF contains k DTs (models). Bagging requires k randomly chosen bootstrap data sets to train the respective DTs in the forest and aggregates the trained models to obtain the final model. A bootstrap data set is created by conducting random sampling with replacement (i.e. duplicate observations are allowed) from the training set. In this study, the RF regressor available in the scikit-learn library was utilized for the analysis.

The procedure for constructing the RF regression algorithm is as follows:

(1) Create a bootstrap sample from the training set (in scikit-learn, the number of samples to draw from the input set to create a bootstrap sample is controlled by adjusting a hyperparameter called max_samples. The default value of max_samples is N, i.e. the total number of observations in the training set; the user can set any other value if desired);

(2) Train a DT using the random subspace method, selecting a subset of features out of M possible features;

(3) Repeat Steps 1 and 2 until the entire forest is trained (k − 1 times); and

(4) Estimate the final prediction by averaging the predictions of the trained models in the forest:

$y_{\mathrm{pred}} = \dfrac{1}{k}\sum_{i=1}^{k} y_i$    (1)

where $y_{\mathrm{pred}}$ is the average prediction of the forest, k is the number of DTs (i.e. models) in the forest, and $y_i$ is the prediction of an individual tree for a desired observation.

Fig. 1. Schematic diagram of the process of building an RF algorithm.

The RF algorithm has been successfully implemented for predicting tunneling-induced settlements (e.g. Zhou et al., 2017; Zhang et al., 2019, 2020f, g). However, the interpretability of the RF is significantly lower when compared with a DT.
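As a minimal sketch of the above procedure (an illustration under assumed settings, not the authors' exact configuration), the scikit-learn RF regressor mentioned in Section 2.1 can be set up as follows; X_train, y_train and X_test are placeholder arrays, and the hyperparameter values are illustrative only.

```python
# Illustrative RF regressor following the steps in Section 2.1 (assumed settings)
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(
    n_estimators=100,     # k: number of DTs in the forest (illustrative value)
    bootstrap=True,       # bagging: sample the training set with replacement
    max_samples=None,     # bootstrap sample size; None draws N observations (default)
    max_features="sqrt",  # random subspace: subset of the M features at each split
    random_state=42,
)
rf.fit(X_train, y_train)     # Steps 1-3: train the k trees on bootstrap samples
y_pred = rf.predict(X_test)  # Step 4: average the k individual tree predictions
```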

2.2. Feature selection methods

Feature selection is the process of manually or automatically choosing the input features that contribute significantly to the target variable. It is a desirable step when building an ML model, as fewer features will reduce the training time and improve interpretability and model performance. Apart from built-in (a.k.a. embedded) methods, most prevailing supervised feature selection methods can be categorized mainly into filter or wrapper methods (John et al., 1994). The well-known Pearson correlation method falls under the filter methods. Under the wrapper methods, the sequential feature selection and Boruta algorithms can be considered. Unlike the filter methods, wrapper methods rely on a training algorithm, which is the major difference between the two groups, as shown in Fig. 2. Filter methods are generally more computationally efficient than wrapper methods; however, the selection criterion is less related to the model's effectiveness (Kuhn and Johnson, 2013). This study applies the Pearson correlation method, SFS, SBS and the Boruta algorithm to conduct feature selection. A brief introduction to each of these techniques is given in the following sections.

2.2.1. Pearson correlation method

The Pearson correlation assesses the linear association between the variables x and y (Pearson, 1895). Eq. (2) can be used to estimate the Pearson correlation coefficient (r):

$r = \dfrac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}}$    (2)

where $\bar{x}$ and $\bar{y}$ are the mean values of x and y, respectively, and N is the number of observations.

The value of r ranges from −1 (strong negative correlation) to +1 (strong positive correlation). Zhang et al. (2021b) utilized r to examine the relationship between different input features and Smax. The guideline they adopted to evaluate the strength of the correlation is as follows: |r| < 0.19 reflects a very weak correlation, 0.2 < |r| < 0.39 a weak correlation, 0.4 < |r| < 0.59 a moderate correlation, 0.6 < |r| < 0.79 a strong correlation, and |r| > 0.8 a very strong correlation.

2.2.2. Sequential feature selection algorithms

The sequential feature selection algorithms are based on the wrapper approach, whose main goal is to find a d-dimensional feature subspace from an m-dimensional feature space, where d < m. The SFS algorithm starts from an empty feature set and sequentially adds the feature that yields the largest improvement in the model's performance until the desired number of features (d) is reached. Conversely, the SBS algorithm starts from the full feature set and sequentially removes the feature whose elimination causes the smallest loss in performance.
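As an illustration only (the text does not state which implementation was used), both search directions can be configured with scikit-learn's SequentialFeatureSelector; the estimator choice and the placeholder data are assumptions, and d = 7 follows the value reported later in Section 4.2.

```python
# Sketch of SFS and SBS with scikit-learn (assumed implementation, not the authors')
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SequentialFeatureSelector

estimator = RandomForestRegressor(random_state=42)

# SFS: grow from an empty set, adding the best feature until d features are selected
sfs = SequentialFeatureSelector(estimator, n_features_to_select=7, direction="forward")
# SBS: shrink from the full set, dropping the least useful feature until d remain
sbs = SequentialFeatureSelector(estimator, n_features_to_select=7, direction="backward")

sfs.fit(X_train, y_train)   # X_train and y_train are placeholders for the data set
print(sfs.get_support())    # boolean mask of the d selected features
```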

2.2.3. Boruta algorithm

Boruta is an all-relevant feature selection method proposed by Kursa and Rudnicki (2010). The Boruta algorithm is wrapped around the RF algorithm but works well with other ML models, e.g. XGBoost, SVM and logistic regression (Lee, 2020). The steps considered in the Boruta algorithm are as follows:

(1) Create a shadow (i.e. duplicate) of each feature (column) in the original data set X for training;

(2) Permute (i.e. randomly shuffle) the data in the shadow features to remove the feature correlation;

(3) Create a new data set (X_boruta) by attaching the shadow features, which contains twice the number of features (columns) of X;

(4) Fit an RF algorithm using the extended data set (X_boruta) and y_train;

(5) Estimate the importance of all the features in X_boruta;

(6) Obtain the maximum feature importance among the shadow features (i.e. threshold);

(7) Assign a "hit" whenever the importance of an original feature exceeds the threshold; and

(8) Repeat the above steps for a predefined number of iterations (max_iter) and count the number of hits each original feature obtains.

Because each independent trial can only give a binary outcome (i.e. hit or no hit), a series of iterations follows a binomial distribution. The Boruta algorithm divides the binomial distribution into three areas, i.e. the area of refusal (left tail), the area of acceptance (right tail), and the area of irresolution (Mazzanti, 2020). Features ending up in the left tail are considered as noise; hence, they are dropped. On the other hand, features ending up in the right tail are selected as critical. The critical values for defining the tails of the distribution are determined using a significance level of 5% (i.e. α = 0.05). It should be noted that the Boruta algorithm is indecisive about the features ending up between the tails of the distribution (i.e. the area of irresolution), and those features can be chosen depending on the case.
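The decision regions described above can be illustrated with a short calculation (an assumption-laden sketch, not the paper's code): for max_iter = 100 independent trials with a hit probability of 0.5 and α = 0.05, the critical hit counts bounding the areas of refusal and acceptance follow from the binomial distribution.

```python
# Illustrative sketch of the binomial decision regions used by Boruta
from scipy.stats import binom

max_iter, p, alpha = 100, 0.5, 0.05
lower = binom.ppf(alpha / 2, max_iter, p)       # below this hit count: area of refusal
upper = binom.ppf(1 - alpha / 2, max_iter, p)   # above this hit count: area of acceptance
print(lower, upper)  # hit counts in between fall in the area of irresolution
```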

2.3. SHAP

The SHAP is a Python-based package proposed by Lundberg and Lee (2017) to explain the outcome of an ML model by evaluating the contribution of each feature to the model's output. This method assesses the local feature importance by adopting the Shapley values (Shapley, 1953) defined in game theory. When the two properties defining fairness are additivity (i.e. all the individual feature contributions sum up to the final value) and consistency (i.e. more contribution cannot mean less importance), Shapley values represent the only fair way of attributing feature contributions. The Shapley value for a particular feature i ($\varphi_i$) can be calculated by taking the average of the marginal contributions (the model's output with and without feature i) of feature i computed over all subsets S of N excluding feature i, as given by

$\varphi_i = \sum_{S \subseteq N \setminus \{i\}} \dfrac{|S|!\,(n - |S| - 1)!}{n!}\,\left[f(S \cup \{i\}) - f(S)\right]$    (3)

where f is the original prediction model, S is a subset of features, N is the set of all features, i is the specific feature, and n is the total number of features.

In SHAP, the output of the prediction model for a single observation x (i.e. f(x)) is explained by a linear function g and can be expressed as follows:

$f(x) = g(x') = \varphi_0 + \sum_{i=1}^{M} \varphi_i x'_i$    (4)

where x is the instance being explained; x′ is the simplified input, and the two are related through a mapping function x = h_x(x′); $\varphi_0$ is the base value when all the inputs are missing; and M is the number of simplified input features.

SHAP values can be approximated by various methods, e.g. kernel SHAP, deep SHAP and tree SHAP. Among these methods, tree SHAP, a version of SHAP for tree-based ML models (e.g. DT, RF and XGBoost), is used in this study. Two unique advantages of SHAP values are their global and local interpretability. Contrary to the existing feature importance attribute in ML models, SHAP can identify whether the contribution of each input feature is positive or negative. It should be noted that all the analyses presented in this study are implemented in the Python 3 programming language.
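A minimal sketch of the tree SHAP workflow is shown below; the trained model rf_boruta and the training data X_train_boruta are placeholder names for the model and data introduced in Sections 3 and 4.

```python
# Sketch of computing and visualizing tree SHAP values for a trained RF model
import shap

explainer = shap.TreeExplainer(rf_boruta)            # tree SHAP for tree-based models
shap_values = explainer.shap_values(X_train_boruta)  # local SHAP values per observation

# Aggregated global view of the local (per-observation) SHAP values
shap.summary_plot(shap_values, X_train_boruta)
```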

3. Formulation of the data set

3.1. Project details

All data used in this study were compiled from two metro tunneling projects in Hangzhou, China. As shown in Fig. 3, Project 1 (from Gucui Road station to Xueyuan Road station) was excavated in Hangzhou metro line 2, while the excavation from Shuangpu station to Heshan Road station in Hangzhou metro line 6 was considered as Project 2.

Fig. 2. Feature selection methods: (a) Filter method and (b) Wrapper method.

Fig. 3. Map of Hangzhou metro system (Schwandl, 2007).

Fig. 4 shows the excavation schedule for Projects 1 and 2. The twin tunnels excavated for Project 1 (denoted as downlink and uplink in Fig. 4a) were initiated in January 2016 and completed by June 2016. On the other hand, the twin tunnels considered in Project 2 (denoted as left tunnel and right tunnel in Fig. 4b) were started in April 2017 and finished by October 2017. The downlink of Project 1 and both tunnels in Project 2 were excavated using "Ishikawa" EPB shields. In contrast, a "Kawasaki" EPB shield was used to excavate the uplink of Project 1. The twin tunnels in both projects are circular, with inner and outer diameters of 5.5 m and 6.2 m, respectively. The total excavation length of the twin tunnels was 1950 m for Project 1 and 2486 m for Project 2. Note that this analysis only considered the data from the first excavation of each project (i.e. the downlink in Project 1 and the left tunnel in Project 2).

Fig. 4. Twin tunnels excavation schedule for (a) Project 1 and (b) Project 2.

3.2. Geological details

Fig. 5 illustrates the cross-sectional geological profiles observed in Projects 1 and 2. The downlink, which is the first excavation of Project 1, as shown in Fig. 5a, was mainly excavated through muddy silty clay and muddy clay with silt layers. On the other hand, the first excavation of Project 2 (i.e. the left line) mainly passed through layers of sandy silt and silty sand, as shown in Fig. 5b. The cover depths ranged from 10.6 m to 18.7 m for Project 1 and from 9 m to 16.6 m for Project 2. Table 1 lists the soil properties encountered in Projects 1 and 2, which were obtained from laboratory tests conducted following the Chinese National Standard GB/T 50123-1999. For example, the shear strength parameters (c and φ) were obtained by conducting a series of direct shear tests. The soil samples used in these tests were first pre-consolidated for 24 h and then sheared rapidly at a rate of 0.8-1.2 mm/min under undrained conditions. The average groundwater levels were at 2.14 m and 1.8 m below the ground surface for Projects 1 and 2, respectively. It should be noted that the groundwater levels remained steady during the excavation process.

Fig. 5. Cross-sectional geological profiles for (a) Project 1 and (b) Project 2 (unit: m).

Table 1 Soil properties observed in Projects 1 and 2.

3.3. Description of the data set

In ML terminology, the independent variables are usually known as features, and the output variable is known as the target. According to previous studies, various features can be influential in inducing Smax during shield tunneling. Usually, these features are categorized into three groups: tunnel geometry, geological conditions and shield operational parameters (Suwansawat and Einstein, 2006). Table 2 lists the 14 input features by their respective category and the target variable (i.e. Smax) considered for the analysis. The surface settlement measurements were obtained using an optical level (Suguang DS05, China) and an electronic level (Trimble DINI 03, USA) with precisions of 0.5 mm/km and 0.3 mm/km, respectively. The corresponding measurements were recorded twice a day, once in the morning (8:00 a.m.) and once in the evening (4:00 p.m.).

Table 2 Details of input features and target variable considered for analysis.

The cover depth (H) and diameter of the tunnel (D) are well-known parameters representing tunnel geometry. For example, Morovatdar et al. (2020) showed that the tunnel diameter significantly influences settlement during tunnel excavation using the NATM. However, the tunnels considered in this study have an identical (outer) diameter of 6.2 m due to shield tunneling. Therefore, parameter D is excluded from the analysis. Surface settlement induced by tunneling can vary due to the geological conditions. Therefore, soil type (ST), a categorical feature, is introduced to represent the geological conditions at the shield face during excavation. In this study, the multiple soil layers encountered by the shield face during the Project 1 and 2 excavations are grouped into two types for simplicity. Consequently, the soil type feature contains two categories, i.e. "silty clay" and "silty sand", describing the soil types at the face of the EPB shield passing through Projects 1 and 2, respectively. It should be noted that the RF model cannot handle categorical features; hence, "silty clay" and "silty sand" were encoded as 0 and 1, respectively. Apart from that, the depth of the groundwater (GW) level, which is measured from the surface level to the tunnel crown level, is also considered as an input parameter representing the geological conditions.

An EPB shield is an advanced machine with many operational parameters to monitor and control during excavation. Face pressure is considered a key operational parameter for controlling surface settlement (Suwansawat and Einstein, 2006; Qin et al., 2021). This study includes 11 input parameters under the category of shield operational parameters. Three earth-pressure gauges (at the top, left and right) were installed to monitor the face pressure of the "Ishikawa" EPB shield used in Project 2 (Kannangara et al., 2022). Two input features, i.e. "face pressure (top)" and "face pressure (center)", were formulated to capture the influence of face pressure on settlement due to EPB tunneling. Grouting is another key parameter for controlling tunneling-induced settlement, and its importance has been highlighted in earlier studies (e.g. Suwansawat and Einstein, 2006; Pourtaghi and Lotfollahi-Yaghin, 2012; Ocak and Seker, 2013; Wang et al., 2013; Bouayad and Emeriault, 2017). However, the effect of grouting could not be considered in this study due to the unavailability of related data for Project 1.

The attitude and position of the shield machine can indicate whether the excavation follows the design tunnel axis (DTA). Six parameters, i.e. horizontal deviation (front), vertical deviation (front), horizontal deviation (back), vertical deviation (back), pitching angle and rolling angle, are used to describe the attitude and position of the shield machine (Zhou et al., 2019). The "front" refers to the center point of the shield head, and the "back" refers to the center point of the shield tail, as shown in Fig. 6. If the shield's front or rear point is on the right or upper side of the DTA, then the corresponding values are positive and vice versa (i.e. the values are negative if the shield's front or rear point is on the left or lower side of the DTA). The pitching and rolling angles represent the attitude of the shield machine relative to the transverse and longitudinal axes, respectively. The pitching angle values are positive when the shield is pitching upward, as shown in Fig. 6b. The data relating to the rolling angle of the EPB shield were unavailable for Project 2; hence, the rolling angle was excluded from the data set. Moreover, data related to four more shield operational parameters (i.e. advance rate, thrust, torque and jack pressure) are also included in the data set. The corresponding symbol and unit for each parameter considered in the data set are presented in Table 2.

Altogether, 264 observations are available for each feature considered in this study. Usually, the observations in a data set are randomly divided into training and testing sets before the model building phase. In this study, 80% of the data samples are randomly chosen to establish the training set, and the remaining data samples are allocated to the testing set. Note that the testing data must only be used to evaluate the model's performance. Table 3 summarizes the statistical information related to the training set. The training set includes 211 observations per feature. Table 3 also includes the mean, standard deviation (Std.), minimum (Min.), maximum (Max.) and three percentiles (25%, 50% and 75%) associated with the data samples in the training set.
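A minimal sketch of forming the training and testing sets is given below (assumed code, not the authors' script); df is a placeholder DataFrame holding the 264 observations with the columns listed in Table 2.

```python
# Sketch of encoding the categorical soil type and splitting the data set 80%/20%
from sklearn.model_selection import train_test_split

df["ST"] = df["ST"].map({"silty clay": 0, "silty sand": 1})  # encode soil type as 0/1

X = df.drop(columns=["Smax"])   # the 14 input features listed in Table 2
y = df["Smax"]                  # target variable

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # 80% for training, 20% for testing
```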

Fig. 6. Illustration of parameters related to attitude and position of a shield machine:(a) Horizontal and vertical deviation and (b) Pitching and rolling.

4. Results and discussion

In this study, three separate analyses are conducted to investigate the effect of different feature selection methods on tunneling-induced settlement predictions. In Analysis 1, feature selection was performed by assessing the Pearson correlation coefficients. Then, an RF model was built and trained on a data set consisting only of the features selected from Analysis 1. Next, predictions were made using the testing set, and the evaluation metrics were computed. In Analysis 2, two sequential feature selection algorithms (i.e. SFS and SBS) were utilized. Finally, in the third analysis, the Boruta algorithm was employed to obtain the all-relevant features. For Analyses 2 and 3, separate RF models were built, trained and evaluated, as in the case of Analysis 1. Later, the model with the best performance was chosen as the input for the SHAP algorithm to further understand how the input features contribute to Smax.

4.1. Analysis 1: Pearson correlation method

Fig. 7 shows the Pearson correlation coefficients computed using the corr(∙) function available in the Pandas library. Note that the guideline recommended by Zhang et al. (2021b), which was introduced in Section 2.2.1, is adopted to facilitate the feature selection in this analysis. Consequently, only the feature ST has a strong correlation (0.6 < |r| < 0.79) with Smax, while five features (i.e. FPt, Th, To, JP and VDf) are moderately correlated (0.4 < |r| < 0.59) with Smax. Furthermore, the parameters H, AR, HDb and VDb show a weak correlation, while GW, FPc, PA and HDf illustrate a very weak correlation with the target variable Smax. It is worth noting that this study adopted two representative features (i.e. FPt and FPc) for describing face pressure. However, the feature FPt has a relatively strong correlation with Smax compared to that of FPc, as shown in Fig. 7.

As stated earlier, feature selection using the Pearson correlation coefficients can be challenging, as many features may not strongly correlate with the target output, as evident in this study. Nevertheless, all the features with |r| ≥ 0.4 are considered useful features for predicting Smax. Accordingly, six features (i.e. ST, FPt, Th, To, JP and VDf) with moderate to strong correlations were selected to conduct Analysis 1. In contrast, eight features (i.e. H, GW, FPc, AR, PA, HDf, HDb and VDb) with |r| ≤ 0.39 were excluded from the data set when training the RF model in Analysis 1. It is worth noting that several key features, such as the cover depth (H) and groundwater level (GW), were dropped due to this selection process. After that, new training and testing sets were formed by considering the features selected from Analysis 1. Then, an RF model was trained on the newly developed training set with reduced features. Hereafter, this trained RF model in Analysis 1 is identified as RF_pearson. Fig. 8 shows predictions made from the RF_pearson model for the training and testing data using filled square markers and round markers, respectively. The mean squared error (MSE) for the training data is 13.33, while that of the testing data is 32.51. Furthermore, the coefficients of determination (R2) for the training and testing data are 0.92 and 0.72, respectively. Although the training accuracy is high, one can see that the testing accuracy for Analysis 1 is unsatisfactory.
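The workflow of Analysis 1 can be sketched as follows (assumed code with placeholder variable names, not the authors' script): the Pearson coefficients are computed with pandas, features with |r| ≥ 0.4 are retained, and the resulting RF model is evaluated with the MSE and R2 metrics.

```python
# Sketch of Analysis 1: Pearson-based feature selection and RF evaluation
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

r = train_df.corr()["Smax"].drop("Smax")       # Pearson r of each feature with Smax
selected = r[r.abs() >= 0.4].index.tolist()    # keep moderate-to-strong correlations

rf_pearson = RandomForestRegressor(random_state=42)
rf_pearson.fit(X_train[selected], y_train)

y_hat = rf_pearson.predict(X_test[selected])
print(mean_squared_error(y_test, y_hat), r2_score(y_test, y_hat))  # MSE and R2
```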

4.2. Analysis 2: sequential feature selection

In Analysis 2, feature selection using the SFS and SBS algorithms is investigated. Table 4 presents the results obtained after implementing the respective algorithms. Note that the hyperparameter d (i.e. the number of desired features) introduced in Section 2.2.2 was set equal to 7 during the analysis for both algorithms. Consequently, each algorithm selected the top-seven features (marked as True in Table 4) suitable for predicting Smax. From Table 4, it is observed that five of the seven parameters chosen by both algorithms (i.e. H, ST, GW, VDf and VDb) were the same. Surprisingly, the features related to face pressure (i.e. FPt and FPc) did not reach the top-seven features in either case.

Table 3 Summary of statistical description for the training data set.

Fig. 7. Heat map of Pearson correlation coefficients.

Fig. 8. Prediction results for training and testing data used in Analysis 1.

Then, two RF models were built and trained on data sets containing the features selected by the SFS and SBS algorithms; subsequently, these models are identified as RF_sfs and RF_sbs, respectively. The prediction results for RF_sfs and RF_sbs on the training and testing data sets, including the corresponding MSE and R2 values, are shown in Fig. 9. Compared with Analysis 1, the models considered in Analysis 2 show better performance when predicting Smax in the training and testing phases. However, this method may not capture all relevant features, and the results will vary according to the value assigned to the hyperparameter d.

4.3. Analysis 3: Boruta algorithm

Because the Boruta algorithm is a wrapper method, it depends on a learning algorithm (cf. Fig. 2). Therefore, an RF model was initially built; then, it was used as an input to the Boruta algorithm. It should be noted that this analysis was done by implementing the BorutaPy (2021) algorithm using the Python language. The maximum number of iterations to perform (max_iter) is a hyperparameter of the Boruta algorithm, and its value was set at the default value of 100.
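A minimal sketch of this step using the BorutaPy package is given below (assumed typical usage, not the authors' exact script); max_iter = 100 and α = 0.05 follow the values stated in the text, and the remaining names are placeholders.

```python
# Sketch of running the Boruta algorithm wrapped around an RF regressor
import numpy as np
from boruta import BorutaPy
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_jobs=-1, random_state=42)
boruta = BorutaPy(estimator=rf, n_estimators="auto", alpha=0.05,
                  max_iter=100, random_state=42)

boruta.fit(np.asarray(X_train), np.asarray(y_train))  # BorutaPy expects NumPy arrays

print(boruta.support_)   # True for confirmed (Rank 1) features
print(boruta.ranking_)   # rank assigned to each input feature
```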

Table 4 Feature selection results obtained using the sequential feature selection algorithms.

Fig.9. Prediction results for training and testing data used in Analysis 2:(a)RF_sfs and(b) RF_sbs.

Table 5 presents the results obtained for Analysis 3 using the Boruta algorithm. The Boruta algorithm assigns Rank 1 to all features confirmed to be kept as relevant features for predicting the target variable (Smax). Accordingly, nine features (i.e. H, ST, GW, FPt, PA, To, JP, VDf and VDb) can be selected. A feature with any other rank is rejected by the Boruta algorithm. Therefore, five features (FPc, AR, Th, HDf and HDb) were considered unimportant for predicting Smax. According to Analysis 3, all features considered under tunnel geometry (i.e. H) and geological conditions (i.e. ST and GW) were selected as relevant features. Also, note that FPt was selected as a relevant feature for predicting Smax over FPc. Moreover, the two features related to the vertical position of the shield (i.e. VDf and VDb) were chosen, while the features describing the horizontal position of the shield (i.e. HDf and HDb) were rejected. Additionally, the feature ranking property of the Boruta algorithm makes the feature selection task convenient compared with using the Pearson correlation method or the sequential selection algorithms.

As with Analyses 1 and 2, new training and testing data sets were developed by eliminating the rejected features from Analysis 3. Then, a new RF model was built and trained on the training set with reduced features, and the predictions were made using the testing set. Henceforth, the RF model trained in Analysis 3 is recognized as RF_boruta. Fig. 10 reveals the results obtained using the RF_boruta model. The MSE values for the training and testing data are 6.26 and 19.49, respectively. The R2 value for the training data is 0.96, whereas that generated for the testing data is 0.83. Compared with Analyses 1 and 2, both the training and testing accuracies have improved in Analysis 3. In other words, the RF_boruta model, which was trained on the data set with the features selected using the Boruta algorithm (Analysis 3), can better approximate the tunneling-induced maximum settlements observed in this study.

Table 6 summarizes the results obtained during the feature selection analysis, including the case where no feature selection was performed. By comparing the results in Table 6, the highest R2 value of 0.83 for the testing case was achieved using the features selected by the Boruta algorithm. Therefore, the RF_boruta model is chosen as the input to the SHAP algorithm and discussed in the following section.

4.4. SHAP analysis

Fig.10. Prediction results for training and testing data used in Analysis 3.

Table 6 Summary of the feature selection analysis.

As discussed earlier, the SHAP (Lundberg and Lee, 2017) is a recently introduced algorithm to explain ML model predictions using Shapley values. It requires a trained ML model as an input. Therefore, the RF_boruta model (i.e. the trained ML model obtained from Analysis 3) is chosen as the input model because it was the best-performing model during the feature selection phase. Although SHAP is a local explanation method, an aggregated global view of the Shapley values can also be obtained, as shown in Fig. 11. The horizontal axis in Fig. 11 represents the SHAP values, and the vertical axis represents the features of interest for the model's prediction. The color bar denotes the actual feature value. For instance, low H values (bluish points) represent shallow cover depths. In contrast, larger H values are presented using reddish points. Each point in Fig. 11 represents an observation in the training set with the features selected from Analysis 3.

The features on the y-axis are ordered from top to bottom, with the feature having the highest mean absolute SHAP value at the top. Therefore, the soil type (ST) is the most important feature among the nine features shown in Fig. 11 for predicting Smax. Positive and negative SHAP values increase and decrease the model's output, respectively. The two categories of the feature ST, i.e. silty clay (blue points) and silty sand (red points), are shown in Fig. 11. Accordingly, silty clay, which has higher positive SHAP values, contributes more to the model's output Smax than silty sand. Similarly, one can see that lower FPt values produce positive SHAP values, while negative SHAP values are observed when the FPt values are large. This makes sense, as lower face pressure can lead to larger surface settlements. Interestingly, when H is large, the corresponding SHAP values are positive (i.e. they increase the model's output). The SHAP values for To, VDb, PA, GW, JP and VDf are mainly concentrated near zero. It should be noted that a zero SHAP value means there is no impact on the model's prediction.

Using the SHAP package, dependence plots can be generated to study how the model prediction changes with different feature values. For instance, Fig. 12 presents two example dependence plots corresponding to FPt and H. Variations of the SHAP values for FPt with the FPt values are shown in Fig. 12a. The blue and red points in Fig. 12a symbolize the data related to silty clay (encoded as 0) and silty sand (encoded as 1), respectively. Accordingly, one can see that larger positive SHAP values are estimated when the EPB shield was operated at low FPt values while excavating through silty clay (i.e. Project 1). In contrast, when the EPB shield was operated at higher FPt values during the excavation through silty sand (i.e. Project 2), the corresponding SHAP values are observed to be negative. Similarly, Fig. 12b plots the values of H on the x-axis and the SHAP values of H on the y-axis. When the FPt values are larger (say FPt > 100 kPa), the corresponding SHAP values for H are mostly close to zero or negative. In other words, this implies that the cover depth (H) has no impact or a negative impact on the model's output Smax when there is sufficient face pressure. However, if the FPt values are lower at greater depths (>15 m), Smax becomes larger, as indicated by the larger positive SHAP values for H.
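A sketch of how such dependence plots can be produced is given below (assumed usage; shap_values and X_train_boruta come from the earlier tree SHAP sketch).

```python
# Sketch of the SHAP dependence plots corresponding to Fig. 12
import shap

# Fig. 12a: FPt on the x-axis, its SHAP values on the y-axis, colored by soil type (ST)
shap.dependence_plot("FPt", shap_values, X_train_boruta, interaction_index="ST")

# Fig. 12b: cover depth H, colored by FPt to show the interaction discussed above
shap.dependence_plot("H", shap_values, X_train_boruta, interaction_index="FPt")
```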

5. Limitations

Lack of necessary data is a major challenge when working on ML-based projects. In this study, the parameters related to grouting quality (e.g. grouting pressure, grouting volume and percent grout filling) were not considered, as they were unavailable for Project 1. Also, the inclusion of tunnel diameter proved not helpful for this investigation, as the tunnel diameters were the same for both projects. It is worth noting that including such effective parameters could improve the ML model's performance.

Fig.11. Summary plot obtained from SHAP analysis.

Fig.12. Dependency plots obtained from SHAP analysis for (a) FPt and (b) H.

6. Conclusions

This study analyzed the tunneling-induced settlement observed during subway excavations in Hangzhou, China using EPB shield machines. In the first part, four feature selection methods (i.e. Pearson correlation, SFS, SBS and the Boruta algorithm) were applied to select features from a data set consisting of 14 input features (i.e. H, ST, GW, FPt, FPc, AR, PA, Th, To, JP, HDf, VDf, HDb and VDb). Subsequently, four RF models (i.e. RF_pearson, RF_sfs, RF_sbs and RF_boruta) were built and trained on the features selected by the corresponding feature selection method. Then, the performances of the four models were compared by computing the MSE and R2. According to the analysis, RF_boruta outperformed the other three models. Finally, the RF_boruta model was used as an input to the SHAP algorithm, and the contribution of each feature to the maximum surface settlement due to EPB shield tunneling was further investigated. Based on this study, several conclusions can be drawn as follows:

(1) Feature selection is an essential step in the ML process and can be achieved by various methods. The Pearson correlation method is simple to implement and independent of the ML algorithm. However, it may be challenging to decide which features to select when they have only weak correlations with the desired output. On the other hand, the feature selection algorithms based on the wrapper method (e.g. SFS, SBS and the Boruta algorithm) can easily identify the important features. However, some programming knowledge is required to implement the wrapper-based algorithms. According to this study, the features selected by the Boruta algorithm provide the best capability for predicting Smax. The tunneling-induced maximum surface settlement is greatly influenced by H, ST, GW, FPt, PA, To, JP, VDf and VDb. This study demonstrates that feature selection is an important step and should not be ignored when predicting Smax due to shield tunneling. It is recommended to apply and compare at least two feature selection methods, especially when there is little knowledge about the relationship between the input and output parameters.

(2) The SHAP algorithm is a powerful tool for understanding the output of a complex ML model. Further, it enables the user to verify a model's accuracy by examining how various features contribute to the model's prediction. According to the SHAP analysis conducted in this study, tunneling-induced settlement predominantly depends on the soil type. Further, low FPt values increase (contribute positively to) the model's output (i.e. Smax). Also, it is worth emphasizing that, although the FPt and FPc features both describe the face pressure, their influence on the tunneling-induced maximum settlement may vary.

(3) ML is a powerful technique for analyzing complex problems in geotechnical engineering. However, the lack of interpretability makes complex ML models more challenging to use in routine design. Therefore, the application of XAI methods to geotechnical problems will become more popular in the future.

Data availability

The data set used is available on GitHub at https://github.com/umgeotech/Database/tree/master/Surface%20Settlement. For any inquiries, please contact the corresponding author of the current manuscript.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The authors gratefully acknowledge the financial support provided by The Science and Technology Development Fund, Macau SAR, China (File Nos. 0057/2020/AGJ and SKL-IOTSC-2021-2023) and the Science and Technology Program of Guangdong Province, China (Grant No. 2021A0505080009).