Auto machine learning-based modelling and prediction of excavation-induced tunnel displacement


Dongmei Zhang, Yiming Shen, Zhongkai Huang, Xiaochuang Xie

a Key Laboratory of Geotechnical and Underground Engineering, Ministry of Education, Tongji University, Shanghai, 200092, China

b Department of Geotechnical Engineering, College of Civil Engineering, Tongji University, Shanghai, 200092, China

Keywords: Soil-structure interaction, Auto machine learning (AutoML), Displacement prediction, Robust model, Geotechnical engineering

ABSTRACT: The influence of a deep excavation on existing shield tunnels nearby is a vital issue in tunnelling engineering. However, robust methods to predict excavation-induced tunnel displacements are still lacking. In this study, an auto machine learning (AutoML)-based approach is proposed to solve this problem precisely. Seven input parameters covering two physical aspects, namely the soil properties and the spatial characteristics of the deep excavation, are considered in the database. The 10-fold cross-validation method is employed to overcome the scarcity of data and to promote the model's robustness. Six genetic algorithm (GA)-ML models are established as well for comparison. The results indicate that the proposed AutoML model is a comprehensive model that integrates efficiency and robustness. Importance analysis reveals that the ratio of the average unloading/reloading Young's modulus to the vertical effective stress E_ur/σ′_v, the excavation depth H, and the excavation width B are the most influential variables for the displacements. Finally, the AutoML model is further validated against practical engineering. The prediction results are in good agreement with the monitoring data, signifying that our model can be applied in real projects.

1. Introduction

As a low-carbon and convenient public transportation option, metros have been widely built underground in China. Meanwhile, deep excavation is commonly employed in the construction of underground structures. The unloading caused by a deep excavation applies extra force to existing shield tunnels nearby and ultimately deforms the structure. Once the displacement of the existing tunnels exceeds a threshold, the safety and serviceability of metros will be threatened, potentially causing property damage and, in extreme cases, casualties. As a consequence, the influence of deep excavation on nearby shield tunnels is an issue of international concern.

Up to now, researchers worldwide have studied the deformations of shield tunnels caused by deep excavation from various perspectives, including field observations (Devriendt et al., 2010; Wang et al., 2013; Li et al., 2017, 2018a; Zheng et al., 2020; Liang et al., 2021), analytical and semi-analytical solutions (Zhang et al., 2013, 2020a; Liang et al., 2017, 2018; Shi et al., 2017; Zheng et al., 2018; Cheng et al., 2020a), experimental investigations (Ng et al., 2013, 2015; Huang et al., 2014; Zheng et al., 2010; Meng et al., 2021a, b), and numerical simulations (Doležalová, 2001; Huang et al., 2013; Chen et al., 2016; Liao et al., 2016; Xing et al., 2016; Shi et al., 2017, 2019; Li et al., 2018b). In terms of field monitoring, based on a case in Tianjin, China, Zheng et al. (2020) investigated the responses of existing tunnels to an adjacent excavation at an oblique intersection angle. Liang et al. (2021) comprehensively monitored and analysed the responses of a metro station and shield-driven tunnels to the zoned excavation of a large-scale basement in soft ground. Regarding analytical approaches, Shi et al. (2017) carried out a systematic numerical parametric study, from which simplified calculation charts and best-fit curves were developed to estimate the adjacent tunnel response owing to an overlying basement excavation. Based on the Timoshenko beam theory, Cheng et al. (2020a) proposed an analytical framework that considers both the longitudinal and circumferential behaviours of tunnels, through which the excavation-induced tunnel damage potential can be quickly assessed via a quantified serviceability limit state. From the experimental aspect, centrifuges are commonly adopted to simulate deep excavation. Huang et al. (2014) conducted four centrifuge model tests to investigate the effects of a deep excavation directly above an existing tunnel. Meng et al. (2021a) carried out a three-dimensional centrifuge test to analyse the long-term responses of a tunnel buried in kaolin clay to a nearby excavation. As for numerical modelling, Li et al. (2018b) used ABAQUS to simulate the response of an adjacent tunnel under different excavation methods of a foundation pit. Shi et al. (2019) explored the most preferable excavation geometry of a basement, i.e. that with minimum impact on an existing tunnel.

In spite of the numerous fruitful achievements exhibited above, these traditional methods all have shortcomings in engineering practice. First of all, the application of field observation methods is strictly restricted by complex procedures, while experimental investigations are expensive, time-consuming, and hard to reproduce. Regarding numerical simulations, the simulated results depend highly on the quality of the input as well as on human experience. As for analytical and semi-analytical solutions, the derived formulas rely heavily on certain assumptions that cannot be fully satisfied in reality. In response, it is urgent to find a new approach to predict the excavation-induced tunnel displacements.

As an emerging method, machine learning (ML) has attracted wide attention in geotechnical engineering, owing to its high efficiency, first-class generalisation performance, and ability to solve high-dimensional problems. Various kinds of ML algorithms have been utilised in specific fields, such as slopes and landslides (Liu et al., 2014, 2020; Xu and Niu, 2018), pile settlement (Azizkandi et al., 2014; Armaghani et al., 2018, 2020), characterisation of soil properties (Kurnaz and Kaya, 2018; Ching and Phoon, 2019; Cheng et al., 2020b; Zhang et al., 2020b, 2021a), retaining wall deflection brought about by deep braced excavation (Kung et al., 2007; Ji et al., 2014; Goudjil and Arabet, 2021; Zhang et al., 2021b), ground surface settlement induced by tunnelling (Neaupane and Adhikari, 2006; Pourtaghi and Lotfollahi-Yaghin, 2012; Kohestani et al., 2017; Moeinossadat et al., 2018; Zhang et al., 2019a), and subsurface stratification from limited boreholes (Shi and Wang, 2021a, b). To be specific, Zhang et al. (2020b) developed a nonparametric ensemble artificial intelligence approach to calculate the compression modulus E_s of soft clay based on a gradient-boosted regression tree algorithm; the validation results showed that the proposed model had great potential for improving predictions. Moeinossadat et al. (2018) compared the ability of different ML algorithms to predict the maximum surface settlement (S_max) caused by tunnelling. Furthermore, some scholars have summarised the progress of ML in the field of geotechnical engineering (Moayedi et al., 2020; Yin et al., 2020; Zhang et al., 2020c; Jong et al., 2021). Nevertheless, research on the excavation-induced tunnel displacement remains rare so far.

In view of the above, this study proposes an auto ML (AutoML)-based approach to investigate the influences of deep excavations on nearby existing shield tunnels and to precisely predict the excavation-induced tunnel displacements. The structure of our study is organised as follows. Firstly, we introduce the concept of AutoML and the auxiliary methods for modelling. Secondly, the database and the data pre-processing methods we utilise are explained. Subsequently, we determine the optimal parameters for the AutoML model, and the powerful fitting and extrapolation abilities of our model are demonstrated via comparison with classical GA-ML models. The relationship between the reliability of a model and its accuracy rate is then summarised, and an importance analysis is carried out to rank the influence of the deep excavation parameters on the tunnel displacement. In the end, our model is validated against real projects, and the main conclusions are summarised.

2. Methodology

2.1. AutoML

Despite its ability to solve high-dimensional nonlinear problems, an individual ML algorithm has two inherent drawbacks. Firstly, each algorithm has its own applicable conditions, and it is unrealistic for a single ML estimator to perform outstandingly on every dataset. In consequence, researchers typically adopt various ML algorithms to simultaneously build fitting models for a specific dataset, and the one with the best performance is then regarded as the ultimate model (Ocak and Seker, 2013; Chen et al., 2019; Gao et al., 2019). This makes the modelling process knowledge-based and time-consuming. Secondly, the performance of some ML estimators is highly dependent on hyper-parameter optimisation. Once trapped in a local optimum, the model will perform poorly.

To overcome the two shortcomings mentioned above, the concept of AutoML has emerged in recent years. AutoML regards these drawbacks as a combined algorithm selection and hyper-parameter optimisation (CASH) problem (Thornton et al., 2013), which is defined as follows:

$$A^*_{\lambda^*} \in \operatorname*{arg\,min}_{A^{(j)} \in \mathcal{A},\; \lambda \in \Lambda^{(j)}} \frac{1}{k} \sum_{i=1}^{k} \mathcal{L}\left(A^{(j)}_{\lambda},\, D_{\mathrm{train}}^{(i)},\, D_{\mathrm{valid}}^{(i)}\right)$$

where 𝒜 = {A^(1), …, A^(R)} is a set of candidate algorithms with associated hyper-parameter spaces Λ^(1), …, Λ^(R), and ℒ is the loss achieved by algorithm A^(j) with hyper-parameters λ when trained on D_train^(i) and evaluated on D_valid^(i).

Fig. 1. Generic framework of AutoML.

The CASH problem was first tackled by Thornton et al. (2013), and was then improved by Feurer et al. (2014), who proposed a system called AUTO-SKLEARN to increase the efficiency and robustness of AutoML. Fig. 1 exhibits the generic framework of AutoML. The classical ML framework is represented by the grey areas, where 5 data pre-processing methods, 18 feature pre-processing methods, and 13 regressors are packaged in the system. By imitating the way in which people accumulate experience from previous work, the meta-learning submodule in AutoML can automatically locate good instantiations of ML frameworks by calculating the similarities between the new data and existing datasets, thereby boosting the efficiency of modelling (Feurer et al., 2014). A state-of-the-art Bayesian optimiser is embedded in the system to automatically tune the hyper-parameters, including the weights of the 13 algorithms and their corresponding hyper-parameters. Finally, combining the results of the Bayesian optimisation, an ensemble estimator is established for prediction. For illustration, an ensemble estimator may consist of 40% random forest (RF), 20% stochastic gradient descent (SGD), and 40% extreme gradient boosting (XGBoost). Moreover, the time allowed for building the AutoML model can be set artificially for ease of control.
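To make the workflow concrete, the following minimal sketch shows how an AUTO-SKLEARN regressor of this kind is typically configured in Python; the time budget, fold count, and the placeholder arrays X_train, y_train, and X_test are illustrative assumptions, not the authors' actual script:

```python
# Minimal sketch of the AutoML workflow with auto-sklearn (assumed setup,
# not the authors' actual script). X_train/y_train/X_test are placeholders.
import numpy as np
import autosklearn.regression

rng = np.random.default_rng(0)
X_train, y_train = rng.random((128, 7)), rng.random(128)   # 7 inputs, as in the database
X_test = rng.random((55, 7))

automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=9000,                 # searching-time budget in seconds
    resampling_strategy="cv",                     # cross-validate candidate pipelines
    resampling_strategy_arguments={"folds": 10},
)
automl.fit(X_train, y_train)

print(automl.show_models())        # the weighted ensemble found by Bayesian optimisation
y_pred = automl.predict(X_test)
```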

2.2. Classical ML algorithms

2.2.1. Multi-layer perceptron

A multi-layer perceptron (MLP), a class of artificial neural networks (ANNs), consists of an input layer, one or more hidden layers, and an output layer; its schematic diagram is shown in Fig. 2a. By training on a dataset, the MLP can learn a nonlinear function approximator f(·): R^m → R^o, where m and o are the numbers of dimensions of the input and output, respectively. For the sake of understanding, the process of a neural network with one hidden layer is illustrated in detail here. The output of the jth neuron in the hidden layer, h_j, is defined as follows:

$$h_j = f\left(\sum_{i=1}^{m} \omega_{j,i}\, x_i + \theta_j\right)$$

where x_i is the value of the ith input neuron, ω_j,i is the weight coefficient between the ith input neuron and the jth hidden neuron, θ_j is the bias in the hidden layer, and f(·) is the activation function for the hidden layer.

The predicted result y_k is expressed as follows:

$$y_k = g\left(\sum_{j=1}^{n} \omega_{k,j}\, h_j + \theta_k\right)$$

where n is the number of neurons in the hidden layer, ω_k,j is the weight coefficient between the jth hidden neuron and the kth output neuron, θ_k is the bias in the output layer, and g(·) is the activation function for the output layer. Generally speaking, four types of activation functions are available: 'Identity': f(x) = x; 'Tanh': f(x) = tanh(x); 'Logistic': f(x) = 1/[1 + exp(−x)]; and 'Relu': f(x) = max(0, x). We used 'Relu' in this study. By tuning the number of hidden layers and neurons, as well as the values of the weight coefficients and biases, an optimum structure of the MLP can be determined and used for prediction.
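The structure described above maps directly onto scikit-learn's MLPRegressor; the sketch below, with illustrative layer sizes and synthetic data, shows the correspondence (the paper's actual GA-tuned architecture appears later in Table 4):

```python
# Sketch of an MLP regressor with 'relu' hidden activation, matching the
# equations above; layer sizes and data are illustrative, not from the paper.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((128, 7))                         # 7 input features
y = np.sin(X @ rng.random(7))                    # placeholder nonlinear target

mlp = MLPRegressor(hidden_layer_sizes=(64, 64),  # two hidden layers of 64 neurons
                   activation="relu",            # f(x) = max(0, x)
                   max_iter=5000, random_state=0)
mlp.fit(X, y)
print(mlp.predict(X[:3]))
```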

2.2.2. SVM

The SVM is a powerful algorithm for solving classification and regression problems (Cortes and Vapnik, 1995). Based on the concept of structural risk minimisation, an SVM simultaneously attempts to minimise the empirical risk and the Vapnik-Chervonenkis dimension (Gunn, 1998; Basak et al., 2007), and seeks a linear function f(x) of the following form:

$$f(x) = \boldsymbol{w} \cdot \boldsymbol{x} + b$$

where w is the weight vector and b is the bias.

Fig. 2. Schematic diagrams of three classical ML algorithms: (a) MLP; (b) Support vector machine (SVM) regression; and (c) RF.

To better manage nonlinear regression problems and reduce computation, the inner product operation is often replaced by a kernel function (Mahdevari et al., 2014), so as to create a high-dimensional feature space in which the data can be fitted by a linear equation, as shown in Fig. 2b. In this paper, the radial basis function is employed, which is defined as follows:

$$K(x_i, x_j) = \exp\left(-\gamma \left\|x_i - x_j\right\|^2\right)$$

where γ is the kernel parameter.
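For reference, a minimal scikit-learn sketch of an RBF-kernel support vector regressor looks like the following; the values of C, gamma, and epsilon are arbitrary placeholders rather than the paper's GA-tuned ones:

```python
# Sketch of RBF-kernel support vector regression; hyper-parameter values
# are placeholders (the paper tunes them with GA, see Section 2.3).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((128, 7))
y = np.sin(X @ rng.random(7))

svr = SVR(kernel="rbf", gamma=0.5, C=10.0, epsilon=0.01)
svr.fit(X, y)
print(svr.predict(X[:3]))
```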

2.2.3. RF

RF is an ensemble ML algorithm that combines bootstrap aggregation ('bagging') (Breiman, 1996) with decision trees (Ho, 1998). During the sampling stage, n alternative sets are created by random sampling. Each decision tree is then built on a single alternative set to learn a function approximator f_i(x). Finally, the outputs of the decision trees are integrated, and the result of the RF prediction can be expressed as follows:

$$y = \frac{1}{n} \sum_{i=1}^{n} f_i(X)$$

where y is the prediction output of the RF; X = [x_1, x_2, …, x_m]^T; and n is the number of alternative sets (i.e. the number of decision trees). A schematic diagram of the RF is shown in Fig. 2c.
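The averaging in the equation above can be made explicit with scikit-learn, as in this illustrative sketch (synthetic data; the forest size is arbitrary):

```python
# Sketch of RF as an average of bootstrapped trees; the manual averaging
# over rf.estimators_ mirrors the prediction equation above.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((128, 7))
y = np.sin(X @ rng.random(7))

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
y_avg = np.mean([tree.predict(X) for tree in rf.estimators_], axis=0)
assert np.allclose(y_avg, rf.predict(X))   # RF output = mean of the n trees
```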

2.2.4. Gradient boosting decision tree

The gradient boosting decision tree (GBDT) is an updated version of the boosting decision tree. Instead of simply adjusting the weight of a weak learner, the GBDT aims to reduce the residual after each calculation. This is achieved by building a new model in the gradient descent direction of the residual (Friedman, 2001).

Based on the training set T = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)} and a loss function L(y, f(x)), a GBDT model f̂(x) can be built through the following steps:

(1) Step 1: Initialise the model with the training set and loss function:

$$f_0(x) = \operatorname*{arg\,min}_{c} \sum_{i=1}^{N} L(y_i, c)$$

(2) Step 2: For m = 1, 2, …, M, generate M regression trees iteratively:

(i) For all training samples i = 1, 2, …, N, calculate the negative gradient of the loss function and regard it as the estimate of the residual r_mi:

$$r_{mi} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f(x) = f_{m-1}(x)}$$

In our research, the loss function is chosen as the least squares loss.
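A hand-rolled sketch of the boosting loop under the least squares loss makes the mechanism concrete: with squared error, the negative gradient is simply the residual. The tree depth and learning rate below are illustrative choices, and the tolerance-based stopping is omitted for brevity:

```python
# Sketch of GBDT boosting rounds with least squares loss: initialise with
# the mean, fit a tree to the residual (the negative gradient), then take
# a damped gradient step. Depth and learning rate are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.random((128, 7))
y = np.sin(X @ rng.random(7))

f = np.full_like(y, y.mean())                 # Step 1: f_0 = argmin_c sum L(y_i, c) = mean
for m in range(50):                           # Step 2: M boosting rounds
    r = y - f                                 # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=3).fit(X, r)
    f = f + 0.1 * tree.predict(X)             # gradient step with learning rate 0.1
print(np.mean((y - f) ** 2))                  # training MSE shrinks with each round
```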

2.2.5. XGBoost

XGBoost was initially proposed by Chen and Guestrin (2016), and a great amount of work has since been done on it, aiming to overcome the shortcomings of the GBDT. XGBoost speeds up the computations by an order of magnitude, and improves the accuracy of classification tasks and the precision of regression tasks.

XGBoost adds a regularisation term to the loss function to control the complexity of the trees and prevent overfitting. For a tree with T leaves and leaf weights w_j, the regularised objective can be written over the leaves as follows:

$$\tilde{L}^{(t)} = \sum_{j=1}^{T} \left[\left(\sum_{i \in I_j} g_i\right) w_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right) w_j^2\right] + \gamma T$$

where I_j represents the set of samples in leaf node j, g_i and h_i are the first- and second-order gradients of the loss function for sample i, and γ and λ are the regularisation coefficients.

In addition, XGBoost employs the exact greedy algorithm for finding splits. All of the improvements mentioned above have made the XGBoost algorithm a state-of-the-art tool in many fields (Sheridan et al., 2016; Fan et al., 2018; Zhang et al., 2020d).
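In code, the regularisation and split-finding choices above correspond to explicit parameters of the xgboost library; the sketch below uses illustrative values:

```python
# Sketch of an XGBoost regressor; gamma and reg_lambda map onto the
# per-leaf penalty and L2 leaf-weight penalty in the objective above.
# All values are illustrative, not the paper's GA-tuned settings.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((128, 7))
y = np.sin(X @ rng.random(7))

model = xgb.XGBRegressor(n_estimators=300, learning_rate=0.1,
                         gamma=0.0,            # penalty per additional leaf
                         reg_lambda=1.0,       # L2 penalty on leaf weights
                         tree_method="exact")  # exact greedy split finding
model.fit(X, y)
print(model.predict(X[:3]))
```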

2.2.6. Light gradient boosting machine

The light gradient boosting machine (LightGBM) is another evolutionary version of the GBDT algorithm, and was developed by Microsoft (Ke et al., 2017). Through various optimisation methods, LightGBM achieves faster training and lower memory usage than XGBoost.

Regarding the generative process of the decision tree, XGBoost adopts an algorithm called 'level-wise tree growth': leaf nodes in the same level are considered equivalent, and grow or cease growing simultaneously, as shown in Fig. 3a. In contrast, LightGBM adopts a 'leaf-wise tree growth' algorithm. As shown in Fig. 3b, it chooses the leaf with the maximum delta loss for growing, thereby guaranteeing the efficiency of the algorithm.

Furthermore, LightGBM uses a histogram algorithm to find splitting points, and can directly manage categorical features. All of these advantages make LightGBM a popular algorithm for processing large-scale data (Dev and Eden, 2019; Zhang et al., 2019b).
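The leaf-wise growth and histogram binning described above surface as the num_leaves and max_bin parameters of the lightgbm library; a minimal illustrative sketch:

```python
# Sketch of a LightGBM regressor; num_leaves bounds the leaf-wise growth
# and max_bin controls the histogram used for split finding. Values are
# illustrative, not the paper's GA-tuned settings.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.random((128, 7))
y = np.sin(X @ rng.random(7))

model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05,
                          num_leaves=31,   # leaf-wise growth: cap on leaves per tree
                          max_bin=255)     # histogram bins for split finding
model.fit(X, y)
print(model.predict(X[:3]))
```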

2.3. Hyper-parameter tuning

Appropriate hyper-parameters of classical ML algorithms are of vital importance for establishing high-precision prediction models for a specific problem (Zhang et al., 2020e). To tune hyper-parameters, optimisation algorithms such as particle swarm optimisation, genetic algorithms (GA), artificial bee colony, and firefly algorithms have been employed in previous research (Karaboga et al., 2014; Hasanipanah et al., 2016; Bui et al., 2018; Zhang et al., 2020b). In this paper, GA is adopted, owing to its ability to optimise integer hyper-parameters.

2.3.1. GA

GA is a search algorithm that seeks an optimal solution by simulating natural selection and biological evolution (Holland, 1992). Fig. 4 summarises the universal flow chart of GA, which contains five steps (a minimal code sketch follows the list):

(1) Step 1: Predefine an objective function according to the specific problem, as well as the population size, the tolerance (tol), the mutation probability, and the maximum iteration (Gen) of the GA. In prediction problems, the objective function is commonly defined as the mean square error (MSE) between the predicted and true values.

(2) Step 2: Generate an initial population randomly, and calculate its fitness T according to the following formula:

$$T = \min_{1 \le i \le n} \mathrm{MSE}_i$$

where n is the population size, and MSE_i represents the performance of the ith individual. The individual with the minimum MSE is regarded as the best one of the population. If the fitness T meets the tolerance, go directly to Step 5; otherwise, execute Step 3.

Fig. 3. Two types of growth strategies for decision trees: (a) Level-wise tree growth (adopted in XGBoost); and (b) Leaf-wise tree growth (adopted in LightGBM).

Fig. 4. Universal flow chart of GA.

(3) Step 3: Create the next generation population through selection, crossover, and mutation, and calculate the fitness T again through Eq. (14).

(4) Step 4: Repeat Step 3 until the fitness T meets the tolerance, or the predefined maximum iteration is reached.

(5) Step 5: Set the best individual of the last generation population as the optimal result, and save it for further use.
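For concreteness, the following toy sketch implements Steps 1-5 for two SVR hyper-parameters (C and γ) with a 10-fold cross-validated MSE objective. The population size, generation count, and crossover/mutation operators are simplified illustrations of the scheme in Table 2, not the authors' implementation, and the tolerance-based early stop is omitted:

```python
# Toy GA sketch following Steps 1-5: tune SVR (C, gamma) by minimising
# 10-fold cross-validated MSE. All sizes and operators are illustrative.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((128, 7))
y = np.sin(X @ rng.random(7))

def cv_mse(ind):                         # Step 1: objective = 10-fold CV MSE
    C, gamma = ind
    scores = cross_val_score(SVR(C=C, gamma=gamma), X, y, cv=10,
                             scoring="neg_mean_squared_error")
    return -scores.mean()

pop = rng.uniform([0.1, 1e-3], [100.0, 1.0], size=(20, 2))    # Step 2: random init
for gen in range(30):                                         # Steps 3-4: iterate
    fitness = np.array([cv_mse(ind) for ind in pop])
    best = pop[fitness.argmin()]                              # best individual
    parents = pop[np.argsort(fitness)[:10]]                   # selection (top half)
    children = (parents[rng.integers(10, size=10)] +
                parents[rng.integers(10, size=10)]) / 2       # crossover (averaging)
    mask = rng.random(children.shape) < 0.05                  # mutation
    children[mask] *= rng.uniform(0.5, 1.5, mask.sum())
    pop = np.vstack([parents, children])
print("best (C, gamma):", best)                               # Step 5: save the best
```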

2.3.2. Hybrid algorithm GA-ML

With the assistance of GA, we create a hybrid algorithm named 'GA-ML' to predict the excavation-induced tunnel displacements for comparison with the AutoML model. Table 1 summarises the critical hyper-parameters of each GA-ML algorithm that should be adjusted; the definition, data type, and tuning range of each hyper-parameter are introduced in detail there. The selection of the critical hyper-parameters and their tuning ranges is determined in accordance with modelling experience, suggestions from the model creators, and previous studies (Mahdevari et al., 2014; Zhou et al., 2020; Zhang et al., 2021a).

Before applying the algorithm, some parameters of the GA should be predefined as well. Table 2 tabulates the terminology of these parameters and their initial values used in this paper. The population size represents the total number of individuals in each generation, and is set to 100 in the algorithm. The maximum iteration represents the compulsory termination condition, and is set to 200 in this paper; once this limit has been reached, the tuning process ceases automatically. Mutation is adopted to produce individuals with better performance, and its probability is assigned as 0.005 in each GA-ML algorithm.

2.4. Auxiliary methods

2.4.1. k-fold cross-validation

To promote the generalisation ability of the ML model and overcome the scarcity of data, the k-fold cross-validation method is often adopted during model training (Stone, 1974; Zhang et al., 2020e, 2021a). The original training dataset is randomly divided into k subsets, among which k − 1 subsets are used as the training sets, and the remaining one is employed as the validation set for testing the performance of the sub-model. This process is repeated k times to ensure that each subset serves as the validation set exactly once. In the end, the average performance of the k sub-models is considered as the performance of the entire model, and is expressed as follows:

$$P = \frac{1}{k} \sum_{i=1}^{k} P_i$$

where P_i is the validation performance of the ith sub-model.

Based on the recommendations of previous studies, k is set to 10 in our research (Kohavi, 1995; Rodriguez et al., 2010).
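In scikit-learn terms, the procedure reduces to a few lines; the estimator and scoring choice below are placeholders:

```python
# Sketch of 10-fold cross-validation; the averaged fold score stands in
# for the model performance P defined above. The estimator is a placeholder.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.random((128, 7))
y = np.sin(X @ rng.random(7))

cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestRegressor(random_state=0), X, y,
                         cv=cv, scoring="neg_mean_squared_error")
print("CV MSE:", -scores.mean())   # average over the 10 sub-models
```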

Table 1 Specific information of critical hyper-parameters in each GA-ML algorithm.

Table 2 Initial GA parameters in the GA-ML model.

2.4.2. Evaluation criterion

To quantitatively evaluate the performance of the models created by the GA-ML algorithms and AutoML, two evaluation indices are adopted in this study: the MSE and the goodness of fit (R²). The computational formulas for the two indices are expressed as follows:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$

where N is the number of samples, y_i and ŷ_i are the true and predicted values of the ith sample, respectively, and ȳ is the mean of the true values.
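Both indices are available off the shelf in scikit-learn, as this short sketch with placeholder arrays shows:

```python
# Sketch of computing the two evaluation indices with scikit-learn.
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([10.2, 3.1, 7.4, 5.8])    # placeholder displacements (mm)
y_pred = np.array([9.8, 3.5, 7.0, 6.1])

print("MSE:", mean_squared_error(y_true, y_pred))
print("R2 :", r2_score(y_true, y_pred))
```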

3. Database and pre-processing

3.1. Database introduction

Although there are numerous engineering cases in reality in which deep excavations are constructed near existing shield tunnels, on-site data collection is arduous. The existing data frequently lack key parameters, and their volume is insufficient for developing an estimator. To overcome this challenge, a database on the influences of deep excavation on existing tunnels was established by Zhang et al. (2020a) through numerical modelling. The numerical model with its typical geometry is exhibited in Fig. 5. The rationality of that work has been verified previously, and hence this reliable database is employed in this paper to build the ML estimators.

To fully grasp the distribution of the data, mathematical statistics on the variables in the database are conducted and tabulated in Table 3. The maximum (Max.) and minimum (Min.) represent the value range of each variable, Ave. denotes the average value, and the standard deviation (S.D.) represents the degree of dispersion. In the database, seven factors concerning the deep excavation were considered, and are therefore regarded as the input variables in this study. They are the excavation depth H, the excavation width B, the wall thickness t, the ratio of the average shear strength to the vertical effective stress s_u/σ′_v, the ratio of the average unloading/reloading Young's modulus to the vertical effective stress E_ur/σ′_v, the horizontal distance between the tunnel and the retaining wall D_h, and the ratio of the buried depth of the tunnel crown to the excavation depth D_v/H. A multi-collinearity diagnosis via the variance inflation factor (VIF) has been conducted to verify the orthogonality of the seven factors (O'Brien, 2007). The reasons for choosing these influential factors can be found in Zhang et al. (2020a). Meanwhile, the maximal horizontal tunnel displacement δ_hm is included in the database as the output, and is therefore treated as the target in the ML models.

3.2. Database processing

Fig. 5. Numerical modelling with typical geometry.

Table 3 Statistics of the input and output variables in the database.

Before utilising the original database to create ML models, several data processing methods should be adopted to enhance the quality of the data. Firstly, of the 183 sets of data in the database, 70% (128 sets) are randomly sampled as the training set, whereas the remaining 30% (55 sets) are adopted as the testing set. The selected data division ratio is advised by Looney (1996) and Nelson and Illingworth (1991), and has been widely adopted in previous studies (Kurnaz and Kaya, 2018; Moeinossadat et al., 2018; Goudjil and Arabet, 2021). Secondly, the value of every feature in the training set is normalised to the range (0, 1) so as to eliminate the adverse effects of singular samples and to speed up the training process. The post-normalisation value X_scaled is calculated as follows:

$$X_{\mathrm{scaled}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

where X_min and X_max are the minimum and maximum values of the same variable in the training set, respectively, and X is the original value in the database. Finally, the 10-fold cross-validation method mentioned in Section 2.4.1 is utilised to create the validation set during model training, and to enhance the robustness of the model as well.
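The split-then-scale order matters: the scaler must be fitted on the training set only and then reused on the testing set. A minimal sketch with placeholder arrays:

```python
# Sketch of the 70/30 split and (0, 1) min-max scaling; the scaler is
# fitted on the training set only, then reused on the testing set so the
# test data do not leak into X_min and X_max.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X, y = rng.random((183, 7)), rng.random(183)   # 183 samples, 7 inputs as in the paper

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)
scaler = MinMaxScaler()                        # X_scaled = (X - X_min) / (X_max - X_min)
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
print(X_train_s.shape, X_test_s.shape)         # (128, 7) (55, 7)
```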

For the sake of understanding, the whole technological process of the modelling by GA-ML is plotted in Fig. 6. Apart from utilising the pre-processing methods mentioned above, the GA is adopted to seek the optimal hyper-parameters. The GA-ML model is then established on the basis of the training set, and is subsequently evaluated on the testing set. Meanwhile, the AutoML model is built in accordance with the framework shown in Fig. 1; its hyper-parameters are tuned automatically by the AutoML system according to the characteristics of the dataset. All of the results exhibited below were produced on a desktop (Central Processing Unit (CPU): Intel Core i7-8700K @3.70 GHz, RAM: 32 GB) with a Windows system, the Windows Subsystem for Linux, and Python 3.6.

4. Results and discussion

4.1. Optimal hyper-parameters for each GA-ML model

By utilising the GA to search for the optimal hyper-parameters of each GA-ML model, the evolutionary process is traced and visualised in Fig. 7. Firstly, all of the models converge before the compulsory termination condition. To be specific, GA-RF and GA-SVM converge in fewer than 20 iterations, making them the two most efficient models of all. However, their convergence patterns are distinctly different. Compared with the initial value, the fitness of GA-SVM decreases sharply from 811.95 to 140.45. This indicates that the performance of GA-SVM relies heavily on the tuning of hyper-parameters, which verifies the second drawback of an individual ML algorithm mentioned in Section 2.1. On the contrary, the convergent fitness of GA-RF decreases by only 0.09%, from 277.09 to 276.85, demonstrating that the influence of hyper-parameters on GA-RF is negligible. Subsequently, after approximately 30 iterations, the GA-MLP and GA-GBDT models converge, followed by GA-XGBoost ranking fifth. The convergence efficiency of GA-LightGBM is the worst among all models, needing approximately 150 iterations to converge. Apart from their different efficiencies, the convergence patterns of the last four GA-ML models are approximately identical: as the iterations progress, the fitness of the model declines gradually with fluctuations, and ultimately remains steady.

Fig. 6. Modelling process of GA-ML.

Fig. 7. Evolutionary process of each GA-ML model.

Once the fitness of each GA-ML model remains stable, the corresponding hyper-parameter values are considered optimal and are directly passed to the ultimate estimators. The optimal value of each hyper-parameter is gathered in Table 4, and the time required to establish each ultimate estimator is recorded as well for further analysis. From Table 4, it can be seen that the running time of GA-SVM is the shortest, since the SVM was originally designed for small datasets. The speeds of GA-XGBoost and GA-LightGBM are approximately identical, followed by GA-GBDT and GA-RF. However, the GA-MLP model takes the most time to build an estimator, since its structure contains four hidden layers and hundreds of neurons, which results in tens of thousands of weight coefficients to be adjusted.

4.2. Optimal searching time for the AutoML model

As mentioned in Section 2.1, for ease of control, the time allowed for building the AutoML model can be set artificially, which makes it a key factor affecting the model's performance. Although a longer time means a higher chance of finding better models, it is necessary to trade off the increment in precision against the extra searching time. Therefore, six specific values, 1800 s, 3600 s, 5400 s, 7200 s, 9000 s, and 10,800 s, are adopted in this section to investigate the sensitivity of the performance to the searching time, and to determine the optimal searching time for the AutoML model. The experiments were repeated 10 times for each searching time, and the prediction results were then gathered and post-processed to calculate the mean and standard deviation of R². The statistical results for the model's R² on the training and testing sets are summarised in Table 5 in the form of mean ± one standard deviation, where the mean represents the average performance of the AutoML model and the standard deviation is a proxy for the stability of the model. For ease of observation, the results are visualised in Fig. 8. As indicated by the training set line, the average fitting ability of the model remains unchanged when the searching time is simply increased, since the trend line is horizontal. Meanwhile, the fluctuating region of R² tends to narrow, meaning that the fitting performance of the model tends to remain stable. However, the extrapolation ability of the AutoML model is sensitive to the time limit: the trend line is upward, and the model's extrapolation performance can be effectively improved by adding extra searching time. Taking the two aspects into consideration, 9000 s is recommended as the optimal searching time in this paper for creating the AutoML model for further research.

4.3. Performance analysis

4.3.1. Performance of each GA-ML model

Based on the optimal hyper-parameters tabulated in Table 4, each GA-ML model is established for prediction. The MSE and R² referred to in Section 2.4.2 are utilised to evaluate model performance. The results on the training and testing sets are collected and plotted in Fig. 9a and b, respectively. The specific value is marked on top of each bar to make the results more intuitive and comparable.

Table 4 Outcome of hyper-parameters tuned in each GA-ML model.

Table 5 Performance of the AutoML model in different searching times.

For the training set, the prediction results of GA-GBDT, GA-XGBoost, and GA-MLP all show a high degree of agreement with the true values: the MSE of the three models is close to zero, while all their R² values approach 1. The performance of GA-LightGBM ranks fourth, followed by GA-SVM and GA-RF. On the whole, all six GA-ML models have grasped the distribution characteristics of the training set and created appropriate function approximators, since the R² values of all models are greater than 0.95, which is acceptable in practical engineering. Conversely, the generalisation performances of the six models differ markedly. From Fig. 9b, it can be clearly seen that GA-MLP is the top one, with MSE and R² values of 107.8 and 0.8548, respectively. This is because the MLP is the foundation of deep learning and has the ability to approximate any nonlinear function; unfortunately, the long time needed to build the model is a salient drawback of GA-MLP. The generalisation performances of GA-LightGBM and GA-SVM rank second and third, respectively, followed by GA-XGBoost and GA-GBDT with similar MSE and R². In contrast, because RF is better suited to handling large volumes of data than small samples, overfitting occurred in the GA-RF model in this study: the testing R² of the model is only 0.6193, and its prediction results are untrustworthy.

4.3.2. Performance of the AutoML model

Fig. 8. Evolutionary process of the AutoML model’s performance relative to searching time.

Before the analysis, a voting regressor (VR) model is established for comparison, to better demonstrate the superiority of the AutoML model. The idea behind the VR is to combine conceptually different ML models and then return an average predicted value, which is partially similar to the idea of AutoML. Generally, such a model can effectively balance out the weaknesses of the individual models, but it rarely becomes predominant.

In this research, the VR integrates the aforementioned six GA-ML models. The performances of the VR and AutoML models are also plotted in Fig. 9a and b to make the results more intuitive and comparable. For the VR model, the training and testing MSE values are 3.05 and 141.1, and the corresponding R² values are 0.9963 and 0.8099, respectively, indicating that its performance lies at the middle level. This is well-matched with the philosophy of the VR. For the AutoML model, the R² on the training set is 0.9836, which is slightly inferior to that of GA-MLP, GA-GBDT, GA-XGBoost, GA-LightGBM, and the VR model. However, with a testing R² of 0.9106, the generalisation performance of the AutoML model leads all models. There is thus sufficient evidence to believe that the prediction results of the AutoML model are the most trustworthy when new deep excavation parameters are input.

Fig. 9. Performance of each model: (a) Training set; and (b) Testing set. VR denotes the voting regressor.

To better demonstrate the prediction ability of the AutoML model, the predicted values are compared with the true values from another perspective in Fig. 10. As can be seen from the figure, almost all points are matched well, except for several extreme samples with large values in the testing set. The reason is that samples with large values are scarce, and the model did not gain a complete understanding of these regions during the training process. This shortcoming will rapidly disappear once corresponding data are provided.

In terms of efficiency, compared with the total time needed to search for the optimal GA-ML model (as tabulated in the last column of Table 4, a total of 56,310 s), the training time of the AutoML model is first-class and acceptable in practice.

4.4. Further analysis

4.4.1. Error analysis of each model

To further demonstrate the superiority of the AutoML model, we post-process the prediction results obtained from each model, and the absolute percentage error of the forecast (APE) is calculated as follows:

$$\mathrm{APE} = \frac{\left| y_{\mathrm{pred}} - y_{\mathrm{true}} \right|}{y_{\mathrm{true}}} \times 100\%$$

where y_pred and y_true are the predicted and true values, respectively.

In accordance with the degree acceptable in practice, the APE is divided into five grades: 0%-5%, 5%-10%, 10%-15%, 15%-20%, and greater than 20%. The frequencies of the first four grades in the eight models are counted and shown in Fig. 11a-h, where the amounts in the training and testing sets are represented by the red and blue bars, respectively.
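The grading reduces to a histogram over fixed APE bins, e.g. (illustrative arrays):

```python
# Sketch of computing APE and counting the five grades; the arrays are
# illustrative placeholders for a model's predictions.
import numpy as np

y_true = np.array([10.2, 3.1, 7.4, 5.8, 12.0])
y_pred = np.array([9.8, 3.5, 7.0, 6.1, 14.9])

ape = np.abs(y_pred - y_true) / y_true * 100          # APE in percent
counts, _ = np.histogram(ape, bins=[0, 5, 10, 15, 20, np.inf])
print(dict(zip(["0-5%", "5-10%", "10-15%", "15-20%", ">20%"], counts)))
```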

Concretely speaking, with the APE of all 128 sets of data in the training set being less than 20%, the fitting abilities of the GA-GBDT and GA-XGBoost models are exceedingly good. Nonetheless, the performance of these two models on the testing set is only better than that of GA-RF, which indicates that their out-of-sample prediction results are not convincing. In contrast, robust models always make appropriate sacrifices of training precision to guarantee the testing accuracy. This kind of behaviour can be observed in the GA-MLP and AutoML models in Fig. 11a and h.

Fig. 10. Comparison between the true values and the AutoML predicted values: (a) Training set; and (b) Testing set.

Fig. 11. Error analysis of each model: (a) GA-MLP; (b) GA-SVM; (c) GA-RF; (d) GA-GBDT; (e) GA-XGBoost; (f) GA-LightGBM; (g) VR; and (h) AutoML.

To deepen this conclusion, we summarise the relationship between the reliability of a model and its accuracy rate in the schematic diagram in Fig. 12. A high-reliability model, also known as a robust model, can always trade off the bias error against the variance error. In other words, the accuracies on both the training and testing sets approach the desired accuracy rate, rather than the model merely ensuring the former while ignoring the latter. Meanwhile, with approximately four times the modelling efficiency of the GA-MLP model, the AutoML model is the only one in this paper that integrates efficiency and reliability. Accordingly, this comprehensive model is the superior choice for predicting the excavation-induced tunnel displacement.

4.4.2. Importance analysis of input variables

Although Zhang et al. (2020a) selected the seven deep excavation parameters potentially influencing the horizontal displacement of nearby tunnels, they did not delve into the influence of each parameter. As a consequence, an importance analysis (Hapfelmeier et al., 2014) is carried out in this paper to investigate the sensitivity of the tunnel horizontal displacement to these factors.

In theory, the importance of a variable is computed as the normalised total reduction of the criterion induced by that variable, and it is also known as the Gini importance (Breiman et al., 1984). Since this importance analysis is only defined for tree models, the experiment is conducted on GA-LightGBM in this paper, owing to its optimal performance among the models of this type.
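For reference, the following sketch shows how such normalised importance scores can be read from a fitted LightGBM model; the synthetic data and the choice of importance_type="gain" are illustrative assumptions:

```python
# Sketch of extracting normalised importance scores from LightGBM; the
# data are synthetic placeholders and importance_type="gain" is an
# illustrative choice of reduction criterion.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.random((128, 7))
y = np.sin(X @ rng.random(7))
features = ["H", "B", "t", "su/sv'", "Eur/sv'", "Dh", "Dv/H"]

model = lgb.LGBMRegressor(importance_type="gain").fit(X, y)
scores = model.feature_importances_ / model.feature_importances_.sum()  # sum to 1
for name, score in sorted(zip(features, scores), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```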

Fig. 13 demonstrates the importance score of each input variable obtained from the GA-LightGBM model. The sum of the importance scores is scaled to 1 for the convenience of analysis. Among all seven input variables, the importance score of E_ur/σ′_v ranks first, at 1.3 times the score of the runner-up. This result is completely consistent with engineering experience, since E_ur/σ′_v is a representative index of the soil properties. Subsequently, H and B, which represent the spatial characteristics of the deep excavation, rank second and third, respectively, followed by D_h with a score of 0.106. Finally, the importance scores of s_u/σ′_v, D_v/H, and t are all no greater than 0.031, indicating that the contributions of these variables to the tunnel horizontal displacement are limited, and even negligible.

Notably, although the specific importance score of each input variable derived from the importance analysis will vary with the tree model, as well as with the database, the importance ranking of the variables should remain constant. In view of this, our numerical simulation-based importance analysis of the input variables can still guide projects, even though the characteristics of the data may differ from those obtained in real engineering. To back up this argument, Table 6 tabulates the specific score of each input variable in four tree models. Consistent with the result obtained from GA-LightGBM, the variables E_ur/σ′_v, H, and B rank in the top three in the other three tree models as well. As a consequence, more attention should be paid to these factors when designing a deep excavation. Meanwhile, it is ill-advised in practice to try to influence the displacement by controlling the values of the variables t, s_u/σ′_v, and D_v/H.

Fig. 12. Characteristics and reliability of a model in different stages.

Fig. 13. Importance score of each input variable in GA-LightGBM.

4.4.3. Validation of the AutoML model in practical engineering

Although we have proven the superiority of employing the AutoML model to predict the excavation-induced tunnel displacement, all of the above conclusions are based on numerical simulation data rather than on monitoring data from reality. Consequently, in this section, we test the performance of the AutoML model on real projects. Eight engineering cases in Shanghai, China, with complete data records are gathered as validation data. Table 7 lists the specific information for each case, including the name and location of the deep excavation, the affected metro line, the value of each input variable, and the maximal horizontal tunnel displacement. The AutoML model trained and tested on the numerical data in Section 4.3 is validated against these real data here.

The forecast results derived from the AutoML model are compared with the true values in Fig. 14. From a statistical perspective, the MSE and R² values on the validation set are 2.1131 and 0.7956, respectively. In terms of APE, 75% of the cases are below 20%. All these indicators are satisfactory and signify that the model can be applied in real projects.

In addition, the performance of each GA-ML model on the validation data is tabulated in Table 8 for comparison. Unfortunately, except for GA-SVM and AutoML, the R² values of the remaining models are all negative, and the R² value of GA-SVM is relatively small compared with that of AutoML. All these facts indicate the unreliability of the GA-ML models on real projects, and thus prove the superiority of our AutoML model.

Table 6 Importance score of each input variable in each tree model.

Table 7 Detailed information of eight deep excavations in Shanghai, China.

Fig. 14. Performance of the AutoML model in practical engineering.

Table 8 Prediction performance of each GA-ML model on validation data.

5. Conclusions

In this paper, an AutoML-based method is proposed to precisely predict the excavation-induced tunnel displacement. The 10-fold cross-validation method is utilised to overcome the scarcity of data and to promote the robustness of the model, while the MSE and R² are selected as the two quantitative evaluation indices. In the database, seven features of a deep excavation, i.e. H, B, t, s_u/σ′_v, E_ur/σ′_v, D_h, and D_v/H, are stored as the model's input variables, whereas the maximal horizontal tunnel displacement δ_hm is labelled as the model's output target. The entire dataset is divided in a 7:3 ratio into training and testing sets, respectively, and all values are scaled before modelling to eliminate the adverse effects of singular samples. In addition, six GA-ML models are built for comparison to highlight the strengths of the AutoML model.

Based on the results of model comparison and analysis, the following conclusions are provided:

(1) Aiming to determine the optimal searching time, a parameter analysis is carried out for the AutoML model. The experimental results reveal that the model's fitting ability remains basically unchanged, whereas the extrapolation performance can be effectively improved by adding extra searching time. After trading off the increment in precision against the extra searching time, 9000 s is recommended in this paper.

(2) It is feasible and most reliable to predict the excavation-induced tunnel displacement with the proposed AutoML model. With all evaluation indicators ranking first on the testing set, the extrapolation ability of the AutoML model is optimal among all models. As a consequence, the prediction results derived from the AutoML model are the most trustworthy when brand new deep excavation data are imported.

(3) The error analysis shows that GA-MLP and AutoML are the only two robust models that can trade off the bias error against the variance error. However, the modelling efficiency of the latter is four times that of the former, indicating that the AutoML model is the better choice in reality.

(4) The importance analyses in the tree models unanimously indicate that the soil property index E_ur/σ′_v and the spatial characteristics of a deep excavation, H and B, have a significant influence on the horizontal displacement of a nearby tunnel. Conversely, the influences of t, s_u/σ′_v, and D_v/H are negligible, since their importance scores are fairly small.

(5) Eight engineering cases in Shanghai, China are collected to validate the AutoML model. The predicted results are in good agreement with the monitoring data, signifying that the model can be applied in real projects.

There remains some room for further improvement of our work. Firstly, the meta-learning submodule in AutoML is not employed in this study, on account of the lack of prior databases. Further work should be conducted to gather more similar data and to build a database that can provide prior experience. Secondly, owing to the lack of sufficient data from practical excavation projects, the AutoML model in this study is trained on a simulation database and validated on eight real projects. The next step is to standardise the format of engineering data records, so as to conveniently gather much more real data for modelling and to better guide actual engineering.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was financially supported by the National Natural Science Foundation of China (Grant Nos. 51978517, 52090082, and 52108381), the Innovation Program of Shanghai Municipal Education Commission (Grant No. 2019-01-07-00-07-E00051), and the Shanghai Science and Technology Committee Program (Grant Nos. 21DZ1200601 and 20DZ1201404).