A real-time prediction method for tunnel boring machine cutter-head torque using bidirectional long short-term memory networks optimized by multi-algorithm


Xing Huang, Quantai Zhang, Quansheng Liu *, Xuewei Liu **, Bin Liu, Junjie Wang, Xin Yin

a State Key Laboratory of Geomechanics and Geotechnical Engineering, Institute of Rock and Soil Mechanics, Chinese Academy of Sciences, Wuhan, 430071, China

b Key Laboratory of Geotechnical and Structural Engineering Safety of Hubei Province, School of Civil Engineering, Wuhan University, Wuhan, 430072, China

c The 2nd Engineering Company of China Railway 12th Bureau Group, Taiyuan, 030032, China

Keywords: Tunnel boring machine (TBM), Real-time cutter-head torque prediction, Bidirectional long short-term memory (BLSTM), Bayesian optimization, Multi-algorithm fusion optimization, Incremental learning


1. Introduction

The rapid development of highway, railway, and inter-basin water diversion projects requires the construction of numerous deep and long tunnels (Liu et al., 2021; Wang et al., 2021). The use of full-face tunnel boring machines (TBMs) has become the first choice and key development trend for the construction of these tunnels owing to their advantages of high efficiency, economic operation, environmental protection, and small excavation disturbance (Huang et al., 2018a, b; Ma et al., 2020). However, due to the complexity and changeability of strata in the tunneling process, both the selection and adjustment of tunneling parameters still rely heavily on experience. This easily leads to a mismatch between TBM tunneling parameters and geological conditions, resulting in low rock cutting efficiency, severe cutter wear, abnormal cutter-head damage, fracture of main bearings, and even machine jamming or complete destruction. Cutter-head torque is one of the critical parameters, as it provides the cutting force required by the cutter-head for rock breaking, and it exerts an important influence on rock fragmentation efficiency and rock-machine interaction. Optimizing cutter-head torque to maintain torque stability has been shown to have a positive effect on the cutter-head loading state and driving motor operation, which is conducive to accelerating the construction rate and reducing mechanical loss and construction costs. The prediction of cutter-head torque is therefore of great significance for avoiding cutter-head entrapment and for guiding timely, adaptive adjustment of TBM advance parameters.

Generally, TBM performance models, including cutter-head torque prediction models, can be divided into three categories from the perspective of the modeling methods used. The first category comprises the empirical and theoretical models based on laboratory testing, field monitoring of TBM performance, and rock properties (Yagiz, 2008). This category includes the commonly used Colorado School of Mines (CSM) model (Ozdemir, 1977; Rostami, 1997), the Norwegian University of Science and Technology (NTNU) model (Bruland, 1998), the rock mass rating (RMR) system (Bieniawski et al., 2006; Hamidi et al., 2010), and the QTBM model (Barton, 2000). However, these models do not consider different tunneling conditions or rock mass properties, and only consider limited parameters (Benardos and Kaliampakos, 2004; Yagiz and Karahan, 2015). The second category comprises the mathematical models based on conventional statistical regression analysis (Gong and Zhao, 2009; Farrokh et al., 2012; Mahdevari et al., 2014; Yang et al., 2019, 2020). Zhang et al. (2005) developed mathematical models for predicting cutter-head torque and thrust force, earth cabin pressure, and cutter-head rotational velocity by statistical regression analysis based on parameters collected by the TBM. Zhou et al. (2019) established a cutter-head torque prediction model based on the mechanical structure parameters of TBMs (i.e. shield length, cutter-head diameter, and cutter-head plate thickness), the controlling operating parameters (i.e. advance rate, cutter-head rotational velocity, and earth cabin pressure), and the tunneling formation parameters (i.e. tunnel burial depth, Young's modulus, shear modulus, volumetric weight, and lateral pressure coefficient of the soil) using statistical regression analysis. However, these statistical models normally make assumptions and/or approximations on the relationship between two variables and use specific mathematical functions to describe their relationship. They are not always sufficiently robust to accurately capture complex and nonlinear conditions, especially when encountering extreme data. The third category comprises the black-box models developed based on artificial intelligence (AI) and machine learning (ML) techniques (Simoes and Kim, 2006; Xu et al., 2019; Koopialipoor et al., 2020; Zhou et al., 2021). Recently, many soft computing methods have been employed in underground excavation to learn laws or patterns from the obtained field data or case histories in an implicit form (Zhang et al., 2019c, d, 2020b). These methods have been used in the prediction of ground surface settlement (Goh and Hefney, 2010; Goh et al., 2018), analysis of tunneling stability (Zhang and Goh, 2017; Zheng et al., 2021), inversion of surrounding rock mass parameters (Liu et al., 2019, 2020a, b; Hou et al., 2021), and assessment of rockburst in hard rock (Yin et al., 2021) and squeezing deformation in soft rock (Ghasemi and Gholizadeh, 2019; Hasanpour et al., 2020). For example, Zhou et al. (2021) developed six hybrid TBM penetration-rate prediction models based on extreme gradient boosting, optimized by gray wolf optimization, particle swarm optimization, social spider optimization, the sine cosine algorithm, multi-verse optimization, and moth flame optimization, respectively.
Concerning cutter-head torque prediction, the AI- and ML-based models can be divided into two types in terms of input parameters: one type uses both formation parameters and TBM operating parameters as inputs; the other uses only TBM operating parameters. However, as current detection methods are insufficient to provide continuous, real-time perception of formation parameters (Zhang et al., 2019a), models that use formation parameters as inputs cannot predict torque in real time. Therefore, a cutter-head torque prediction model established using machine parameters to guide the adjustment of tunneling parameters has more practical significance. Li (2018) developed a cutter-head torque prediction model using a back-propagation neural network with cutter-head rotational velocity, advance rate, water content, saturated uniaxial compressive strength, and surrounding rock class as inputs. Wang et al. (2018) established a cutter-head torque prediction model using nonlinear support vector regression with surrounding rock type, cutter-head rotational velocity, and advance rate as input parameters. Li (2019) used 54 parameters collected by a TBM as inputs and established a cutter-head torque prediction model using regression tree (RT) and random forest (RF) algorithms. Paulo and Adam (2019) proposed a torque prediction model applying a feedforward artificial neural network and using advance rate, thrust force, screw conveyor torque, and foam injection ratio as inputs. Models developed based on AI and ML approaches with massive data samples normally outperform conventional empirical, theoretical, and statistical regression models, enabling more reliable and precise predictions under highly complex conditions.

TBM cutter-head torque is a time-varying parameter: the torque at one moment affects its value at the next time step, so it behaves as a time series in actual operation. However, the above-mentioned AI-based cutter-head torque prediction models using RT, RF, support vector machine (SVM), and multivariate adaptive regression splines are static models (Zhou et al., 2016; Yang et al., 2019; Gao et al., 2019). They cannot learn the correlation between adjacent time steps and thus cannot make full use of the tunneling parameters at preceding time steps; they usually perform predictions using only the inputs at the current time. Consequently, such models cannot predict the next-moment cutter-head torque in real time, making it difficult to guide the adaptive adjustment of tunneling parameters promptly. To overcome this shortcoming, models using dynamic algorithms such as the recurrent neural network (RNN) (Pascanu et al., 2013) and its variants have emerged, e.g. long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997) and gated recurrent unit networks (Cho et al., 2014). For instance, Li et al. (2020) introduced an LSTM network for TBM performance prediction, with which the importance of input parameters can be assessed by applying an RF model. Another problem is that many AI models still use a single algorithm, which limits their reliability and accuracy. Therefore, dynamic and hybrid models that fuse multiple optimization algorithms must be developed to learn from the historical operational information of the TBM and forecast the cutter-head torque in real time.

Based on the above literature review and a large set of TBM operational parameters collected from the Songhua River Water Conveyance Tunnel in Jilin Province, China, a complete real-time TBM cutter-head torque prediction method is proposed based on the bidirectional long short-term memory (BLSTM) network (Schuster and Paliwal, 1997). The constructed dynamic model can learn the correlation of previous information and thereby has a real-time prediction capacity superior to that of conventional static models. Furthermore, the model fuses several types of optimization algorithms (including the Bayesian, early stopping, and checkpoint algorithms), surpassing conventional models built with a single algorithm or a simple combination of algorithms. Consequently, this method significantly improves both the reliability and accuracy of prediction. Finally, an incremental learning model is proposed based on the developed base model to improve the model's generalization ability during TBM tunneling. The presented model provides a theoretical basis for adaptive adjustment of tunneling parameters and cutter-head torque optimization decisions.

2. Selection of feature variables

2.1. Project overview

In the Songhua River Water Diversion Project, water is diverted from the Fengman Reservoir in the east of Jilin Province (China) to Changchun and Siping in the middle of the province. The main tunnel is 110 km in length, and section TBM3 has a total length of 22.955 km (from Chainage 71+855 m to 48+900 m) (Fig. 1). Section TBM3 was mainly constructed by the TBM tunneling method (20,198 m) in combination with the drilling-and-blasting method (2757 m), with a maximum burial depth of 260 m (Fig. 2).

Fig. 2. Layout of section TBM3 in the Songhua River Water Diversion Project.

The cutter-head boring diameter is 7.93 m, and the cutter-head is equipped with 55 disc cutters. The rotational velocity ranges from 0 to 7.6 r/min, the rated torque is 8420 kN·m (at a cutter-head rotational velocity of 3.97 r/min), and the maximum total thrust is 23,060 kN.

The rock lithology in section TBM3 mainly consists of granite and limestone. The surrounding rock was classified using the hydropower classification method (GB50487-2008, 2009), a specialized engineering geological classification system that is widely used for water conveyance tunnels, especially in China. Classes II, III, IV, and V rock masses account for 5.26%, 67.88%, 22.47%, and 4.39%, respectively, of the rock formation in this TBM section.

A 1110 m long portion of section TBM3 (from Chainage 70+120 m to 68+990 m) was chosen for data acquisition to provide input parameters for model training. This segment, 1381 m away from the starting point of section TBM3, is where the TBM operated from the trial tunneling to the stable driving stage. The surrounding rock contains Classes II, III, IV, and V rock masses (Table 1) in proportions of 5.41%, 65.41%, 24.41%, and 4.77%, respectively, which approach the percentages of the four classes over the whole tunnel.

Table 1 Statistics of surrounding rock classes from Chainage 70+120 m to 68+990 m.

Fig.1. Location of the studied tunnel (Liu et al., 2020b).

2.2. Data preprocessing

The TBM in section TBM3 recorded a total of 199 types of data. Parameters not related to the cutter-head torque, such as time, chainage number, and tunneling parameter settings, were removed. The collection frequency of the TBM data was 1 Hz, and about 86,000 data entries were collected every day. A considerable amount of invalid data was also generated between two adjacent tunneling cycles. Based on the fact that the values of the tunneling parameters must be greater than zero during tunneling, tunneling-state discrimination functions (Eqs. (1)-(3)) are presented to eliminate sensor anomalies and invalid data between two adjacent tunneling cycles.

where vc is the cutter-head rotational velocity, W is the cutter-head power, P is the cutter-head pressure, λ is the penetration rate, F is the total thrust force, vs is the advance rate, Isum is the sum of motor currents, Tsum is the sum of motor torques, Wsum is the sum of motor powers, and T is the cutter-head torque.

The field penetration index (FPI) of a rock mass represents the thrust required for a single cutter per unit penetration rate, and the torque penetration index (TPI) represents the circumferential frictional resistance and cutting force required in the interaction between a single cutter and the rock mass per unit penetration rate (Zhang et al., 2019b). Both can reflect the rock mass conditions during TBM tunneling to a certain extent. FPI and TPI can be calculated by

where N is the number of cutters installed on the cutter-head, and D is the diameter of the TBM.
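Eqs. (4) and (5) themselves did not survive extraction. One commonly used formulation that is consistent with the definitions above is given below as an assumption; the exact expressions in the original (e.g. additional coefficients or the use of penetration per revolution) may differ:

```latex
\mathrm{FPI} = \frac{F}{N\,\lambda}, \qquad
\mathrm{TPI} = \frac{2T}{N\,D\,\lambda}
```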

FPI and TPI were calculated from the above data set after eliminating invalid data and parameters clearly irrelevant to the cutter-head torque. Then, input parameters were selected after incorporating the calculated FPI and TPI into the sample set. Each characteristic parameter in the sample set was normalized using Eq. (6) to eliminate the adverse influence of the different dimensions and orders of magnitude of the feature parameters on feature selection and model training.

where x is the input feature parameter set, i is the serial number of the feature parameter, and n is the number of feature samples.
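Eq. (6) is also missing from this copy; assuming the standard min-max scaling implied by the description, the normalization of the i-th feature would read:

```latex
x_i' = \frac{x_i - \min\left(x_i\right)}{\max\left(x_i\right) - \min\left(x_i\right)}
```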

Following the above data preprocessing, 1,230,725 entries of valid data were extracted from the 1110 m long tunneling section (Chainage 70+120 m to 68+990 m), which constitute the training database.

2.3. Feature selection method based on the SelectKBest algorithm

The SelectKBest algorithm is a univariate feature selection algorithm in the Python 'Scikit-Learn' library, which computes a test value between each single feature and the output parameter in turn (Pedregosa et al., 2011) and selects the K features with the highest scores. In comparison with multivariate feature selection methods, such as recursive elimination and principal component analysis, the physical meaning of SelectKBest is apparent and its selection results are more interpretable; thus, the algorithm is widely used in ML. Two inputs are required by the SelectKBest algorithm, i.e. the feature scoring method and the set number of feature variables K. In this paper, f_regression is used as the scoring method. This method first calculates the sample correlation coefficient ri between the i-th feature in the data set and the target variable y using Eq. (7); then the test value f of each feature is calculated using Eq. (8). The higher the f score, the closer the correlation between the i-th feature and the target variable y. Therefore, the K features with the highest scores are chosen as input parameters.

where i is the serial number of the feature parameter, x is the input feature parameter set, y is the target variable, n is the number of feature samples, [ ] denotes an array, and ':' denotes all n feature samples.
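As an illustration only (the paper does not publish its code), a minimal sketch of this feature-selection step with scikit-learn's SelectKBest and f_regression could look as follows; the file name and column labels are hypothetical:

```python
# Hypothetical sketch of SelectKBest feature selection with f_regression.
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# 'data' is assumed to hold the preprocessed, normalized tunneling records,
# with the cutter-head torque column "T" as the prediction target.
data = pd.read_csv("tbm3_preprocessed.csv")   # hypothetical file name
X = data.drop(columns=["T"])                  # candidate feature parameters
y = data["T"]                                 # target variable

# Score each feature against the torque and keep the ten highest f-scores.
selector = SelectKBest(score_func=f_regression, k=10)
selector.fit(X, y)

# Rank all features by f-score in descending order, as in Table 2.
scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
print(scores.head(10))
```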

The features were selected using the SelectKBest algorithm based on the preprocessed data set. Ranked by score in descending order, the top ten features are Tsum, W, Wsum, Isum, vs, P, F, λ, vc, and FPI (Table 2). Ten variable-frequency motors drive the TBM cutter-head in section TBM3, and Tsum, W, Wsum, and Isum of the motors increase with increasing cutter-head torque. Li (2019) indicated that vs, F, P, and λ exert significant influences on the cutter-head torque. Wang et al. (2018) showed that the cutter-head torque has an apparent positive correlation with vc and vs. Delisio and Zhao (2014) demonstrated that FPI is closely related to rock mass parameters, including the uniaxial compressive strength and the volumetric joint count, and can thus reflect the rock mass quality and its resistance to cutter penetration to a certain extent. The sum of the f values of the above ten features is larger than 95% of the sum of the f values of all features. Thus, the feature variables selected using the SelectKBest algorithm are reasonable and can be used as input parameters for the real-time cutter-head torque prediction model. A data set is established based on the above ten selected feature parameters and the target variable T.

To better present the established data set, statistical information on both input and output parameters, with their units, categories, and descriptions, is listed in Table 3.

Table 2 The top ten feature variables with the highest scores selected using the SelectKBest algorithm.

3. Structure of the real-time cutter-head torque prediction model

The evolution of TBM tunneling parameters is a complex nonlinear dynamic process: the influencing factors and operational parameters at one moment affect the advance parameters at the next time step. The prediction of tunneling parameters is therefore a time-series regression problem.

Conventional SVM, support vector regression (SVR), and RF models can use the information at the current or historical time steps to perform predictions for the next time step. For instance, Sun et al. (2018) developed an RF model to predict TBM load, and Mahdevari et al. (2014) and Tao et al. (2015) applied SVR and RF models, respectively, to predicting TBM penetration rates in hard rock masses. However, SVR and RF models are static models, rendering them unable to make full use of previous information, especially the correlation between time steps (Zhou et al., 2016; Yang et al., 2019; Gao et al., 2019).

An RNN can remember historical information and apply it to the current output, which provides state feedback in the network (Han et al., 2004). LSTM is an improved variant of RNN for capturing long-term temporal dependencies in the input sequence (Hochreiter and Schmidhuber, 1997). LSTM has a particular memory block consisting of an input gate, an output gate, a forget gate, and a cell state (Fig. 3). The gate operations added by LSTM can extract the correlation information of parameters at different moments in a continuous time series and can be used to predict the output parameters at future moments. The gates and cell state of LSTM are updated using Eqs. (9)-(14).

where σ represents the activation function; Wi, Wa, Wo, Wf, Ui, Ua, Uo, and Uf represent the weight matrices; bi, ba, bo, and bf represent the bias matrices; ⊙ represents the Hadamard product; t is the current moment; ht is the current unit output; it is the input gate value; at is the candidate cell state; ot is the output gate value; ft is the forget gate value; and Ct is the current cell state.
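Eqs. (9)-(14) are not reproduced in this copy. The standard LSTM update rules consistent with the symbols defined above are given below; the exact arrangement in the original equations is assumed:

```latex
i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right), \quad
f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right), \quad
o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right), \\
a_t = \tanh\left(W_a x_t + U_a h_{t-1} + b_a\right), \quad
C_t = f_t \odot C_{t-1} + i_t \odot a_t, \quad
h_t = o_t \odot \tanh\left(C_t\right)
```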

With the help of the gates in the memory block, LSTM and its variants can assess the rules learned at the previous time step and judge whether these rules are useful; the network can then discard invalid information or remember the useful information and pass it on to the next neuron. In LSTM models, nodes in the hidden layers are connected to one another across time steps. The connections to the previous time step are set as inputs, so that the hidden state contains a dynamic sequence of input features. The LSTM model can therefore learn the connections between the input parameters at previous time steps, learn rules from a series that contains historical information, and thus make full use of this information. In contrast, SVM and RF models, which have no connections between different time steps, can only learn rules from the input values at each moment, and the sequential response between outputs and inputs cannot be learned (Xu and Niu, 2018). For the static SVM and RF models, all rules learned from earlier information are used to predict the next stage throughout the whole training process, unlike LSTM-based models, which discard unnecessary rules and only remember valuable ones. Furthermore, changing the sequence of inputs at different time steps does not change a static model, whereas when the time-sequence inputs of an LSTM- or BLSTM-based model change, the training results may also change. LSTM-based models are dynamic models that can capture the relationships between different time points. These characteristics of the LSTM model contribute to a more accurate and reliable cutter-head torque prediction.

Schuster and Paliwal (1997) proposed the BLSTM algorithm as a variant of LSTM. The BLSTM algorithm adds a backward-sequence layer to the forward-sequence layer of LSTM, extracting more comprehensive information from the input variables for prediction and producing more robust models. Compared with RT, RF, and SVM, BLSTM can make full use of previous time-series information to predict the target at the next time point. The real-time prediction ability of BLSTM is significant for early warning of geo-hazards and adaptive adjustment of operating parameters in TBM tunneling. Therefore, BLSTM is employed to create the real-time cutter-head torque prediction model. Fig. 4 shows the calculation process of the BLSTM algorithm, which takes the concatenated outputs of the forward-sequence and backward-sequence layers as its output.

Overfitting is a common problem in neural network models and reduces their generalization ability. Hinton et al. (2012) and Srivastava et al. (2014) proposed the dropout algorithm, which randomly deactivates a certain proportion of neurons in each training cycle (Fig. 5) so that the neurons participating in training are not exactly the same each time. The update of weights then no longer depends on the joint action of neurons with fixed relations, which avoids the situation in which certain features only take effect in the presence of other specific features and thus effectively mitigates overfitting. Based on the above principles, the structure of the real-time cutter-head torque prediction model is established (Fig. 6).

At present, the optimizers commonly used in neural network training include stochastic gradient descent (SGD), standard momentum optimization (Momentum), and root mean square propagation (RMSProp) (Zhang and Goh, 2016). SGD is fast to compute, but it introduces noise that can trap the weight update in a local optimum. The Momentum algorithm adds to SGD a momentum term that records historical gradient information, which accelerates gradient descent. The RMSProp algorithm scales each weight update by the root of a running average of squared gradients, which dampens fluctuations in large-gradient directions and accelerates gradient descent. Furthermore, RMSProp is suitable for processing non-stationary targets and for optimizing RNNs and their variants. Therefore, this paper develops the BLSTM-based real-time cutter-head torque prediction model using the RMSProp optimizer.
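As an illustration only (the authors' code is not published), a minimal Keras sketch of the BLSTM-dropout-dense structure of Fig. 6 trained with the RMSProp optimizer could look as follows. The 30-step input window over ten features and the 5-step torque output follow Sections 2 and 4.1; the single-layer stacking, the helper name build_blstm, and the output head are assumptions:

```python
# A minimal sketch (not the authors' code) of a BLSTM regressor with dropout
# and the RMSProp optimizer.
from tensorflow import keras
from tensorflow.keras import layers

def build_blstm(m, lr, dp, n_steps_in=30, n_features=10, n_steps_out=5):
    """Build a BLSTM regressor with m units, learning rate lr, and dropout dp."""
    model = keras.Sequential([
        layers.Bidirectional(layers.LSTM(m), input_shape=(n_steps_in, n_features)),
        layers.Dropout(dp),          # randomly deactivate a fraction dp of neurons
        layers.Dense(n_steps_out),   # cutter-head torque over the next 5 s
    ])
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=lr),
                  loss="mse", metrics=["mae"])
    return model

# Example with the hyperparameter values reported later in Section 4.2:
model = build_blstm(m=15, lr=0.004128, dp=0.113819)
```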

4. Development of cutter-head torque prediction model

The creation and validation process of the proposed cutter-head torque prediction model is illustrated in Fig. 7.

Fig. 3. Structural diagram of an LSTM network neuron (adapted from Hochreiter and Schmidhuber,1997; Ebrahimi et al., 2020).

Fig. 4. Calculation process of the BLSTM algorithm (adapted from Schuster and Paliwal,1997; De Mulder et al., 2015).

4.1. Establishing the training set

The measured cutter-head torque fluctuates in the stable driving stage, with a fluctuation period of about 5 s per rising-and-falling cycle. An excessively long output time series makes the model difficult to converge and reduces prediction accuracy; therefore, one fluctuation period (i.e. a time step of 5 s) is selected as the output time series so that both the cutter-head torque and its variation can be predicted in real time. The real-time prediction ability of the BLSTM model depends on the continuity of the input and output parameters in the time series. If the input time series is too short, the information extracted by the model is insufficient and the prediction accuracy decreases; if it is too long, the continuity of parameters far from the output time series is weakened and the model becomes difficult to converge. Therefore, with the torque fluctuation period as the interval, 2-12 torque fluctuation cycles (i.e. 2-12 times the output time series length) were tested as input time series by examining the mean square error (MSE) and mean absolute error (MAE) of the training set (Eqs. (15) and (16)), together with the loss value val_MSE and mean absolute error val_MAE of the validation set (Table 4). The smaller these four indices (MSE, val_MSE, MAE, and val_MAE), the better the model performance (smaller values correspond to better rankings in Table 4). In summary, the prediction performance with an input time series of six torque fluctuation cycles (i.e. 30 steps) is the best; therefore, six fluctuation cycles (i.e. a time step of 30 s) are selected as the input time series. Based on the above analysis and the data set presented in Section 2, the input and output sample sets are established to constitute the training set, containing a total of 1,230,696 training samples.
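A hedged sketch of how such input/output windows could be assembled from the continuous 1 Hz records is given below: 30 consecutive records of the ten features as input and the following 5 torque values as output. The function and array names are illustrative, and the exact windowing scheme (and hence the resulting sample count) in the paper may differ:

```python
# Illustrative sliding-window construction of (input, output) sample pairs.
import numpy as np

def make_windows(features, torque, n_in=30, n_out=5):
    """Slice continuous tunneling records into BLSTM training samples."""
    X, y = [], []
    for start in range(len(features) - n_in - n_out + 1):
        X.append(features[start:start + n_in])               # 30 s of the ten features
        y.append(torque[start + n_in:start + n_in + n_out])  # next 5 s of torque
    return np.array(X), np.array(y)

# features: array of shape (n_records, 10); torque: array of shape (n_records,)
# X_train, y_train = make_windows(features, torque)
```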

Table 3 Summary of data set.

Table 4 Comparison of loss values and MAE between the training set and validation set at different input time steps.

where yi represents the actual value and ŷi represents the predicted value.
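Eqs. (15) and (16) are not reproduced in this copy; their standard forms, consistent with the definitions above, are:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|
```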

4.2. Hyperparameter optimization

Hyperparameters exert a significant influence on the performance of a neural network model. The hyperparameters of the model presented in Section 3 include the number of BLSTM neurons (m), the learning rate of the RMSProp optimizer (lr), and the random failure probability of the dropout layer (dp). The number of neurons in the hidden layer significantly influences both the learning ability and learning speed of the model (Zhang et al., 2020a), and dp determines how effectively the dropout algorithm prevents the model from overfitting (Goodfellow et al., 2016). Therefore, the hyperparameters m, lr, and dp are optimized in this paper.

At present, the commonly used hyperparameter optimization algorithms include grid search, random search, and Bayesian optimization (Cui and Yang, 2018). The grid search algorithm searches for the optimal combination among given hyperparameter values, but the optimized results are greatly affected by human factors. The random search algorithm draws random combinations of hyperparameters and selects the group with the best performance, so its results have large randomness. The Bayesian optimization algorithm is an adaptive method for hyperparameter search, which establishes a surrogate function linking the optimized hyperparameters to the optimization objective according to the hyperparameter combinations already tested. With this surrogate function, the optimization goal is steered in a promising direction and the selection of new hyperparameter combinations is guided. Each tested hyperparameter combination joins the historical database to continually update the surrogate function, which makes the mapping relationship between the hyperparameter combination and the optimization goal increasingly accurate. Based on the above comparison, this paper employs Bayesian optimization. It should be noted that increasing the number of optimized hyperparameters significantly increases the optimization space and time; therefore, it is recommended that no more than three hyperparameters be optimized.

Fig. 5. Principle of the dropout algorithm (adapted from Hinton et al., 2012; Srivastava et al., 2014).

The K-fold cross-validation algorithm divides the training set into K parts: K-1 parts are used for training in each cycle, while the remaining part is used to test the model, so the model is trained and tested for K cycles. The average of the K scores obtained is treated as the final score of the model (Fig. 8). This algorithm effectively reduces the adverse influence that an arbitrary split of the testing set has on the evaluation of model performance. Therefore, based on the training set established in Section 4.1 and taking MSE as the optimization objective, a fusion of the Bayesian and 5-fold cross-validation algorithms is used to optimize m, lr, and dp (Fig. 9). The number of hyperparameter optimization trials (I) is set as 100, the optimization range of m as [1, 20], and the optimization ranges of lr and dp as (0, 1). The optimal hyperparameter combination obtained for m, lr, and dp is 15, 0.004128, and 0.113819, respectively, which is used for all models in this paper. The loss function of the training process is set as MSE, and the evaluation index of the validation set is MAE (Figs. 7 and 9).
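For illustration, a sketch of combining Bayesian optimization with 5-fold cross-validation to search m, lr, and dp is given below. The 'bayes_opt' package and the build_blstm helper from the earlier sketch are used here as assumptions; the paper does not state which implementation it employed, and the per-trial epoch count is illustrative:

```python
# Hedged sketch of Bayesian optimization over m, lr, dp with 5-fold CV.
import numpy as np
from bayes_opt import BayesianOptimization
from sklearn.model_selection import KFold

def cv_mse(m, lr, dp):
    """Return the negative mean 5-fold validation MSE (the optimizer maximizes)."""
    kf = KFold(n_splits=5, shuffle=False)
    losses = []
    for train_idx, val_idx in kf.split(X_train):
        model = build_blstm(int(round(m)), lr, dp)   # helper from the earlier sketch
        model.fit(X_train[train_idx], y_train[train_idx], epochs=20, verbose=0)
        loss, _ = model.evaluate(X_train[val_idx], y_train[val_idx], verbose=0)
        losses.append(loss)
    return -np.mean(losses)

optimizer = BayesianOptimization(
    f=cv_mse,
    pbounds={"m": (1, 20), "lr": (0, 1), "dp": (0, 1)},  # ranges from Section 4.2
    random_state=0,
)
optimizer.maximize(n_iter=100)   # 100 optimization trials, as in the paper
print(optimizer.max)             # best {m, lr, dp} combination found
```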

Fig. 6. The BLSTM-based real-time cutter-head torque prediction model.

4.3. Multi-algorithm fusion and optimization for prediction model training

The number of training rounds exerts an essential effect on the prediction ability of the model. Training once over all training samples is called a round, i.e. 'an epoch'. Too few training rounds result in low learning and prediction capacities, while too many rounds lead to overfitting.

Considering the above problems, this paper integrates the early stopping and checkpoint algorithms into model training to optimize the training process (Fig. 10). The checkpoint algorithm divides the sample set into training and validation sets at the beginning of each epoch. The training set is used to train the model, and the validation set is then employed to verify its prediction capacity Pepoch; simultaneously, the validation set is used to verify the prediction capacity of the model saved before the current training epoch. The larger the index Pepoch, the better the prediction performance. If Pepoch > Pepoch-1, the previously saved model is replaced by the model trained in the current epoch; otherwise, the saved model remains unchanged. The early stopping algorithm checks the difference p between the current training round and the training round of the model last saved by the checkpoint algorithm. If p is greater than the patience Pat given in the early stopping algorithm, training is stopped. The model saved by the checkpoint algorithm is the optimal model, which is referred to as the 'BO-BLSTM cutter-head torque prediction model'.

Fig. 7. Flowchart of the proposed cutter-head torque prediction model.

The proposed training process based on multi-algorithm fusion and optimization is used to train the model. The loss function of the training process is set as MSE, and the evaluation index of the prediction performance on the validation set as MAE (Fig. 10). According to the experience reported in the literature (Yagiz and Karahan, 2015; Armaghani et al., 2019; Zeng et al., 2021), the loss value and MAE of the training and validation sets reach a stable state within the first rounds of training. After that, the loss value and MAE of the validation set fluctuate, and many local minima arise. The tolerance parameter is used to test whether another minimum appears within a specified number of training rounds after a minimum appears on the validation set. If the minimum value of the MAE on the validation set remains the minimum over the next 200 rounds of training, the corresponding model can be considered the optimal model within the limited training rounds (Subasi, 2020); therefore, the tolerance is set as 200. The maximum training round parameter stops training when the loss value of the training set has reached a stable level but the tolerance criterion cannot be met. The largest training round (set as 1000) is sufficient to ensure that the loss value of the training set can reach stability.
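A minimal Keras sketch of this checkpoint-plus-early-stopping training scheme is given below for illustration; the file name, monitored metric, and validation split ratio are assumptions, while the patience of 200 and the maximum of 1000 epochs follow the values stated above:

```python
# Hedged sketch of the multi-algorithm fusion training loop with Keras callbacks.
from tensorflow import keras

model = build_blstm(m=15, lr=0.004128, dp=0.113819)   # helper from the earlier sketch

callbacks = [
    # Checkpoint: keep only the model with the lowest validation MAE seen so far.
    keras.callbacks.ModelCheckpoint("bo_blstm_best.h5",
                                    monitor="val_mae", save_best_only=True),
    # Early stopping: stop if no new minimum appears within 200 further epochs.
    keras.callbacks.EarlyStopping(monitor="val_mae", patience=200),
]

model.fit(X_train, y_train,
          validation_split=0.2,   # validation subset; the actual split ratio is not stated
          epochs=1000,            # maximum number of training rounds
          callbacks=callbacks)
```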

Fig. 8. Principle of the K-fold cross-validation algorithm (Fushik, 2009).

Fig. 9. Optimization algorithm for hyperparameter.

The variations of the training set's loss value (MSE), the validation set's loss value (val_MSE), the training set's MAE, and the validation set's MAE (val_MAE) during training are shown in Fig. 11. The final number of training rounds is 268, indicating that the optimal model is obtained when the number of training rounds reaches 68. After training for 68 rounds, the MSE and MAE on the training set still decline slightly overall, but val_MSE and val_MAE fluctuate greatly and always remain larger than their values after 68 rounds, indicating that a certain degree of overfitting is generated in the models trained beyond 68 rounds. Thus, it is reasonable to terminate training after 268 rounds, which is far below the maximum number (1000) of training rounds. The model trained in this section on the sample set from the initial tunneling section is referred to as the 'base model'. The base model is continuously optimized and adjusted with new data obtained as tunneling proceeds, which yields the 'incremental learning model'.

Fig.10. Training process of multi-algorithm fusion and optimization.

4.4. Model testing

The model developed in Section 4.2 was used to test the 102,041 cutter-head torque samples (testing set) within Chainage 68+990 m to 68+890 m, and the MSE, MAE, mean absolute percentage error (MAPE), coefficient of determination (R2), correlation coefficient (r), and explained variance score (EVS) were employed to evaluate its prediction performance (Eqs. (15)-(20), Fig. 7, and Table 5). A comparison between part of the measured and predicted results is shown in Fig. 12, and the statistical analysis of the measured and predicted values is shown in Fig. 13. The statistical information of the testing process, including the mean, standard deviation, maximum, minimum, and median of the predicted and actual values, is illustrated in the violin plot in Fig. 13b, and the statistical distribution of the prediction error is shown in Table 6. The relative error of the real-time prediction is within 10% for 94.33% of the samples, and below 5% for most of them (Table 5 and Fig. 13). The coefficient of determination and correlation coefficient between the measured and predicted results exceed 0.95, and the distribution of the predicted results is close to that of the actual data. These results show that the BLSTM-based neural network model has excellent performance and is suitable for real-time cutter-head torque prediction.

Fig.11. Variations of MSE and MAE during training.

Table 5 Model evaluation score.

Fig.12. Comparison of part of measured and predicted results.

Fig. 13. Statistical analysis of measured and predicted results for (a) correlation analysis, and (b) violin plots and statistical information.

Table 6 Statistics of prediction error.
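Eqs. (17)-(20) for MAPE, R2, r, and EVS are missing from this copy; the standard definitions of these indices, assumed here to match the paper's usage, are:

```latex
\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|, \qquad
R^2 = 1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}, \\
r = \frac{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)\left(\hat{y}_i-\bar{\hat{y}}\right)}
{\sqrt{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}\sqrt{\sum_{i=1}^{n}\left(\hat{y}_i-\bar{\hat{y}}\right)^2}}, \qquad
\mathrm{EVS} = 1-\frac{\mathrm{Var}\left(y-\hat{y}\right)}{\mathrm{Var}\left(y\right)}
```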

where ȳ is the mean actual value, ŷ̄ is the mean predicted value, and n is the number of samples.

5. Discussion

5.1. Incremental learning model

With increasing advance distance, more and more tunneling conditions are encountered, which is reflected in the recorded TBM tunneling data. A continuous tunneling mileage has similar operating conditions, and forecasting for new operating conditions relies on learning from similar conditions. The incremental learning method only needs to learn the new data added to the sample set on the basis of the existing model, which equips the current model with the capability to predict new operating conditions without relearning all samples, thus greatly shortening the learning time. Therefore, in this study, 200 m long tunneling sections are chosen for each of the Classes II, III, IV, and V surrounding rocks. The new data in the first 100 m are used for incremental learning based on the aforementioned base model, and the data in the second 100 m are used to test the prediction capacity of the incremental learning model (Table 2). The measured and predicted values are then compared (Fig. 14), and all measured and predicted values are statistically analyzed (Fig. 15). Table 7 shows that the prediction error of the base model is large for all classes of surrounding rock, and the MAPE of the prediction for Classes IV and V surrounding rock reaches 42%. The MAPE of the values predicted using the incremental learning model is less than 10% for all classes of surrounding rock. Furthermore, the coefficient of determination and correlation coefficient between the measured and predicted values all exceed 0.9, and the violin plots (Fig. 15) show that the distribution of the predicted results is close to that of the actual values, indicating that the incremental learning method can greatly improve the prediction accuracy.
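A hedged sketch of the incremental-learning step is shown below: the saved base model is reloaded and fitted only on the new 100 m of records rather than retrained from scratch. The file name, array names, and epoch count are illustrative assumptions:

```python
# Illustrative incremental-learning step built on the saved base model.
from tensorflow import keras

# Reload the base model saved by the checkpoint algorithm.
base_model = keras.models.load_model("bo_blstm_best.h5")

# X_new, y_new: windowed samples from the first 100 m of the newly excavated section.
base_model.fit(X_new, y_new, epochs=50, verbose=0)   # epoch count is illustrative

# The updated model is then tested on the following 100 m, as described above.
```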

Table 7 Comparison of the model’s prediction performance before and after incremental learning.

Fig. 14. Comparison of measured and predicted cutter-head torque values using the base model and incremental learning model for (a) Class II, (b) Class III, (c) Class IV, and (d) Class V surrounding rocks.

Figs. 13 and 15 show that the predicted values are much larger than the actual values when the measured value ranges from 0 kN·m to 500 kN·m. There are two main reasons for this. The first is that data from the whole tunneling cycle are used (i.e. both the start-up and stable tunneling stages); however, the data samples from the start-up stage are much fewer than those from the stable tunneling stage (Fig. 16), so the sample proportions are unbalanced, leading to a relatively large error at the start-up stage. The second is that the proposed cutter-head torque prediction model is developed based on BLSTM, which uses previous tunneling information to predict the cutter-head torque at the next time step. Therefore, when the TBM stops operating, the BLSTM-based model still predicts the cutter-head torque for the next 5 s, which makes the predicted torque almost as high as that during steady tunneling while the actual torque is near zero; thus, several points with high predicted values but low actual values emerge in Figs. 13a and 15. If this model is employed in the field, the prediction process can simply be terminated when the TBM stops tunneling.

Fig. 15. Statistical analyses of the measured and predicted cutter-head torque values using the incremental learning model for (a, b) Class II, (c, d) Class III, (e, f) Class IV, and (g, h) Class V surrounding rocks.

Fig.16. Partition of the start-up and stable tunneling phases.

5.2. Advantages of this cutter-head torque prediction method

The Bayesian optimization algorithm can automatically and simultaneously optimize multiple continuous or discrete hyperparameters, but as the number of optimized hyperparameters increases, the complexity of the surrogate function increases exponentially. The number of hyperparameter combination tests required to describe the surrogate function accurately enough to find the global optimum also increases exponentially, so the required hyperparameter optimization time grows explosively. Therefore, limited by the computing capacity of currently available computers, only the most critical parameters should be selected so that the optimized hyperparameters can be obtained within an acceptable time. This study suggests that no more than three hyperparameters be adopted for Bayesian optimization.

When the Bayesian optimization algorithm is used to optimize hyperparameters, certain parameters of the algorithm itself also need to be set, such as the maximum number of optimization attempts; such parameters can be called 'super-hyperparameters'. Clearly, if the super-hyperparameters are set too small, the surrogate function cannot be described accurately, and it is difficult to identify the global optimal parameters. If they are set too large, the optimization takes too long, which also exerts an essential impact on the optimization of the hyperparameters. However, there is no literature suggesting reasonable settings for super-hyperparameters. In future studies, a method for setting them reasonably will be explored based on the number of optimized hyperparameters and the type of neural network.

During the model testing described in Sections 4.4 and 5.1, samples with smaller torque have larger prediction errors. The reason may be that the sample distribution is unbalanced because there are fewer samples with small torques in the training set. In model training, less attention is paid to samples with small torques, which exacerbates the problem of insufficient learning; therefore, when the torque is small, the real-time prediction error is large. In the future, oversampling or resampling of training samples with small torque will be employed to balance the sample distribution.

In addition to the cutter-head torque, there are many other TBM controlling tunneling parameters in the tunneling process, such as the cutter-head thrust, penetration rate, and cutter-head rotational velocity. Although the characteristics of these tunneling parameters differ, they are all closely related to the other TBM operational parameters. Thus, the proposed prediction method can also be applied to the real-time prediction of other TBM parameters.

6. Conclusions

Based on data from the Jilin Songhua River Water Diversion Project, a real-time prediction method for TBM cutter-head torque based on the fusion of BLSTM and multiple optimization algorithms is proposed. The main conclusions are summarized as follows:

(1) To predict the TBM cutter-head torque, 1,230,725 entries of valid data (from Chainage 70+120 m to 68+990 m) are used, and a data preprocessing method and a feature selection method based on the SelectKBest algorithm are proposed. Ranked by influence, the top ten parameters, which are most closely related to the cutter-head torque and are used as input parameters, are Tsum, W, Wsum, Isum, vs, P, F, λ, vc, and FPI.

(2) A real-time cutter-head torque prediction model is proposed based on BLSTM, and a hyperparameter optimization method based on the Bayesian optimization and cross-validation algorithms is put forward. A training scheme based on the fusion of the early stopping and checkpoint algorithms is proposed, which prevents the model from overfitting and reduces training time. The presented BO-BLSTM prediction model can make full use of previous time-series information to predict the cutter-head torque at the next time step.

(3) The relative errors of the predicted values are almost all within 10%, and most of them are less than 5%. The coefficient of determination and correlation coefficient are all above 0.9, and the statistical distribution of the predicted results is close to that of the actual data, indicating that the BLSTM-based multi-algorithm fusion model enables reasonable real-time cutter-head torque prediction.

(4) To improve the adaptability of the model as the TBM tunnels forward, an incremental learning method is proposed based on the base model. Comparing the real-time cutter-head torque prediction results of the base and incremental learning models over the same tunneling section shows that the incremental learning method can greatly improve both the prediction accuracy and the generalization capacity in all classes of surrounding rock.

The real-time prediction method of TBM cutter-head torque presented in this paper is of great significance to both intelligent decision-making and dynamic adjustment of TBM tunneling parameters.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was financially supported by the National Natural Science Foundation of China (Grant Nos. 52074258, 41941018, and U21A20153). These supports are gratefully acknowledged. The authors also thank the China Railway Engineering Equipment Group Co., Ltd. and China Railway Tunnel Group Co., Ltd. for their cooperation in the in situ data collection.