Hadi Fathipour-Azar
Faculty of Mining Engineering, Petroleum and Geophysics, Shahrood University of Technology, Shahrood, Iran
Keywords: Joint roughness coefficient (JRC); Statistical parameters; Gaussian process (GP); K-star; Random forest (RF); Extreme gradient boosting (XGBoost); Correlation; Machine learning (ML); Sensitivity analysis
ABSTRACT
Rock discontinuity roughness affects the friction angle, dilatancy, shear strength, and fluid flow in discontinuities of rock mass, and thus influences the seepage, safety, and stability of surface and subsurface structures such as rock slopes and underground spaces. Accurate quantification and determination of discontinuity roughness are essential for evaluating the influence of surface roughness on the hydromechanical behavior of rock discontinuities. To quantify surface roughness, the American Society of Mechanical Engineers (1955) suggested the center line average (CLA) and root mean square (RMS or Z1) roughness indices. Later, Myers (1962) proposed three new statistical parameters (Z2, Z3, and Z4), since different surfaces can have the same RMS (or CLA) value yet significantly different geometric characteristics. Barton (1973) introduced the joint roughness coefficient (JRC) to characterize joint roughness. Barton and Choubey (1977) estimated JRC values by visual comparison with ten standard JRC profiles ranging from 0 to 20. The International Society for Rock Mechanics (1978) recommended this methodology for determining joint roughness, and it has since been used as the main criterion for evaluating joint roughness. However, this method is empirical and subject to personal or experiential judgment. Therefore, a sufficiently reliable and accurate method of assessing the JRC is required.
Numerous statistical parameters and empirical equations (mathematical expressions) have been developed to estimate the JRC, including RMS, CLA, autocorrelation function (ACF), and mean square value (MSV) of the roughness index (Tse and Cruden, 1979); first derivative RMS, Z2 (Tse and Cruden, 1979; Yu and Vayssade, 1991; Yang et al., 2001; Tatone and Grasselli, 2010; Jang et al., 2014; Zheng and Qi, 2016); structure function (SF) (Sayles and Thomas, 1977; Tse and Cruden, 1979; Maerz et al., 1990; Yu and Vayssade, 1991; Yang et al., 2001; Jang et al., 2014; Zheng and Qi, 2016); standard deviation of inclination angle, SDi (Yu and Vayssade, 1991); roughness profile index, Rp (El-Soudani, 1978; Maerz et al., 1990; Yu and Vayssade, 1991; Tatone and Grasselli, 2010; Jang et al., 2014; Zheng and Qi, 2016); maximum inclination, θ*max (Tatone and Grasselli, 2010; Jang et al., 2014); modified Z2 (Zhang et al., 2014); mean positive angle parameter, θP+ (Belem et al., 2000); mean tangent angle, β100% (Zheng and Qi, 2016); and weighted positive angle (WPA) (Huan et al., 2021).
In addition to the above-mentioned experimental and statistical techniques, fractal theory (Li and Huang, 2015), image recognition approaches (Ficker and Martišek, 2016), support vector regression (Wang et al., 2017), vector similarity measures (Yong et al., 2018), and random field theory (Gravanis and Pantelidis, 2019) have also been used to address the problem. However, most studies are based on statistical and fractal approaches (Li and Zhang, 2015; Abolfazli and Fahimifar, 2020; Marsch and Fernandez-Steeger, 2021). Since natural joint profiles are self-affine rather than self-similar, contradictory findings for the same rock joint profile may be obtained by different users applying fractal techniques (Kulatilake et al., 2006). Fractal approaches may also not provide safer or more reliable roughness estimates than simple statistical methods (Marsch and Fernandez-Steeger, 2021). Although a variety of statistical relations have been proposed to estimate the JRC objectively, they sometimes do not yield comparable results, and even for the same statistical parameter, different JRC values are estimated (Wang et al., 2017; Abolfazli and Fahimifar, 2020; Marsch and Fernandez-Steeger, 2021). Moreover, because of the one-sided interpretation of the joint surface roughness, such relationships may be questionable (Grasselli, 2001). In addition, scale effects and sampling interval dependency are shortcomings of the proposed statistical methods.
Existing studies focus on single statistical parameter-based correlations of the JRC without considering the information of other statistical parameters simultaneously, which may result in unreliable estimates, as mentioned above. Therefore, it is reasonable to use computational intelligence (CI) and machine learning (ML) methods for nonlinear modeling of the JRC. CI and ML techniques can capture unknown nonlinear and complex relations between dependent and independent variables (Fathipour-Azar and Torabi, 2014; Fathipour-Azar et al., 2017, 2020; Golsanami et al., 2021). Wang et al. (2017) predicted the JRC based on support vector regression and factor analysis. In order to consider the inclination angle, amplitude, the distribution characteristics of inclination and amplitude, and other related roughness characteristics of rock joints, they selected eight statistical parameters (average relative height Rave, maximum relative height Rmax, standard deviation of height SDh, average inclination angle iave, SDi, Z2, Rp, and SF) in their studies, where Rave represents the ratio of the average height of the surface profile to its nominal length, Rmax refers to the ratio of the maximum height of the surface profile to its nominal length, SDh represents the distribution characteristic of asperity height along the surface profile, iave is the average inclination angle characteristic of the surface profile, SDi indicates the distribution characteristic of asperity inclination angle along the surface profile, Z2 is the small inclination characteristic of the surface profile, Rp refers to the ratio of the true length of a surface trace to its nominal length, and SF is the variation in the surface profile texture. Using the statistical parameters Rave and Rmax, the scale effect on the JRC could be considered (Barton, 1982; Zhang et al., 2014; Wang et al., 2017). Wang et al. (2017) reported the stability and reliability of JRC prediction with the derived support vector regression compared to regression-based methods.
In this paper, a data-driven framework based on Gaussian process (GP), K-star, random forest (RF), and extreme gradient boosting (XGBoost) models is proposed to improve the estimation accuracy of JRC values. In the proposed framework, a database with eight statistical parameter inputs (Rave, Rmax, SDh, iave, SDi, Z2, Rp, and SF) and one output (JRC) is used to train and evaluate the developed models. Statistical evaluation indices are employed to evaluate the models' performance. Finally, results obtained by the developed ML models are compared with those of classic regression correlations (i.e. based on Z2, Rp, and SDi).
As mentioned above, this study aims to present data-driven methods for estimating JRC values. For this purpose, 112 rock joint profile datasets from published literature (Li and Zhang, 2015; Wang et al., 2017) are utilized. Using the JRC-JCS model (Barton and Choubey, 1977), quantitative estimates of the JRC values of these rock joint profiles are back-calculated from direct shear tests.
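For reference, this back-calculation follows Barton's shear strength criterion, τ = σn tan[JRC log10(JCS/σn) + φr], rearranged for the JRC. The following Python sketch illustrates the rearrangement; the shear stress, normal stress, JCS, and residual friction angle in the usage line are hypothetical values included only to show the call.

```python
import numpy as np

def back_calculate_jrc(tau, sigma_n, jcs, phi_r_deg):
    """Back-calculate JRC from a direct shear test via Barton's criterion:
    tau = sigma_n * tan(JRC * log10(JCS / sigma_n) + phi_r)."""
    mobilized_angle_deg = np.degrees(np.arctan(tau / sigma_n))
    return (mobilized_angle_deg - phi_r_deg) / np.log10(jcs / sigma_n)

# Hypothetical test values (stresses in MPa, angle in degrees), for illustration only
print(back_calculate_jrc(tau=0.95, sigma_n=1.0, jcs=80.0, phi_r_deg=28.0))
```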
The database used in the modeling processes of the ML methods comprises eight commonly used statistical parameters for characterizing surface roughness (Lee et al., 1997; Li and Zhang, 2015; Wang et al., 2017; Abolfazli and Fahimifar, 2020), i.e. Rave, Rmax, SDh, iave, SDi, Z2, Rp, and SF. As pointed out by Wang et al. (2017), a detailed description of joint roughness can be achieved using these eight common statistical parameters rather than a single statistical parameter. Fig. 1 shows the scatterplot matrix of the variables, with the variable distributions on the diagonal, correlation coefficients in the upper part, and correlation ellipses and loess smoothing regression lines with a confidence interval of 0.05 in the lower part. Long narrow ellipses represent large correlations and circular ellipses indicate small correlations. The datasets are divided into training and testing sets, of which 80% (89 samples) are used for training and the remaining 20% (23 samples) for testing the developed models. The statistical parameters of the training and testing datasets are presented in Table 1.
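As an illustration of how such a database can be assembled, the sketch below computes the eight parameters from a digitized profile using common discrete conventions; the exact definitions vary slightly between the cited references, so these formulas are one reasonable reading rather than the precise ones used for the source data. The 80/20 split can then be produced with a standard utility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def roughness_parameters(x, y):
    """Common discrete approximations of the eight statistical roughness
    parameters for a digitized profile (x, y); conventions vary by reference."""
    dx = np.diff(x)
    dy = np.diff(y)
    L = x[-1] - x[0]                              # nominal profile length
    theta = np.degrees(np.arctan(dy / dx))        # local inclination angles
    return {
        "Rave": np.mean(np.abs(y)) / L,           # average relative height (one convention)
        "Rmax": (y.max() - y.min()) / L,          # maximum relative height (one convention)
        "SDh": np.std(y, ddof=1),                 # standard deviation of asperity height
        "iave": np.mean(np.abs(theta)),           # average inclination angle
        "SDi": np.std(theta, ddof=1),             # standard deviation of inclination angle
        "Z2": np.sqrt(np.sum(dy**2 / dx) / L),    # first-derivative RMS
        "Rp": np.sum(np.hypot(dx, dy)) / L,       # true length / nominal length
        "SF": np.sum(dy**2 * dx) / L,             # structure function
    }

# X: (112, 8) matrix of the eight parameters; jrc: 112 back-calculated JRC values
# X_train, X_test, y_train, y_test = train_test_split(X, jrc, test_size=0.2, random_state=0)
```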
Four predictive methods are used to construct models for estimating the JRC values based on the joint profile datasets. The performance of the developed ML models is evaluated using three common indices, i.e. coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE). In this section, the ML models used are briefly introduced.
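A minimal sketch of the three evaluation indices, computed with scikit-learn, is given below.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

def evaluate(y_true, y_pred):
    """Return the three indices used to compare the models in this study."""
    return {
        "R2": r2_score(y_true, y_pred),
        "RMSE": np.sqrt(mean_squared_error(y_true, y_pred)),
        "MAE": mean_absolute_error(y_true, y_pred),
    }
```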
A GP is a stochastic process (a collection of random variables) (Rasmussen and Williams, 2006). In a GP, any finite subset of random variables has a joint Gaussian distribution. A GP, f(x) (x ∊ X), is fully specified by a mean function m(x) and a kernel (or covariance) function k(x, x') (x' ∊ X), which are expressed as
m(x) = E[f(x)]
k(x, x') = E[(f(x) − m(x))(f(x') − m(x'))]
so that f(x) ~ GP(m(x), k(x, x')). The noisy observations are modeled as
y = f(x) + ε
where y is the noisy target value, and ε is the noise term with zero mean and variance σn². Consequently, the joint Gaussian prior distribution of the training target values y at the training data X and the predictive target values f* at the testing instances X* can be expressed as follows:
[y; f*] ~ N(0, [K(X, X) + σn²I, K(X, X*); K(X*, X), K(X*, X*)])
Fig. 1. Scatterplot matrix of the variables with histograms on the diagonal, correlation coefficients, and smoothed regression lines.
where K(·) is the kernel matrix, and I is the identity matrix. Generally, the dataset is preprocessed to have zero mean.
The predictive distribution is presented as follows:
f* | X, y, X* ~ N(f̄*, cov(f*))
f̄* = K(X*, X) [K(X, X) + σn²I]⁻¹ y
cov(f*) = K(X*, X*) − K(X*, X) [K(X, X) + σn²I]⁻¹ K(X, X*)
It is necessary to choose a suitable covariance (kernel) function, since it directly affects the predictive efficiency of the model (Fathipour-Azar, 2021). In this study, two commonly used kernel functions are selected for GP model development.
Gaussian or radial basis function (RBF) kernel:
k(xi, xj) = exp(−γ ||xi − xj||²)
Pearson VII kernel function (PUK):
k(xi, xj) = 1 / {1 + [2 √(||xi − xj||²) √(2^(1/ω) − 1) / δ]²}^ω
The input data are standardized prior to training as
x'i = (xi − μ)/σ
where xi is the original data; and μ and σ are the mean and standard deviation of the data, respectively.
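For readers who wish to reproduce the GP model, the scikit-learn sketch below fits a GP regressor with an RBF kernel plus a white-noise (nugget) term after standardizing the inputs. scikit-learn does not provide a PUK kernel, and the kernel hyperparameters here are re-estimated by maximum likelihood rather than fixed to the values used in this study, so this is an approximation of the GP-RBF model only.

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# GP regression with an RBF kernel and a white-noise term for the noisy targets.
gp_rbf = make_pipeline(
    StandardScaler(),
    GaussianProcessRegressor(
        kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1),
        normalize_y=True,
        random_state=0,
    ),
)
# gp_rbf.fit(X_train, y_train)
# y_pred = gp_rbf.predict(X_test)
```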
K-star is an instance- and rule-based modeling technique (Cleary and Trigg, 1995). This instance-based learner uses an entropic distance measure. The K-star learning algorithm assesses the resemblance between two instances in order to assign them to one class; the resulting classification function is then applied to new data. Therefore, a new instance a is allocated to the most prevalent class among its K nearest instances b. The basic framework of K-star is expressed as follows:
K*(b|a) = −log2 P*(b|a)
where P* indicates the probability of all transformation paths from instance a to b.
Table 1 Statistical analysis of the datasets.
RF is an ensemble decision tree learning algorithm. The RF algorithm consists of regression trees trained on different bootstrap samples (bagging) of the training dataset. Each tree performs regression on its own, and the final prediction is the average of the individual trees' predictions. With bagging, each tree's training sample contains about 2/3 of the data from the original dataset. The remaining 1/3 left out of the bootstrap sample (the out-of-bag (OOB) data) of each tree is utilized to assess the estimation error and variable importance. Two primary parameters, i.e. the number of trees and the number of randomly chosen variables at each split node, are required to construct the RF regression model (Breiman, 2001; Zhang et al., 2021a).
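A minimal scikit-learn sketch of such an RF regressor is shown below; the tree count and the number of predictors tried at each split follow the values reported in the tuning discussion later in the paper, while the remaining settings are library defaults.

```python
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(
    n_estimators=100,   # number of trees
    max_features=4,     # randomly chosen predictors considered at each split
    oob_score=True,     # use the out-of-bag samples to estimate generalization error
    random_state=0,
)
# rf.fit(X_train, y_train)
# print(rf.oob_score_)  # OOB R2 estimate
```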
The XGBoost model is an advanced tree boosting scheme (Chen and Guestrin, 2016). It is an enhancement of Friedman (2001)'s gradient boosting approach. XGBoost works by creating a base model on top of the pre-existing model: an initial tree is trained, a second tree is created and combined with the initial tree, and this step is repeated until the stopping criterion (i.e. a desired number of trees) is attained (Zhang et al., 2020). When t trees have been created, the newly generated tree is used to fit the residual of the last prediction. The sum of the individual trees' predictions yields the model's final prediction. The general function for approximating the system response at step t is given as follows:
ŷi(t) = Σ(k=1 to t) fk(xi) = ŷi(t−1) + ft(xi)
In order to optimize the ensemble of trees and prevent overfitting, the objective function of XGBoost is minimized as follows:
Obj(t) = Σ(i=1 to n) l(yi, ŷi(t−1) + ft(xi)) + Ω(ft)
where l is a convex loss function used to measure the difference between the exact and computed values; yi is the measured value; n is the number of observations used; and Ω is the penalty factor (regularization term), defined as
Ω(ft) = γT + (1/2) λ ||w||²
Table 2 Models’ performance indices based on the training and testing datasets.
where T and w are the number of leaf nodes and their scores, respectively; λ is the regularization parameter; and γ is the minimum loss needed to further partition a leaf node.
The XGBoost method uses the objective function as the evaluation function and explores all of the candidate feature points with a greedy algorithm. The objective function value after a split is compared with the gain of the single leaf node's objective function against a predetermined threshold that limits tree growth, and the split is executed only when the gain exceeds the threshold. As a result, the best features and splitting points can be identified to establish the tree structure (Zhang et al., 2021b).
As stated above, the initial parameters of the GP, K-star, RF, and XGBoost models need to be optimized to ensure the best regression results. Grid search and iterative trial-and-error procedures, together with checking the values of the performance evaluation metrics (i.e. R2, RMSE, and MAE), are used for tuning hyperparameters in ML practice. For the GP model, the noise and kernel hyperparameters should be optimized. For the GP-RBF model, ε = 0.024 and γ = 0.07 are determined; and for the GP-PUK model, ε = 0.01, δ = 15.4, and ω = 0.4. For the K-star model, a blending parameter of b = 23 is used. To implement the RF model, two parameters need to be tuned; the best estimate is found for 100 trees, and the optimal split is obtained over a randomly selected subset of four of the eight predictor variables at each node.
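As an example of such a tuning procedure, the sketch below grid-searches the two RF parameters with cross-validated RMSE as the selection criterion; the candidate grids are illustrative assumptions, with only n_estimators = 100 and max_features = 4 coming from the text.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [50, 100, 200],   # illustrative candidates
    "max_features": [2, 4, 6, 8],     # illustrative candidates
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # RMSE-based model selection
    cv=10,
)
# search.fit(X_train, y_train)
# print(search.best_params_)
```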
Fig. 2. Scatterplots for different modeling techniques: (a) GP-RBF, (b) GP-PUK, (c) K-star, (d) RF, and (e) XGBoost models.
For the XGBoost model, a tree (XGBtree) is considered as the base learner and booster. Accordingly, the following optimal parameters are found for this model: boosting iterations = 80, maximum depth of tree = 3, minimum loss reduction = 1, subsample ratio of columns = 0.5, minimum sum of instance weight (node size) = 2, subsample percentage = 0.25, and booster learning rate (shrinkage) = 0.1. For the different methods employed, the R2, RMSE, and MAE statistics are calculated on the training and testing datasets. The results are provided in Table 2.
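These settings map naturally onto the Python xgboost API, although the correspondence between the caret-style names above and the library arguments below is an assumption: boosting iterations → n_estimators, maximum depth → max_depth, minimum loss reduction → gamma, subsample ratio of columns → colsample_bytree, minimum sum of instance weight → min_child_weight, subsample percentage → subsample, and shrinkage → learning_rate.

```python
from xgboost import XGBRegressor

xgb = XGBRegressor(
    n_estimators=80,
    max_depth=3,
    gamma=1,
    colsample_bytree=0.5,
    min_child_weight=2,
    subsample=0.25,
    learning_rate=0.1,
    objective="reg:squarederror",
)
# xgb.fit(X_train, y_train)
# y_pred = xgb.predict(X_test)
```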
As presented in Table 2, the developed models provide generally comparable performance in all evaluation metrics. The GP model with the PUK kernel function shows good predictive capability and slightly outperforms GP regression with the RBF kernel function in terms of lower RMSE and MAE values and higher R2 on both the training and testing datasets. Although K-star demonstrates the best training performance among the models (R2 = 1, RMSE = 0.039, and MAE = 0.22), it slightly underperforms the GP-PUK, RF, and XGBoost models in the testing phase (R2 = 0.962, RMSE = 1.059, and MAE = 0.728). The XGBoost model outperforms the other models in the testing phase with R2 = 0.975, RMSE = 0.773, and MAE = 0.618.
Based on these results, the statistical indices for the RF model are R2 = 0.985, RMSE = 0.545, and MAE = 0.433 in the training phase and R2 = 0.968, RMSE = 0.876, and MAE = 0.722 in the testing phase. This indicates that, for the given data, the RF model generally performs better than the GP models based on the RBF and PUK kernel functions.
Fig. 2 demonstrates that the K-star model performs better than the other proposed models (i.e. GP-RBF, GP-PUK, RF, and XGBoost) on the training dataset, since its predictions lie closer to the E = A regression line (E and A are the estimated and actual JRC values, respectively) than those of the other developed models. However, for the testing dataset, the XGBoost, RF, and GP-PUK models perform slightly better than the K-star model. Based on these results, the developed models show promising performance in estimating JRC values.
The models developed with the ML algorithms considered in the present research are finally compared with the empirical equations (Eqs. (16)-(18)) proposed by Li and Zhang (2015) using the 23 testing datasets. A comparison of the JRC values estimated from the developed models and calculated by Eqs. (16)-(18) is shown in Figs. 3 and 4. These three surface roughness indices, based on Z2, Rp, and SDi, respectively, are calculated using the following equations:
Figs. 3 and 4 graphically illustrate that the evolved ML models outperform the single statistical parameter-based regression equations. According to Figs. 3 and 4, Eqs. (16)-(18) overestimate the JRC when its value exceeds about 15. In Fig. 3a, the comparison is made with reference to the actual JRC values, and the corresponding residuals are plotted in Fig. 3b. Although there are minor differences between the estimated and actual JRC values, all ML algorithms provide higher accuracy and yield results in very good agreement with the actual JRC values compared with the conventional regression models. It is also evident that for some rock joint profiles there is a very large divergence between the actual JRC data and the values predicted by the three regression models, whereas the models developed in this study show the lowest estimation error. As shown in Fig. 4, by analyzing the correlation and error between the estimated and actual JRC values of the testing dataset, it can be seen that the estimated JRC values are relatively concentrated on the E = A line. A violin plot of the residuals, a combination of a box plot and a kernel density plot, is presented in Fig. 4b. The testing residuals of the developed ML models are essentially distributed near the zero line compared with those of the Z2-, Rp-, and SDi-based equations.
Fig. 5 presents the evaluation indices of the models and equations. The comprehensive results show that the conventional equations are not the most accurate prediction models during the training and testing phases compared with the developed ML models. Higher R2 (closer to 1) and lower RMSE and MAE (closer to 0) for the training and testing datasets indicate the accuracy of the developed ML models relative to the equations.
The Taylor diagram (Taylor, 2001) is used to evaluate the predictive abilities of the developed ML models. The Taylor diagram, shown in Fig. 6, is an efficient visual means of summarizing how closely the evolved models match the measured data in terms of three frequently used metrics: correlation coefficient, standard deviation, and RMSE. As can be observed from Fig. 6, K-star in the training phase and XGBoost in the testing phase are the closest to the measured points of the respective phases.
Fig. 3. (a) Comparison of the JRC values estimated by developed models and empirical regressions along with the actual values; and (b) Corresponding residuals.
Fig. 4. (a) Comparison of JRC estimations using developed models and empirical regressions; and (b) Corresponding violin plot of residuals.
Fig. 5. Multi-axis graph of the evaluation indices of the machine learning models and equations in the (a) training and (b) testing phases.
Fig. 6. Taylor diagram to compare the models with respect to the measured data: (a) Training and (b) testing phases.
The feature importance score is a technique used to measure feature significance and model interpretability. Fig. 7 depicts the relative importance of the features for the XGBoost model, demonstrating the effect of the features on the JRC. A trained XGBoost model can compute feature importance automatically using the built-in importance criteria, i.e. gain, cover, and frequency. The importance values of the features sum to 1. Gain indicates each feature's contribution, over each tree, to the improvement in model performance. Cover refers to the relative number of observations related to a feature. Frequency is the percentage representing the relative number of times a feature is used for a split in the trees. Higher values of these indices, compared with those of another feature, indicate that the feature is more important for making estimations. Fig. 7 presents the average relative importance of the eight features using 10-fold cross-validation. The figure shows that SDh, SDi, Rave, and iave affect the JRC more than Rmax, Rp, Z2, and SF.
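In the Python xgboost interface, these three criteria correspond to the "gain", "cover", and "weight" importance types; a minimal sketch, assuming the fitted xgb model from the earlier sketch, is given below.

```python
# Normalized feature importances from the fitted booster; "frequency" in the text
# corresponds to the library's "weight" importance type.
booster = xgb.get_booster()
for name, imp_type in [("gain", "gain"), ("cover", "cover"), ("frequency", "weight")]:
    raw = booster.get_score(importance_type=imp_type)   # dict: feature -> raw score
    total = sum(raw.values())
    print(name, {feat: score / total for feat, score in raw.items()})  # sums to 1
```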
Fig. 7. Feature importance analysis by the XGBoost model.
Fig. 8 presents partial dependence plots (PDPs) (Friedman, 2001) for the eight predictor features. PDPs show how the variation of a variable over its marginal distribution affects the average predicted value (Goldstein et al., 2015). Each plot in Fig. 8 represents the influence of one variable while the other variables are held constant. Wider variation ranges of the average estimated JRC value are observed for SDh (7.724-11.842), SDi (8.117-11.439), iave (8.411-11.035), and Rave (8.337-10.478) compared with the other features (8.734-9.865 for Rp, 8.793-9.544 for Rmax, 8.814-9.277 for SF, and 9.035-9.288 for Z2). Although associations with the JRC are observed for the SDh, SDi, Rave, iave, and Rmax features, the strength of the association varies among features. In general, the JRC increases with all features except Z2. Although the averaged JRC value fluctuates between 9.035 and 9.288 when Z2 changes in the range of 0.203-0.319, the difference is generally not significant. A sharp increase in the JRC occurs for Rp and SF at about 1.035 and 0.004, respectively. It can thus be seen that specific ranges of each feature affect the estimated JRC value; outside these ranges, the features are less important, and the variation of the average estimated JRC is insignificant.
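PDPs of this kind can be generated directly with scikit-learn's inspection module; the sketch below assumes the fitted xgb model from above and a feature_names list for the eight parameters.

```python
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# One partial dependence curve per feature for the fitted XGBoost model.
PartialDependenceDisplay.from_estimator(
    xgb, X_train, features=list(range(8)), feature_names=feature_names
)
plt.tight_layout()
plt.show()
```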
To assess how much of the variance in the JRC value is due to feature interaction, the H-statistic (Friedman and Popescu, 2008) is used. Fig. 9 depicts the interaction strength of each feature with all other features for predicting the JRC. Overall, each variable explains less than 4% of the variation; therefore, the interaction effects between the features are very weak.
Fig. 9. The interaction strength of each feature with all other features for the XGBoost model predicting JRC.
Finally, local interpretable model-agnostic explanations (LIME) (Ribeiro et al., 2016) are used to explore and explain the importance of each feature for each JRC estimate in the testing dataset by approximating the model locally with an interpretable one. The result is presented as a heatmap in Fig. 10, which lists the rock joint profile number on the x-axis and the classified features on the y-axis. Feature weights are represented by colors: each cell's color indicates the feature's relevance in deciding the corresponding JRC value. A feature with a positive (blue) weight supports the JRC, whereas a feature with a negative (red) weight contradicts it. In addition, the LIME algorithm considers an optimum of four bins for each feature when fitting the local model. Therefore, similar to the PDPs in Fig. 8, the influential range of each of the eight features can be understood. In general, it can be seen that the Rmax, Rp, Z2, and SF features have lower relevance to the JRC, while SDh, SDi, Rave, and iave are highly relevant to the estimated JRC value. This result is consistent with the feature importance obtained by the gain, cover, and frequency approaches in the training phase. This plot is helpful for comprehending how the ML methods, here the XGBoost model, make JRC estimations.
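A per-instance explanation of this kind can be produced with the Python lime package; the sketch below, assuming the fitted xgb model and the training/testing arrays defined earlier, collects the signed feature weights for every testing profile (the heatmap is then a matter of plotting these weights).

```python
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    mode="regression",
    discretize_continuous=True,   # bins each continuous feature for the local model
)
explanations = []
for row in X_test:
    exp = explainer.explain_instance(row, xgb.predict, num_features=8)
    explanations.append(dict(exp.as_list()))  # binned feature rule -> signed weight
```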
Fig. 8. Partial dependence plots for the features of the XGBoost model.
Fig. 10. Feature importance as a heatmap visualization of all case-feature combinations for the testing dataset in the XGBoost model.
The proposed data-driven models estimate the JRC value based on the eight statistical characteristics of the rock joint surface. Such a comprehensive approach can be useful and reliable for JRC estimation problems, since classic empirical regression approaches are mostly based on a single statistical parameter and may result in different values (Wang et al., 2017; Abolfazli and Fahimifar, 2020; Marsch and Fernandez-Steeger, 2021). ML algorithms are efficient and suitable for identifying complex nonlinear relationships (regression problems) and, therefore, for developing JRC estimation models. Moreover, owing to the nature of the algorithms, the uncertainty of the statistical parameters and their influence on the target of interest can be considered.
Due to the probabilistic nature of the GP, the space of functions relating inputs to outputs can be defined by specifying the mean and covariance functions of the process. Regarding the decision tree algorithms, bagging and boosting ensemble strategies are used to train learners in parallel and sequentially in the RF and XGBoost models, respectively. K-star is a lazy, instance-based learner that uses the K nearest neighbors with an entropic distance measure. The proposed models have been evaluated for JRC estimation, providing insights for future developments that could take into consideration different statistical and fractal features for the desired target estimation.
Rock joint behavior in engineering practice is influenced by the JRC, but current single statistical parameter-based empirical models for JRC estimation might not be sufficient to describe the joint roughness comprehensively. An adequate characterization of the joint roughness is essential for a good JRC estimation model. ML-based methods are proposed and compared to investigate their performance in mapping the eight input statistical roughness parameters to the output parameter, i.e. the JRC. Based on the 112 rock joint profile datasets from published literature, JRC estimation models were established using the GP with two kernel functions, K-star, RF, and XGBoost algorithms. The developed ML-based models provide comparable performance in terms of the R2, RMSE, and MAE indices. Based on the trained XGBoost model, a feature importance ranking is provided using the gain, cover, and frequency indices. PDPs are used to demonstrate the effect of the features on the average predicted JRC value in the training phase. The H-statistic is utilized to evaluate how much of the variation of the JRC value is explained by feature interaction. Moreover, the LIME algorithm is employed to indicate the effect of the variables on the JRC using the testing dataset. The effectiveness of the GP, K-star, RF, and XGBoost models is further demonstrated by comparisons with three well-known empirical formulations using the 23 testing datasets. The ML algorithms outperform the empirical equations used in this research, showing better predictive performance. In general, the ML methods used for modeling demonstrate their effectiveness and versatility and are suitable for roughness quantification of rock discontinuities.
Declaration of competing interest
The author declares that he has no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.