Quang-Hieu Tran, Hoang Nguyen and Xuan-Nam Bui
1 Department of Surface Mining, Mining Faculty, Hanoi University of Mining and Geology, Hanoi, 100000, Vietnam
2 Innovations for Sustainable and Responsible Mining (ISRM) Research Group, Hanoi University of Mining and Geology, Hanoi, 100000, Vietnam
ABSTRACT This study predicted blast-induced ground vibration, expressed as peak particle velocity (PPV), in open-pit mines using bagging and sibling techniques that combine machine learning algorithms. Four machine learning algorithms, namely support vector regression (SVR), extra trees (ExTree), K-nearest neighbors (KNN), and decision tree regression (DTR), were used as base models for the combination and for the initial PPV prediction. The bagging regressor (BA) was then applied to combine these base models in order to reduce variance, limit overfitting, and generate more robust predictive models, abbreviated as BA-ExTree, BA-KNN, BA-SVR, and BA-DTR. The ExTree model has not previously been considered for predicting blast-induced ground vibration, and bagging ExTree is an innovation that aims to improve the accuracy of the ExTree model itself. In addition, two empirical models (i.e., USBM and Ambraseys) were developed and compared with the bagging models to obtain a comprehensive assessment. To this end, we collected 300 blasting events with different parameters at the Sin Quyen copper mine (Vietnam), together with the measured PPV values, and compiled them into the dataset used to develop the PPV predictive models. The results revealed that the bagging models outperformed the empirical models, except for the BA-DTR model. Of these, BA-ExTree is the best model, with the highest accuracy (i.e., 88.8%), whereas the empirical models only provided accuracies of 73.6%–76%. Detailed comparisons and assessments are also presented in this study.
KEYWORDS Mine blasting; blast-induced ground vibration; environmentally friendly blasting; peak particle velocity; bagging; extra trees
Rock fragmentation is essential before shoveling, loading, and transporting activities in open-pit mines. It can be conducted by drilling-blasting or mechanized methods. Of these, blasting is regarded as the best method for fragmenting rock, offering high fragmentation quality at low cost [1]. However, its environmental impacts are significant, including ground vibration (GV), flyrock, air over-pressure, and air pollution [2–4]. Of these impacts, GV is regarded as the most dangerous, since it can threaten the structural integrity of surrounding structures and benches as well as slope stability [5,6].
To evaluate the effects of blast-induced GV on the surroundings, seismographs are used to measure the GV intensity, with peak particle velocity (PPV) as the main indicator. Although field measurement with a seismograph provides highly accurate results, PPV can only be determined after the blasting operation has been carried out. In other words, any damage has already occurred by the time it is measured. In addition, field measurement is often time-consuming and costly, and it requires experience in calibrating and using the seismograph. Many countries have issued standards for measuring PPV (e.g., the U.S. Bureau of Mines standard, the OSM standard, the 01/2019-BCT standard of Vietnam, and the Swiss standard SN 640 312a, to name a few), and many blasts have produced PPV within the allowable limits of these standards; nevertheless, the surrounding structures were still affected and damaged. For reducing PPV, the selection of a proper blast pattern is considered an effective method [7]. Blasting management techniques have also been proposed to mitigate unwanted effects, including ground vibration [8,9]. Although a proper blast pattern can reduce PPV [10], it is difficult to know whether the reduction is sufficient to keep the surroundings safe unless PPV has been quantified. Therefore, the prediction of PPV in mine blasting is necessary to control the damage induced by blasting operations.
For this purpose, many empirical equations have been proposed based on historical PPV measurements [11]. However, they have been judged to have low accuracy and to be unable to reflect the relationship between PPV and the blasting parameters or rock properties [12–16].
To gain better results, artificial intelligence (AI) methods and models have been proposed to predict PPV. They offer many advantages: high accuracy, the ability to account for rock properties and a range of blasting parameters, low cost, and time savings. A variety of AI models have been proposed for PPV prediction and control in open-pit mines, such as artificial neural network (ANN) models [17–20], machine learning-based models (e.g., support vector machine, CART, multivariate statistical analysis, and multivariate adaptive regression splines, to name a few) [21–25], metaheuristic algorithm-based ANN models [1,5,26–30], metaheuristic algorithm-based machine learning models [31–37], and clustering-based models [38–41]. The reported accuracies of these models are in the range of 92.7%–98.6%, and they have been recommended as superior techniques for predicting PPV in open-pit mines with high accuracy and reliability. In addition, ensemble models based on bagging and/or boosting strategies have been suggested as a promising approach for predicting environmental issues in mine blasting with improved accuracy. Nonetheless, they have only been applied to predicting air over-pressure and flyrock in mine blasting [42–44].
In this study, the bagging technique (BA) was combined with four machine learning algorithms, namely extra trees (ExTree), support vector regression (SVR), K-nearest neighbors (KNN), and decision tree regression (DTR), to build four ensemble models for predicting PPV in open-pit mines, abbreviated as BA-ExTree, BA-SVR, BA-KNN, and BA-DTR. Among these bagging models, the BA-SVR model has already been proposed for predicting PPV [45]. The standalone SVR, KNN, and DTR models have also been applied to predict PPV [46–48]. However, the ExTree model and the other bagging models, particularly the BA-ExTree model, have not been applied to predicting PPV or other environmental issues in mine blasting (e.g., air over-pressure, flyrock). Therefore, this study aims to discover how well the ExTree model predicts PPV and how its performance changes once it is bagged. In addition, USBM and Ambraseys, the most common empirical models for estimating PPV, were applied and compared with the proposed bagging models to assess their accuracy and reliability comprehensively.
Extra trees (ExTree) is an ensemble machine learning model based on the supervised learning technique [49]. It is also referred to as the extremely randomized trees model for regression and classification problems [50]. ExTree was developed as an extended version of the random forest (RF) model to overcome overfitting.
Similar to the RF algorithm, ExTree uses random subsets to train the base models, and the predictions of the base models are finally combined to give the outcome predictions [51]. Unlike RF, however, ExTree splits each node at randomly drawn cut-points instead of searching for the optimal cut-point, and selects the split feature from among these random splits. Regarding the structure, ExTree consists of a number of decision trees, and each tree includes a root node, split nodes, and leaf nodes (Fig. 1). Given a dataset X, ExTree splits the dataset into random subsets of features at the root node. Each subset is considered a split/child node, and splitting continues until a leaf node is reached. Each tree then produces its own outcome predictions. Finally, the predictions of the different trees are combined, and their average is taken as the final result of the model for regression problems.
Figure 1: ExTree structure for regression problems
In ExTree, three critical parameters are considered while developing the model: the number of trees, the number of randomly selected features, and the minimum number of samples required for splitting. One of the main advantages of ExTree is the reduction of variance and bias on the training dataset through the explicit randomization of both the cut-point and the attribute subset [52]. Based on these advantages, this study considered the ExTree algorithm for predicting PPV in mine blasting.
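The random cut-point idea described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the function names `sse` and `random_split` and the variance-based score are assumptions made for illustration, and only a single node split is shown rather than a full tree.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def random_split(X, y, n_features, rng):
    """Draw one random cut-point for each randomly chosen feature and
    keep the candidate with the lowest within-child squared error."""
    candidates = rng.sample(range(len(X[0])), n_features)
    best = None
    for j in candidates:
        column = [row[j] for row in X]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # constant feature: nothing to split on
        cut = rng.uniform(lo, hi)  # random cut-point, not an optimized one
        left = [t for row, t in zip(X, y) if row[j] < cut]
        right = [t for row, t in zip(X, y) if row[j] >= cut]
        if not left or not right:
            continue
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, j, cut)
    return best  # (score, feature index, cut-point), or None if no valid split
```

In a full ExTree model this split would be applied recursively until the minimum-samples criterion stops it, and the predictions of all trees would be averaged.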
SVR is one of the two branches of the support vector machine (SVM) algorithm. It was developed by Drucker et al. [53] based on the original classification version (SVC) proposed by Cortes et al. [54]. The mathematical model of SVR can be described as follows:
Given a dataset with two dimensions, with $x_i \in \mathbb{R}^n$ and $y_i \in [-1, 1]$, a hard margin is used to separate the dataset:
where the hyperplane is denoted by (ω, b), as shown in Fig. 2a. Herein, the vectors (data points) close to the hyperplane are called support vectors (Fig. 2a). In the case of non-separable data, SVR can map the data into a higher-dimensional space through kernel functions (Fig. 2b). Based on the support vectors, SVR can calculate the Euclidean distance of other vectors with the following regression function:
where ω is the weight matrix; φ(x) denotes the mapped dimensional space using kernel functions; and b is the bias.
Figure 2: Structure and principle of the SVR model: (a) support vectors with the structure of SVR; (b) the mechanism of mapping the dataset with kernel functions; (c) SVR flowchart with kernel functions and slack variables
The loss function of SVR is then applied to minimize the error of the model using Eq. (3):
where P stands for the penalty parameter, which controls the trade-off between the margin and the slack variables; $f_{\varepsilon}$ denotes the "ε-insensitive" loss function, which is calculated as follows:
Finally, the objective function for SVR based on the support vectors and hyperplane is expressed as Eq. (5). Fig. 2c shows the mechanism of the SVR model with the slack variables and the mapping procedure of the data.
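For reference, the standard textbook formulation of SVR can be written with the symbols defined above (ω, b, φ(x), P, ε). This is a sketch of the usual forms; the slack variables $\xi_i, \xi_i^*$ and the sample count $N$ are notation assumed here, and the exact expressions and numbering used by the authors may differ.

```latex
% Hard-margin separation of the dataset
y_i\,(\omega \cdot x_i + b) \ge 1

% Regression function in the kernel-mapped space
f(x) = \omega^{T}\varphi(x) + b

% Regularized loss with penalty parameter P
R = \frac{1}{2}\lVert \omega \rVert^{2} + P \sum_{i=1}^{N} f_{\varepsilon}\bigl(y_i, f(x_i)\bigr)

% epsilon-insensitive loss
f_{\varepsilon}\bigl(y, f(x)\bigr) =
\begin{cases}
0, & \lvert y - f(x) \rvert \le \varepsilon \\
\lvert y - f(x) \rvert - \varepsilon, & \text{otherwise}
\end{cases}

% Objective with slack variables
\min_{\omega,\, b,\, \xi,\, \xi^{*}} \ \frac{1}{2}\lVert \omega \rVert^{2} + P \sum_{i=1}^{N} (\xi_i + \xi_i^{*})
\quad \text{s.t.} \quad
\begin{cases}
y_i - f(x_i) \le \varepsilon + \xi_i \\
f(x_i) - y_i \le \varepsilon + \xi_i^{*} \\
\xi_i,\ \xi_i^{*} \ge 0
\end{cases}
```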
KNN is known as a lazy algorithm in machine learning. Its principle is to find the k nearest neighbors and to compute the weights of new samples through the average value of these k nearest neighbors [55]. It can be used for multi-model targets with no training. In KNN, the similarities between the testing samples and the predefined ones are calculated, and the largest values are selected. Finally, the testing samples are considered and compared with the similarities and the selected values. The similarity is calculated based on the Euclidean distance, as follows:
where $a_i$ and $b_i$ denote the single eigenvalues ($i$-th components) of the two feature vectors being compared, and $n$ stands for the feature vector length.
Once the Euclidean distance is calculated, a weight is assigned to each neighbor. Subsequently, the calculated distances are sorted in ascending order and the k nearest neighbors are selected. The mathematical model of this task is expressed as follows:
where $D_i$ is the distance between the reference sample and the $i$-th sample; $k$ is the number of samples; $p$ denotes the power parameter; and $w_i$ stands for the weight assigned to each neighbor.
The outcome prediction is then computed using Eq. (9) based on the weighted sum.
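The distance, weighting, and weighted-sum steps described above can be sketched as follows. The inverse-distance weight $w_i = 1/D_i^{p}$ and the normalization by the sum of weights are assumptions made for illustration; the function names are hypothetical and the paper's exact weighting scheme may differ.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(X_train, y_train, query, k=3, p=2):
    """Distance-weighted k-nearest-neighbor regression (assumed weighting)."""
    # Sort all training samples by distance to the query (ascending).
    distances = sorted(
        (euclidean(row, query), target) for row, target in zip(X_train, y_train)
    )
    nearest = distances[:k]
    # An exact match would give an infinite weight, so return it directly.
    for d, target in nearest:
        if d == 0.0:
            return target
    # Assumed weight: inverse distance raised to the power parameter p.
    weights = [1.0 / d ** p for d, _ in nearest]
    weighted_sum = sum(w * t for w, (_, t) in zip(weights, nearest))
    return weighted_sum / sum(weights)
```

With `k=2` and two equally distant neighbors, the prediction reduces to the plain average of their targets.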
DTR is a nonparametric machine learning method based on a decision tree model with leaves and nodes [56]. DTR can learn any form of mapping function from the training dataset [57]. In this way, it is easy to explore the complex relationships in the dataset based on IF-THEN rules [58].
DTR creates a single regression tree for a regression problem by splitting the dataset into groups while keeping the output within each group as homogeneous as possible [59]. To this end, a specific independent variable is split under a set of decision rules (IF-THEN rules). To measure the homogeneity of the output, the residual sum of squares, also referred to as the impurity of a node in the tree, is calculated. Accordingly, the independent variable that gives the maximum homogeneity in the child nodes is selected to split the node. The remaining independent variables are used in the child nodes. The tree is then pruned to avoid overfitting, and cross-validation can support this step to reduce the prediction error as much as possible. In short, DTR is a top-down decision tree in which splitting continues until the stopping criterion is met. It is a flexible and powerful algorithm for regression problems and does not require calculations in graphical form. The top-down development of DTR is illustrated in Fig. 3.
Figure 3: An illustration of DTR in machine learning
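The split-selection rule described for DTR, choosing the variable and threshold that minimize the residual sum of squares in the child nodes, can be sketched as below. This is an illustrative exhaustive search under assumed names (`rss`, `best_split`), not the implementation used in this study, and pruning is omitted.

```python
def rss(values):
    """Residual sum of squares around the mean (node impurity)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y):
    """Try every midpoint between consecutive distinct feature values and
    return (impurity, feature index, threshold) of the purest split."""
    best = None
    for j in range(len(X[0])):
        levels = sorted({row[j] for row in X})
        for lo, hi in zip(levels, levels[1:]):
            cut = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[j] <= cut]
            right = [t for row, t in zip(X, y) if row[j] > cut]
            score = rss(left) + rss(right)  # lower means more homogeneous
            if best is None or score < best[0]:
                best = (score, j, cut)
    return best


# IF x0 <= 6.0 THEN predict 1.0 ELSE predict 9.0
print(best_split([[1, 5], [2, 5], [10, 5], [11, 5]], [1.0, 1.0, 9.0, 9.0]))
```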
Bagging is an ensemble learning technique that combines multiple base models to obtain better predictions by reducing variance within a noisy dataset. Because it applies the bootstrap aggregating method, it allows the models to avoid or minimize overfitting. It is also a promising method for dealing with high-dimensional datasets. To build a bagging model, the following five steps are employed:
· Step 1: Given a training dataset with m features and n observations, select a random sample from the training dataset with replacement (a bootstrap sample).
· Step 2: Create a model based on the random subset of features and the sampled observations.
· Step 3: Split the nodes using the best split out of the lot.
· Step 4: Grow the tree to obtain the best root nodes.
· Step 5: Repeat Steps 1–4 n times. The results of the individual decision trees are then aggregated to give the best prediction.
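The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the base learner is a hypothetical 1-nearest-neighbor model (`fit_1nn`) chosen only because it is short and has high variance, sampling is done with replacement, and the aggregate is a plain average, as is usual for regression.

```python
import random
from statistics import mean


def bootstrap_sample(X, y, rng):
    """Draw len(X) observations with replacement (Step 1)."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_1nn(X, y):
    """Hypothetical base learner: predict the target of the closest sample."""
    def predict(query):
        nearest = min(
            range(len(X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], query)),
        )
        return y[nearest]
    return predict


def fit_bagging(X, y, fit_base, n_models=25, seed=0):
    """Fit one base model per bootstrap sample and average the predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):  # Step 5: repeat the sampling and fitting
        X_boot, y_boot = bootstrap_sample(X, y, rng)
        models.append(fit_base(X_boot, y_boot))  # Steps 2-4
    return lambda query: mean(model(query) for model in models)
```

Any function that takes `(X, y)` and returns a predictor can be passed as `fit_base`, which mirrors how the same bagging wrapper is applied to different base models.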
Fig. 4 shows the flowchart of the bagging models proposed to predict PPV in this study.
Figure 4:Flowchart of the bagging models for predicting PPV
For the PPV prediction, 300 blasting events were measured using the Micromate seismograph (Fig. 5), and the relevant blasting parameters were recorded, including the diameter of borehole (Dbh), bench height (H), length of borehole (Lbh), hardness of rock mass (f), powder factor (PF), burden (B), spacing (S), stemming (ST), and maximum explosive charge per delay (MEC). Before measuring PPV, the monitoring distance (MD) between the blasting site and the seismograph was determined using GPS receivers. The collected dataset is visualized in Fig. 6, and a statistical overview of the dataset is given in Table 1.
Table 1:Statistical overview of the dataset
Figure 5: PPV measurement using the Micromate seismograph
Figure 6:Scatterplot matrix of the PPV dataset collected in this study
Prior to developing the bagging models for predicting PPV, the dataset was scaled using the standard scaler method (interval [-1, 1]) to reduce the bias of the models caused by the different ranges of the input variables. Subsequently, the dataset was divided randomly into two parts: 70% was used to train the bagging models, and 30% was used to test the developed bagging models in predicting PPV.
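A minimal sketch of this preparation step is given below. It assumes that "standard scaler" means zero-mean, unit-variance standardization (a strict [-1, 1] range would instead call for min-max scaling), that the split is made before the scaler is fitted, and that the scaler is fitted on the training rows only; the function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_standardizer(rows):
    """Learn per-column mean and standard deviation from the given rows."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard constant columns

    def transform(data):
        return [
            [(v - m) / s for v, m, s in zip(row, means, stds)] for row in data
        ]

    return transform
```

For a dataset of 300 rows, `train_test_split(rows)` would return 210 training rows and 90 testing rows; the same `transform` is then applied to both parts.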
To evaluate a given model, the 10-fold cross-validation method was applied to avoid overfitting, to test every data point exactly once, and to reduce the variance of the resulting estimate [41,60,61]. In this way, all the subsets are used for both training and validation, which ensures that the ranges of the dataset fit the model. Of course, 5 folds could be used instead of the 10 folds used in this paper. However, 10 folds provide more detail on the subsets during training, which is why this setting is commonly used by researchers [62–64]. Furthermore, the negative root mean squared error was used to measure the accuracy of the bagging models in predicting PPV. This metric allows scores and losses to be handled in the same way, since a higher (less negative) value always indicates a better model. To develop the bagging models, the flowchart in Fig. 4 was applied with different base models, namely ExTree, SVR, KNN, and DTR.
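The evaluation scheme described above can be sketched as follows. The helper names and the `fit` callback convention (a function taking training data and returning a predictor) are assumptions made for illustration, not the authors' code.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def neg_rmse(y_true, y_pred):
    """Negative RMSE: values closer to zero indicate a better model."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return -math.sqrt(mse)


def cross_val_neg_rmse(X, y, fit, k=10, seed=0):
    """Train on k-1 folds, score on the held-out fold, for every fold."""
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        X_train = [X[i] for i in range(len(X)) if i not in held_out]
        y_train = [y[i] for i in range(len(y)) if i not in held_out]
        predict = fit(X_train, y_train)
        scores.append(
            neg_rmse([y[i] for i in fold], [predict(X[i]) for i in fold])
        )
    return scores
```

Each sample is held out exactly once, so the list of fold scores summarizes performance over the whole training set.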
4.2.1 Bagging Models
For the BA-ExTree model, the number of estimators was considered the main parameter for tuning the accuracy of the ExTree model in predicting PPV. To combine multiple ExTree models (as base models), 20 different ExTree models were created with the number of estimators in the range [100, 119]. The performance of the base ExTree models is shown in Fig. 7a. Finally, the 20 base ExTree models were bagged to generate the BA-ExTree model for predicting PPV.
A similar approach was applied for developing the BA-SVR model, but with a different controlling parameter. Herein, the radial basis function was used as the kernel function to transform the dataset in the SVR model, and three parameters were used: cost (C), gamma (γ), and sigma (σ). In this study, γ and σ were fixed at γ = 0.1 and σ = 0.00816. These values were selected based on the optimization procedure of the SVR model. Finally, the C parameter was tuned in the range of 1 to 19 to examine the accuracy of the SVR models, as shown in Fig. 7b. These base SVR models were then bagged to generate the BA-SVR model for predicting PPV.
For the BA-KNN model, 19 different base KNN models were developed by tuning the k parameter (i.e., the number of nearest neighbors) in the range of 1 to 19. The performance of the base KNN models is shown in Fig. 7c. Eventually, the BA-KNN model was established by combining the 19 developed KNN models for predicting PPV.
Similar to the previous bagging models, the BA-DTR model was developed for predicting PPV by combining different base DTR models. However, the DTR model involves many parameters. To split an internal node, a minimum number of samples (i.e., sample_min) is required for a small dataset. This parameter was tuned to check the accuracy of the DTR models. In addition, the minimum weight at a leaf node (i.e., leaf_w_min) was fixed at 0.5 while tuning the sample_min parameter in the range of 2 to 19. The performance of the base DTR models for predicting PPV is shown in Fig. 7d. Ultimately, 18 DTR models were combined to create the BA-DTR model for predicting PPV.
Figure 7: Performance of different base models based on the individual ensemble models: (a) BA-ExTree; (b) BA-SVR; (c) BA-KNN; (d) BA-DTR
4.2.2 Empirical Models
For comparison with the bagging models (i.e., BA-ExTree, BA-SVR, BA-KNN, and BA-DTR), two empirical models were also developed based on the equations proposed by USBM and Ambraseys [65]. The form of the empirical equations is as follows:
The USBM equation for estimating PPV:
The Ambraseys empirical equation for estimating PPV:
where k and b are experimental coefficients, which can be determined by the multivariate regression analysis method. The ratio between MD and a root of MEC (the square root in the USBM equation and the cube root in the Ambraseys equation) is called the scaled distance.
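In their commonly cited forms, written here with the symbols defined above, the two empirical equations read as follows. This is a sketch of the standard expressions; the sign convention of the exponent b and the notation used by the authors may differ.

```latex
% USBM: square-root scaled distance
PPV = k \left( \frac{MD}{\sqrt{MEC}} \right)^{-b}

% Ambraseys: cube-root scaled distance
PPV = k \left( \frac{MD}{\sqrt[3]{MEC}} \right)^{-b}
```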
It is worth noting that although the empirical models use only two variables (i.e., MD and MEC) to estimate PPV, the same datasets with the same observations were used as for the bagging models. Finally, the empirical models were defined for predicting PPV in this study, as expressed in Eqs. (12) and (13). The correlation between the scaled distance and the PPVs estimated by the empirical models is shown in Fig. 8.
Figure 8: Correlation between the scaled distance and PPV by the empirical models
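One common way to determine k and b for a power-law attenuation equation is ordinary least squares on the log-transformed data. The sketch below assumes the form PPV = k · SD^(-b) and uses hypothetical function names; it is not necessarily the regression procedure used in this study.

```python
import math


def scaled_distance(md, mec, root=2):
    """MD divided by the root-th root of MEC (2: USBM, 3: Ambraseys)."""
    return md / mec ** (1.0 / root)


def fit_power_law(sd_values, ppv_values):
    """Fit ln(PPV) = ln(k) - b * ln(SD) by least squares; return (k, b)."""
    xs = [math.log(s) for s in sd_values]
    ys = [math.log(v) for v in ppv_values]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return math.exp(intercept), -slope
```

Calling `fit_power_law` once with `root=2` scaled distances and once with `root=3` scaled distances yields one coefficient pair per empirical model.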
The USBM model:
The Ambraseys [65] model:
Once the machine learning-based bagging models and the empirical models had been developed, they were used to predict PPV on the testing dataset, and their performance was carefully evaluated. Fig. 7 shows that the base ExTree models provided better predictions, without outliers, than the other models. In contrast, the three remaining models (SVR, KNN, and DTR) provided PPV predictions with outliers. Moreover, a closer look at the performance metric in Fig. 7 (negative RMSE) shows that the errors of the base ExTree models are lower than those of the remaining models. These findings suggest positive results for the BA-ExTree model when these base models are combined for predicting PPV.
To evaluate the bagging and empirical models, the testing dataset (the remaining 30% of the dataset) was fed into the developed models. The predictions were then exported and evaluated through four performance metrics: R², RMSE, MAE, and VAF. These metrics can be calculated using the following equations:
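The four metrics can be computed as in the sketch below, which uses their commonly used definitions: R² as one minus the ratio of residual to total sum of squares, and VAF as the percentage of the variance of the measurements accounted for by the predictions. These definitions are assumed here; the exact forms used by the authors may differ (for example, R² is sometimes defined as the squared correlation coefficient).

```python
import math
from statistics import mean, pvariance


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def vaf(y_true, y_pred):
    """Variance accounted for, in percent."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return (1.0 - pvariance(residuals) / pvariance(y_true)) * 100.0
```

A constant offset between measured and predicted values leaves VAF at 100 while lowering R², which is one reason the metrics are reported together.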
The errors and goodness-of-fit of the dataset for the bagging and empirical models are summarized in Table 2.
Table 2:Comparison of the machine learning-based bagging models and empirical models
Comparing the PPV predictions of the two groups (i.e., bagging models and empirical models), there is no doubt that the machine learning-based bagging models performed better than the empirical models, except for the BA-DTR model. This also indicates that the DTR and BA-DTR models tend towards low performance for predicting PPV in this study, with an accuracy even lower than that of the empirical models (i.e., R² = 0.607, RMSE = 0.567, MAE = 0.449, VAF = 52.874 on the testing dataset). Among the three remaining bagging models (i.e., BA-ExTree, BA-SVR, and BA-KNN), the BA-ExTree model yielded the highest accuracy of 88.8%, followed by the BA-SVR and BA-KNN models with 83.4% and 80.7%, respectively.
Regarding the empirical models, although their performance is lower than that of the BA-ExTree, BA-SVR, and BA-KNN models, the USBM model provided slightly better accuracy than the Ambraseys model, with accuracies of 76% compared to 73.7%.
The correlation between predictions and measurements is one of the most common statistical indexes for evaluating the soundness and goodness of fit of a model. It was therefore examined for the individual bagging and empirical models, as shown in Fig. 9.
As shown in Fig. 9, the convergence of the BA-ExTree, BA-SVR, and BA-KNN models is much better than that of the BA-DTR model. The convergence of the BA-DTR model tends to be flat, which led to its low performance, as discussed earlier. A closer look at Fig. 9 shows that most of the data points of the BA-ExTree model are neatly distributed within the 80% confidence interval of the model. Meanwhile, the BA-KNN and BA-SVR models produced more data points outside the 80% confidence interval. The empirical models showed lower convergence, with many data points outside the 80% confidence interval. An absolute difference comparison was also conducted to understand which model provides the highest accuracy in practical engineering. This comparison shows how close the measured and predicted PPVs are, as shown in Fig. 10. Accordingly, the PPVs predicted by the BA-ExTree model tend to be closer to the measured PPVs than those predicted by the other models. For most of the testing samples, the PPVs predicted by the BA-ExTree model can be considered matches with the measured PPVs. In contrast, the PPV values predicted by the BA-DTR model are furthest from the measured PPV values, as shown in Fig. 10.
Figure 9:Regression performance of different ensemble and empirical models
Figure 10:Absolute difference comparison of testing PPVs
Considering the complexity of the models, the empirical models are clearly simpler to develop than the machine learning-based bagging models. Nonetheless, in practice both kinds of model are used in the same way: the input variables are entered, and the model predicts PPV automatically.
Although the developed models perform well, especially the BA-ExTree model, it is still necessary to study how the various sources of uncertainty in an AI-based model contribute to its overall uncertainty. Such a study provides insight into the blasting dataset used and the developed model, and it can guide dimensionality reduction that may improve the accuracy of the predictive model on the PPV prediction problem. Thus, a sensitivity analysis was performed in this study to determine how different values of the input (independent) variables affect the dependent variable (PPV) under a given set of assumptions.
Herein, the BA-ExTree model was selected as the best of the developed models for predicting PPV. After being fitted, this model provides a feature importance property that can be accessed to retrieve the relative importance score of each input variable, as shown in Fig. 11. Please note that the input variables Dbh, H, Lbh, f, PF, B, S, ST, MEC, and D (the distance denoted MD above) are encoded as feature labels 0–9 in Fig. 11. Due to the stochastic nature of the algorithm, this procedure was run 10 times and the average feature importance values were considered. As depicted in Fig. 11, the model found three important features: f, PF, and MEC. These are followed by the D variable, while the other variables showed low importance. These findings are consistent with rock mechanics and blasting mechanisms. Indeed, if the hardness of the rock mass (f) is high, a larger quantity of explosives is required to break a unit volume of rock mass, i.e., 1 m³ (PF). These parameters lead to a high MEC, which significantly affects the PPV induced by blasting in open-pit mines. D, in contrast, is an uncontrollable parameter that cannot be changed while blasting. It is worth noting that the feature importance provided by the BA-ExTree model may differ from that of other models due to the different theories and the stochastic nature of the algorithms.
Figure 11:Feature importances of the ExTree model at predicting PPV
The advantages and disadvantages of blasting in practical engineering, primarily its environmental issues, are a considerable concern. Among these, reducing the adverse effects of blast-induced GV on the surrounding environment is a primary goal of researchers. To this end, this study proposed a novel intelligent model (i.e., BA-ExTree) based on the bagging technique for predicting PPV. The results were thoroughly evaluated against the other bagging models (i.e., BA-SVR, BA-KNN, and BA-DTR) and the empirical models. They indicated that the BA-ExTree model provided high accuracy and reliability in predicting PPV in open-pit mines.
Author Contributions: Quang-Hieu Tran: Conceptualization; Investigation; Resources; Writing-Original Draft; Writing-Review & Editing. Hoang Nguyen: Methodology; Formal Analysis; Visualization; Writing-Review & Editing. Xuan-Nam Bui: Conceptualization; Investigation; Resources; Writing-Original Draft; Validation; Writing-Review & Editing; Supervision; Project Administration.
Funding Statement: This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant No. 105.99–2019.309.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
Computer Modeling in Engineering & Sciences, 2023, Issue 3