Multiclass stand-alone and ensemble machine learning algorithms utilised to classify soils based on their physico-chemical characteristics

2022-04-08 08:55 · Eyo Eyo, Smuel Abbey

Journal of Rock Mechanics and Geotechnical Engineering, 2022, Issue 2

Eyo Eyo, Smuel Abbey

a Faculty of Environment and Technology, Department of Geography and Environmental Management, Civil Engineering Cluster, University of the West of England, Bristol, BS16 1QY, UK

b Fugro GB Marine Geotechnical Services Limited, Wallingford, OX10 9RB, UK

Keywords: Soil classification; Physico-chemistry; Soil plasticity; Machine learning; Logistic regression (LR); Machine learning ensembles; Artificial neural network (ANN)

ABSTRACT This study provides an approach to classifying soils using machine learning. Multiclass versions of stand-alone machine learning algorithms (i.e. logistic regression (LR) and artificial neural network (ANN)), decision tree ensembles (i.e. decision forest (DF) and decision jungle (DJ)), and meta-ensemble models (i.e. stacking ensemble (SE) and voting ensemble (VE)) were used to classify soils based on their intrinsic physico-chemical properties. The multiclass prediction was also carried out across multiple cross-validation (CV) methods, i.e. train-validation split (TVS), k-fold cross-validation (KFCV), and Monte Carlo cross-validation (MCCV). Results indicated that the soils' clay fraction (CF) had the most influence on the multiclass prediction of natural soils' plasticity, while specific surface area (SSA) and carbonate content (CC) had the least, within the scope of the dataset used in this study. Stand-alone machine learning models (LR and ANN) produced relatively less accurate predictions (accuracy of 0.45, average precision of 0.5, and average recall of 0.44) than tree-based models (accuracy of 0.68, average precision of 0.71, and average recall of 0.68), while the meta-ensembles (SE and VE) outperformed all the models utilised for multiclass classification (accuracy of 0.75, average precision of 0.74, and average recall of 0.72). Sensitivity analysis of the meta-ensembles demonstrated their capacity to discriminate between soil classes across the CV methods considered. Training and validation using the MCCV and KFCV methods enabled better prediction while also ensuring that the machine learning models did not overfit the dataset. This was further confirmed by the continuous rise of the cumulative lift curve (LC) of the best-performing models when using the MCCV technique. Overall, this study demonstrated that soils' physico-chemical properties directly influence their plastic behaviour and, therefore, can be relied upon to classify soils.

1.Introduction

Soil classification is considered one of the fundamental steps in the planning, design, development, and implementation of various infrastructure projects and related undertakings that involve static and dynamic interactions with the ground (Kaliakin, 2017). This is mainly because the results of soil classification can serve as valuable indices for estimating mechanical properties, such as compressibility, permeability, shear strength, and swelling. Depending on the aim or type of construction project envisaged, the class or category of soil required may vary according to either its plasticity properties or its particle sizes.

Two major groups of soil classification systems are usually adopted in soil science and engineering practice. The first category relies on textural classification based mainly on the soil particle size distribution (PSD), originally proposed and standardised by the United States Department of Agriculture (USDA). In the second category, the engineering behaviour of soils is mainly considered; hence, classification is based on both PSD and plasticity properties. The methods and procedures for grouping soils in this case are captured and standardised in the American Association of State Highway and Transportation Officials (AASHTO) system as well as the Unified Soil Classification System (USCS). Moreover, several researchers are continuously devising other means of classifying soils (Moreno-Maroto et al., 2021).

Although classifying soils by their textural characteristics is important, particle sizes do not fully represent the general properties of soils, because soils with similar PSDs can possess different physical characteristics (Casagrande, 1948; Moreno-Maroto et al., 2021). Therefore, this article focuses on classifying soils based on their plasticity behaviour. Plasticity is simply defined as the capacity of a soil to be moulded into any shape without rupturing or cracking. Soil plasticity is usually established by carrying out the Atterberg limits test and, more specifically, by determining the liquid limit (LL) and plastic limit (PL), which represent the upper and lower moisture content values between which the soil exhibits plasticity.

Most previous soil classification systems are based on indirect estimation and correlation of LL, PL, and plasticity index (PI) on charts, which in some situations suffer from setbacks, especially in borderline cases (Casagrande, 1948; Saito and Miki, 1975; Polidori, 2007, 2009; Moreno-Maroto et al., 2017; Moreno-Maroto and Alonso-Azcárate, 2018). These techniques rely mostly on first remoulding or working the soil and then testing it, or on laborious physical measurements, to indirectly infer soil classifications (Cai et al., 2011; Bol, 2013; Shahri et al., 2015; Eyo et al., 2019). Pham et al. (2021) attempted to classify soils by correlating properties such as specific gravity, moisture content, void ratio, clay content, and Atterberg limits with their plasticity categories. Even though the prediction accuracy improved somewhat, their model failed to capture or correctly identify about 12.5% of the soil samples. Hence, in addition to the challenges mentioned above, some of the problems encountered in their method of soil classification could have resulted from not considering the core intrinsic properties of the soils.

Soil plasticity is affected mainly by mineralogy and chemical composition (Polidori, 2015; Spagnoli et al., 2017; Okeke et al., 2021). According to Moreno-Maroto et al. (2021), these factors can provide even more information on the plasticity behaviour of soils than those stated previously. Besides, studies on the relationships between these factors and engineering behaviours, such as shear strength, swelling, compressibility, and compaction, have been reported (e.g. Christidis, 1998; Al-Rawas, 1999; Yilmaz, 2004; Abbey et al., 2019, 2020, 2021; Eyo et al., 2019, 2020, 2021). In view of this and the aforementioned shortcomings, this article aims to provide a novel approach to soil classification based on the soils' physico-chemical characteristics (i.e. cation exchange capacity (CEC), carbonate content (CC), specific surface area (SSA), and clay fraction (CF)) using machine learning.

Machine learning is an artificial intelligence (AI) paradigm that is gradually gaining traction and popularity within the civil engineering discipline. However, applications of machine learning in geotechnical engineering currently tend to focus on regression-based problems related to soil engineering macro-behaviours, such as shear strength, unconfined compressive strength, resilient modulus, compressibility, swelling, compaction, and stabilisation (e.g. Kayadelen et al., 2009; Ikizler et al., 2010; Liao et al., 2011; Tekin and Akbas, 2011; Yilmaz and Kaynar, 2011; Bekhor and Livneh, 2014; Tinoco et al., 2014; Mozumder and Laskar, 2015; Zhang and Goh, 2016; Goh et al., 2017; Mozumder et al., 2017; Soleimani et al., 2018; Gajurel et al., 2019; Ermias and Vishal, 2020; Hanandeh et al., 2020; Eyo and Abbey, 2021; Zhang et al., 2020, 2021).

In summary, the major objective of this study is to apply stand-alone machine learning algorithms (logistic regression (LR) and artificial neural network (ANN)), decision tree ensembles (decision forest (DF) and decision jungle (DJ)), and meta-ensemble models (stacking ensemble (SE) and voting ensemble (VE)) to soil classification. Unlike regression, machine learning classification applies a decision-making technique that assigns an unknown data item to a class based on the entire dataset (Dreiseitl and Ohno-Machado, 2002; Pham et al., 2021). Because the soil classification envisaged here involves a predicted variable with potentially more than two categories (owing to the wide range of soil plasticity), multiclass versions of the adopted machine learning models are used. Also, prediction is carried out across multiple cross-validation (CV) methods for the first time.

2.Methodology

2.1.Database generation and integration

Both numerical and categorical high-quality data collected from careful experiments on soils (both clayey and silty soils) with wide-ranging plasticity properties were used in this study. The soil classes represented by the USCS are based on the Casagrande chart of PI versus LL (see Fig. 1). The dividing line (or 'A-line'), expressed as PI = 0.73(LL - 20), separates inorganic clays, located above the A-line, from the rest of the soil materials (i.e. organic soils, silts, and fine sandy soils) located below it. The symbols used to categorise the soils into different groups are further explained in Table 1. It should be noted that the soils used in this study are inorganic clayey and silty soils. Highly standardised measurement techniques and testing procedures were adopted for determining the physico-chemical properties of the soils (e.g. Smith et al., 1985; Dexter, 1990; Abduljauwad, 1994; Cerato, 2001; Kalkan and Akbulut, 2004; Sridharan and Gurtug, 2004; Senol et al., 2006; Arnepalli et al., 2008; Venkat et al., 2008; Yazdandoust and Yasrobi, 2010; Gaidzinski et al., 2011; Ngun et al., 2011; Erzin and Gunes, 2013; Bayat et al., 2015; Mahmoudi et al., 2016; Mehta and Sachan, 2017; Akgün et al., 2018; Spagnoli and Shimobe, 2019). The physico-chemical properties of the soils (i.e. CEC, CC, SSA, and CF (<2 μm)) derived from these measurements serve as the inputs (independent variables) for machine learning prediction. To achieve the goal (i.e. determining soil classifications in terms of their plasticity), the Atterberg limits of the soils were used to define the categorical variable to be predicted (soil classes). Table 2 shows the main descriptive statistics of the raw independent features and the consistency limits of the soils, and confirms the wide plasticity ranges of the soils used in this study. The values of LL range between approximately 18% and 155%, while those of PI range between about 1% and 94%, indicating that soils of both low and high plasticity, according to the USCS, are represented. Frequency distributions of the raw data collected from the literature are depicted in Fig. 2. Although the data are quite sparse in some bins, especially for SSA and CEC, CF appears to have a more uniform distribution. Among these features, both CC and CF show less deviation from their means.
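The chart-based class labels described above can be sketched in code. This is a simplified sketch that assumes the standard ASTM D2487 rules for inorganic fine-grained soils (LL = 50 separating low- from high-plasticity soils, the A-line PI = 0.73(LL - 20), and a dual CL-ML zone for 4 ≤ PI ≤ 7 on or above the A-line); it ignores organic soils and the U-line, and the function names are illustrative rather than taken from the study, whose exact label set is given in Table 1.

```python
def a_line(ll):
    """PI on the Casagrande A-line, PI = 0.73(LL - 20); LL and PI in %."""
    return 0.73 * (ll - 20.0)


def uscs_fine_class(ll, pi):
    """Simplified USCS group symbol for an inorganic fine-grained soil.

    Assumed rules (ASTM D2487 style): LL >= 50 -> high plasticity (H),
    otherwise low (L); clays (C) plot on or above the A-line with PI > 7;
    points on or above the A-line with 4 <= PI <= 7 fall in the dual
    CL-ML zone; everything else is treated as silt (M).
    """
    plasticity = "H" if ll >= 50 else "L"
    on_or_above_a_line = pi >= a_line(ll)
    if on_or_above_a_line and pi > 7:
        return "C" + plasticity
    if on_or_above_a_line and 4 <= pi <= 7:
        return "CL-ML"
    return "M" + plasticity


print(uscs_fine_class(60, 35))  # CH
print(uscs_fine_class(35, 5))   # ML
```

In the study, labels of this kind (derived from the measured Atterberg limits) are the targets, while CEC, CC, SSA, and CF are the inputs.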

Table 1. Casagrande-USCS description of soil plasticity.

Table 2. Relevant statistics of machine learning independent features and soils' consistency limits.

Fig.1. Casagrande-USCS plasticity chart (Reprinted from Soil Mechanics: Calculations, Principles, and Methods, Victor N. Kaliakin, Example problems related to soil identification and classification, Copyright (2017), with permission from Elsevier).

2.2.Data wrangling and CV

The data used in the supervised machine learning classification in this study do not contain any missing values. Although it is sometimes considered essential (depending on the variability of the dataset) to transform the dependent features before training and validation, this is unnecessary here because these features are categorical. However, it is important to employ CV to make better use of the training data and to drastically reduce the possibility that some coincidental variables receive undue importance in the multiclass prediction (Joshi, 2020). CV also reduces the risk of overfitting the machine learning models and enhances their predictive performance (DeRousseau et al., 2019). The CV techniques adopted in this study are train-validation split (TVS), k-fold cross-validation (KFCV), and Monte Carlo cross-validation (MCCV).

TVS randomly splits the dataset into separate subsets that are used for training and for testing/validation, respectively. In this study, 80% of the dataset was utilised to train the multiclass machine learning algorithms, whereas the remaining 20% was used to test and validate the predictive performance of the models. This split follows the recommendation of most machine learning studies that the validation set should be 10%-30% of the total dataset (Han et al., 2020; Joshi, 2020).
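A minimal sketch of the 80/20 TVS described above, operating on row indices only. The function name and fixed seed are illustrative assumptions, and the study does not state whether its split was stratified by soil class.

```python
import random


def train_validation_split(n_samples, val_fraction=0.2, seed=0):
    """Randomly partition sample indices into disjoint training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible random order
    n_val = round(n_samples * val_fraction)
    return indices[n_val:], indices[:n_val]  # (training, validation)


train_idx, val_idx = train_validation_split(100)
print(len(train_idx), len(val_idx))  # 80 20
```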

The KFCV method divides the entire dataset (N_KFCV) into k subsets of equal size. One of the subsets is used for validation (n_v), while the remaining k - 1 subsets are used for training. The process is then repeated k times, with a different subset held out in each iteration, so that every subset is used for validation exactly once (see Eq. (1)). Ten-fold CV (i.e. k = 10) is used in this research for training and validating the classification data.
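The k-fold procedure can be sketched as follows, again on row indices. This is an illustrative sketch with a hypothetical function name; when the number of samples is not divisible by k, fold sizes differ by at most one, which is a common convention rather than something the study specifies.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists; each fold is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    for i, validation in enumerate(folds):
        # Train on the other k - 1 folds.
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_splits(105, k=10))
print(len(splits))  # 10
```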

The MCCV technique combines the concepts of the TVS and KFCV methods. MCCV divides the entire dataset (N_MCCV) into two subsets by random sampling without replacement. Training is then carried out on one subset (n_t), while validation is performed on the remaining samples (n_v), and this random split is repeated over several iterations. Unlike KFCV, the validation subsets of different MCCV iterations are drawn independently, so the number of iterations is not tied to a fixed partition and a given sample may be validated several times or not at all (see Eq. (2)). In this study, the proportion of validation data is 20%, used in conjunction with ten-fold CV.
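A minimal MCCV sketch under the settings above (20% validation per split). The number of iterations is a parameter because the text does not fix it unambiguously; ten is used here only as an assumed default. The usage lines count how many distinct samples were ever validated, which illustrates the difference from KFCV.

```python
import random
from collections import Counter


def monte_carlo_splits(n_samples, val_fraction=0.2, n_iterations=10, seed=0):
    """Yield (training, validation) index lists from independent random splits."""
    rng = random.Random(seed)
    n_val = round(n_samples * val_fraction)
    for _ in range(n_iterations):
        # Sampling without replacement gives a random permutation of all indices.
        order = rng.sample(range(n_samples), n_samples)
        yield order[n_val:], order[:n_val]


validated = Counter(i for _, val in monte_carlo_splits(100) for i in val)
# Distinct samples validated at least once; unlike KFCV, this need not equal 100.
print(len(validated))
```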

3.Machine learning algorithms

3.1.LR

LR builds on the linear regression model, in which the relationship between one or more independent (or explanatory) variables and a dependent variable is established by

$$Y_n = \beta_0 + \beta_1 X_{1n} + \beta_2 X_{2n} + \cdots + \beta_m X_{mn} + \varepsilon_n$$

where $Y_n$ is the dependent variable; $X_{1n}, X_{2n}, \ldots, X_{mn}$ are the independent variables; $\beta_0$ is a constant; $\beta_1, \beta_2, \ldots, \beta_m$ are the regression coefficients; and $\varepsilon_n$ is the error term.

On the other hand, LR determines the conditional probability that the dependent variable Y = 1 given the independent variables X. The probability distribution for LR is generally given by the hypothesis function:

$$h_\beta(x) = P(Y = 1 \mid X = x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m)}}$$

LR then models the logarithm of the odds of an event linearly as follows:

$$\ln\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m$$

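where the solution for the probability of the event occurring, $p(x)$, is given as

$$p(x) = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m}}$$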

For the performance evaluation metrics in Section 3.6, true positive (TP) is the number of instances from a positive class that the machine learning algorithm correctly predicts as positive (i.e. the true class label equals the predicted class label); false positive (FP) is the number of instances from a negative class that the algorithm misclassifies by falsely predicting them as positive; true negative (TN) is the number of instances from a negative class that are correctly predicted as negative; and false negative (FN) is the number of instances from a positive class that the algorithm misclassifies by predicting them as negative.

For every data point, there is a feature vector $x_i$ and an observed class $y_i$, whose probability is either $p$ (if $y_i = 1$) or $1 - p$ (if $y_i = 0$), giving the likelihood over $N$ data points as

$$L(\beta) = \prod_{i=1}^{N} p(x_i)^{y_i}\,\bigl[1 - p(x_i)\bigr]^{1 - y_i}$$
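The binary LR formulas above can be checked numerically with the short sketch below: it computes $p(x)$ from the linear predictor and evaluates the log of the likelihood $L(\beta)$. It assumes the binary case only. The study used multiclass LR, which is usually obtained through one-vs-rest or a multinomial (softmax) generalisation, and which of these was used is not stated. The function names are illustrative.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_predictor(beta0, betas, x):
    """beta_0 + beta_1 x_1 + ... + beta_m x_m"""
    return beta0 + sum(b * xk for b, xk in zip(betas, x))


def predict_proba(beta0, betas, x):
    """p(x) = P(Y = 1 | X = x)"""
    return sigmoid(linear_predictor(beta0, betas, x))


def log_likelihood(beta0, betas, X, y):
    """ln L(beta) = sum_i [y_i ln p(x_i) + (1 - y_i) ln(1 - p(x_i))].

    Uses the equivalent form y_i z_i - ln(1 + e^{z_i}), with z_i the linear
    predictor, so that ln(0) is never evaluated.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        z = linear_predictor(beta0, betas, xi)
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # ln(1 + e^z)
        total += yi * z - softplus
    return total


# The log-odds of the predicted probability recover the linear predictor.
p = predict_proba(0.5, [2.0], [1.0])
print(round(math.log(p / (1 - p)), 6))  # 2.5
```

Fitting the model amounts to choosing the coefficients that maximise `log_likelihood`, typically with an iterative optimiser.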

3.2.ANN

ANN is a type of machine learning algorithm modelled after the human brain. Hence, the functioning and problem-solving processes of an ANN mimic those of the fundamental unit of the human nervous system, the neuron (Fig. 3a). The neuron is the brain's simple information-processing unit: it receives input signals from surrounding neurons (connected at junctions called synapses) through input paths called dendrites and transmits its output through an output path called the axon. It is worth mentioning that although the ANN structure is inspired by the human nervous system, which is said to possess billions of neurons, ANNs used for practical geotechnical engineering problems typically utilise only a few hundred neurons (Das, 2013).

The neurons are described as processing nodes or elements in the ANN's mathematical model. Hence, each element $x_l$ ($l = 1, \ldots, N_i$) of an input vector is transmitted through connections that multiply it by a set of weights, producing the hidden unit $z_j$ as

$$z_j = \sum_{l=1}^{N_i} w_{jl} x_l + b_{j0}, \qquad j = 1, \ldots, N_h$$

where $N_i$ is the number of input units; $N_h$ is the number of hidden units, each consisting of weighted inputs and a bias ($b_{j0}$); $w_{jl}$ is the weight; and $x_l$ is the input.

To allow for nonlinearity in the network, the weighted sums $z_j$ are then passed through a transfer (activation) function $f$, which generates:

$$h_j = f(z_j)$$

Fig. 3b depicts a typical three-layer ANN architecture with input, hidden, and output layers.
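A forward pass through such a three-layer network can be sketched as below. The hidden layer applies $z_j = \sum_l w_{jl} x_l + b_{j0}$ followed by $f(z_j)$. The choice of tanh for $f$, the linear output units followed by softmax to produce class probabilities, and all weight values are assumptions for illustration; the study does not report its activation functions or trained weights.

```python
import math


def hidden_layer(x, weights, biases, activation=math.tanh):
    """h_j = f(z_j) with z_j = sum_l w_jl * x_l + b_j0, for j = 1..N_h."""
    outputs = []
    for w_j, b_j0 in zip(weights, biases):
        z_j = sum(w_jl * x_l for w_jl, x_l in zip(w_j, x)) + b_j0
        outputs.append(activation(z_j))
    return outputs


def softmax(scores):
    """Convert output-layer scores into class probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input -> hidden (tanh) -> linear output units -> softmax over classes."""
    h = hidden_layer(x, w_hidden, b_hidden)
    scores = hidden_layer(h, w_out, b_out, activation=lambda z: z)
    return softmax(scores)


# Hypothetical example: 4 scaled inputs (CEC, CC, SSA, CF), 3 hidden units, 4 classes.
x = [0.4, 0.1, 0.6, 0.5]
w_hidden = [[0.2, -0.1, 0.4, 0.3], [-0.3, 0.2, 0.1, 0.5], [0.1, 0.1, -0.2, 0.4]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [[0.5, -0.2, 0.1], [-0.4, 0.3, 0.2], [0.2, 0.2, -0.3], [0.1, -0.1, 0.4]]
b_out = [0.0, 0.0, 0.0, 0.0]
probs = forward(x, w_hidden, b_hidden, w_out, b_out)
print(len(probs), round(sum(probs), 6))  # 4 1.0
```

Training would adjust the weights and biases (e.g. by backpropagation); only the forward computation is shown here.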

3.3.Tree-ensemble models

Decision tree models are based on the repeated splitting (branching) of input data according to a set of formal rules or criteria that aim to maximise the separation of the data, resulting in a typical tree-like structure (Fig. 4). Each split is chosen to decrease the entropy of the system, i.e. to maximise the information gain of the split. This formal rule (or information gain criterion) is common to most decision tree networks. The estimate of the probability distribution P(m|n) in this case is the ratio of the number of class-m elements to the total number of elements in the leaf node containing data item n (Dreiseitl and Ohno-Machado, 2002). However, depending on the kind of problem or application, decision tree machine learning models differ in how they are constructed or used. For this research, the multiclass versions of both DF and DJ are considered.
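The entropy criterion and the leaf-node estimate of P(m|n) described above can be written in a few lines. This sketch assumes Shannon entropy in bits and information gain as the parent entropy minus the size-weighted entropy of the children; the soil labels are toy values.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent node minus the size-weighted entropy of its children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


def leaf_class_probabilities(leaf_labels):
    """P(m | n): share of each class m among the training items in the leaf holding n."""
    n = len(leaf_labels)
    return {label: count / n for label, count in Counter(leaf_labels).items()}


parent = ["CL", "CL", "CH", "CH"]
print(information_gain(parent, [["CL", "CL"], ["CH", "CH"]]))  # 1.0 (a perfect split)
print(leaf_class_probabilities(["CL", "CL", "CL", "MH"]))  # {'CL': 0.75, 'MH': 0.25}
```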

3.3.1.DF

The DF, as a tree ensemble, is typically created to minimise the fluctuations or instability (variance) that would arise if a single tree were used for machine learning prediction. The multiclass DF relies on the concept of bagging, or 'bootstrap aggregating', to perform efficiently. Bagging, in this case, is simply a technique in which multiple trees are brought together in a 'bag' (Kang et al., 2021): each tree is trained on a bootstrap sample drawn with replacement from the training data, and the trees' outputs are then aggregated, e.g. by voting. Nevertheless, one of the main setbacks of using a DF is its susceptibility to overfitting, because its constituent trees typically have low bias but high variance.
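A minimal sketch of bagging as just described: each tree is fitted on a bootstrap sample drawn with replacement, and the ensemble predicts by majority vote. To keep the example self-contained, the 'tree' is a hypothetical one-split stump on a single feature rather than a full decision tree, and the toy rows are invented. It illustrates the generic bagging mechanism, not the specific DF implementation used in the study.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows, feature=0):
    """Hypothetical base learner: a one-split decision tree on a single feature."""
    best = None
    for threshold in sorted({x[feature] for x, _ in rows}):
        left = [y for x, y in rows if x[feature] <= threshold]
        right = [y for x, y in rows if x[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        correct = left.count(left_label) + right.count(right_label)
        if best is None or correct > best[0]:
            best = (correct, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x[feature] <= threshold else right_label


def fit_bagged(rows, fit_tree, n_trees=25, seed=0):
    """Fit one tree per bootstrap sample."""
    rng = random.Random(seed)
    return [fit_tree(bootstrap_sample(rows, rng)) for _ in range(n_trees)]


def predict_bagged(models, x):
    """Aggregate the trees by majority vote."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


# Toy rows: (features, class); a single feature for simplicity.
rows = [((10,), "CL"), ((12,), "CL"), ((40,), "CH"), ((45,), "CH")] * 3
forest = fit_bagged(rows, fit_stump)
print(predict_bagged(forest, (10,)), predict_bagged(forest, (45,)))
```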

3.3.2.DJ

DJ is sometimes regarded as an extension of the random forest. A DJ comprises an ensemble of rooted decision directed acyclic graphs (DAGs), a technique that yields compact and accurate machine learning classifiers (Shotton et al., 2013). Because branches of a DAG can merge (a node may have several parents), the resulting models typically have a low memory footprint, which contributes to their good overall performance. The multiclass DJ has the advantage of being non-parametric and can therefore effectively represent nonlinear decision boundaries. Moreover, DJ performs integrated feature selection and classification and is robust to noisy features during training.
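The memory argument can be illustrated with simple arithmetic. The sketch below compares a full binary tree of a given depth with a rooted DAG whose width is capped at a fixed number of nodes per level, once branches are merged. The cap value is hypothetical, and the per-level width budget is an illustrative assumption in the spirit of Shotton et al. (2013). The functions only count nodes; they do not train a jungle.

```python
def tree_node_count(depth):
    """Nodes in a full binary decision tree with levels 0..depth."""
    return 2 ** (depth + 1) - 1


def jungle_node_count(depth, max_width):
    """Nodes in a rooted decision DAG whose width is capped at max_width per level.

    Once a level would exceed the cap, child branches are merged so that
    several parent nodes share the same child node.
    """
    return sum(min(2 ** level, max_width) for level in range(depth + 1))


print(tree_node_count(10), jungle_node_count(10, 8))  # 2047 71
```

The tree grows exponentially with depth, whereas the capped DAG grows only linearly once the cap is reached.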

3.4.Meta-ensembles

(2) Precision, which is the positive predictive value, defined as the proportion of positive predictions that are actually correct, i.e. the number of correctly predicted positive instances out of all instances predicted as positive. It is expressed as

$$\text{Precision} = \frac{TP}{TP + FP}$$

3.4.1.VE

Voting does not necessarily impose any requirement on the aggregated classifiers or models. Hence, in most cases, voting neither relies on prior knowledge of how the individual models behave nor requires training on very large quantities of recognition results from the classifiers (Kim and Upneja, 2021). Weighted voting, which combines the above-mentioned models using their best weights, is proposed for the multiclass problem in this study (the structure is shown in Fig. 5a).
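Weighted voting can be sketched in two common forms: hard voting, which sums model weights per predicted label, and soft voting, which takes a weighted average of the models' class probabilities. The study does not report its weights or state which form it used, so both are shown and the weights below are hypothetical.

```python
from collections import defaultdict


def weighted_hard_vote(predicted_labels, weights):
    """Return the class with the largest total weight among the models' votes."""
    totals = defaultdict(float)
    for label, weight in zip(predicted_labels, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def weighted_soft_vote(class_probabilities, weights):
    """Weighted average of per-model class-probability dicts; returns (label, averaged probs)."""
    total_weight = sum(weights)
    averaged = defaultdict(float)
    for probs, weight in zip(class_probabilities, weights):
        for label, p in probs.items():
            averaged[label] += weight * p / total_weight
    return max(averaged, key=averaged.get), dict(averaged)


# One strongly weighted model can outvote two weaker ones.
print(weighted_hard_vote(["CL", "CH", "CH"], [0.6, 0.25, 0.25]))  # CL
print(weighted_soft_vote([{"CL": 0.7, "CH": 0.3}, {"CL": 0.4, "CH": 0.6}], [1, 1])[0])  # CL
```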

Fig.2. Frequency distributions of machine learning features and soils' consistency limits: (a) CC, (b) CEC, (c) SSA, (d) CF, (e) LL, (f) PI, (g) PL, and (h) distribution of soil class labels.

3.4.2.SE

SE uses two levels of models, comprising base learners and meta-classifiers, for machine learning prediction. SE combines the outputs produced by the base learners by using a meta-classifier to learn the patterns or relationships between those outputs. Fig. 5b presents the structure of the proposed SE, which was used in this study for multiclass classification of the soils based on their physico-chemical characteristics.
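The two-level structure of stacking can be sketched as follows. Level 0 produces out-of-fold predictions, so that each row is labelled by base models that never saw it. Level 1 learns a mapping from those predictions to the true class. The base learners (nearest class centroid on one feature), the lookup-table meta-learner, the unshuffled fold assignment, and the toy data are all hypothetical simplifications, not the models used in the study.

```python
from collections import Counter, defaultdict
from functools import partial


def fit_centroid(rows, feature):
    """Hypothetical base learner: nearest class centroid on a single feature."""
    values = defaultdict(list)
    for x, y in rows:
        values[y].append(x[feature])
    centroids = {y: sum(v) / len(v) for y, v in values.items()}
    return lambda x: min(centroids, key=lambda y: abs(x[feature] - centroids[y]))


def out_of_fold_predictions(rows, base_fitters, k=3):
    """Level 0: each row is labelled by base models that were not trained on it."""
    folds = [rows[i::k] for i in range(k)]
    meta_rows = []
    for i, held_out in enumerate(folds):
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        models = [fit(training) for fit in base_fitters]
        meta_rows.extend((tuple(m(x) for m in models), y) for x, y in held_out)
    return meta_rows


def fit_lookup_meta(meta_rows):
    """Hypothetical level-1 learner: most frequent true class per combination of base outputs."""
    table = defaultdict(Counter)
    for outputs, y in meta_rows:
        table[outputs][y] += 1
    fallback = Counter(y for _, y in meta_rows).most_common(1)[0][0]
    return lambda outputs: table[outputs].most_common(1)[0][0] if outputs in table else fallback


def fit_stacking(rows, base_fitters, k=3):
    meta = fit_lookup_meta(out_of_fold_predictions(rows, base_fitters, k))
    base_models = [fit(rows) for fit in base_fitters]  # refit base learners on all rows
    return lambda x: meta(tuple(m(x) for m in base_models))


rows = [((10, 5), "CL"), ((12, 7), "CL"), ((14, 6), "CL"),
        ((40, 30), "CH"), ((45, 28), "CH"), ((42, 35), "CH")] * 2
model = fit_stacking(rows, [partial(fit_centroid, feature=0), partial(fit_centroid, feature=1)])
print(model((11, 6)), model((43, 31)))  # CL CH
```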

3.5.Machine learning model development and implementation

All the processes of machine learning training and validation were implemented through a designer platform that supports programming in Python with its libraries (pandas, NumPy, Matplotlib, and scikit-learn) and pipeline development. The dataset records utilised in this study for multiclass classification of the natural soils, the fundamental specifications of some of the important features, and the parameters adopted for both the soils and the machine learning models are given in Tables 3 and 4. A flowchart depicting the methodology used in this study for developing the machine learning pipelines and for subsequently evaluating and deploying the models is shown in Fig. 6.

To enable comprehensive analysis and evaluation of the soils' plasticity categories compiled from multiple sources, thorough integration and alignment of the data were imperative. All the datasets were appended by combining records with similar attributes and features. The relative importance of each independent feature in the machine learning prediction was assessed before and after data training and validation, and is reported in subsequent sections of this study.

Table 3. Base machine learning models' parameter settings.

Table 4. Data type of machine learning features.

3.6.Performance evaluation criteria

(1) Accuracy, which is the overall proportion of correct predictions made by a machine learning algorithm, is expressed as

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Fig.3.(a) Human neuron and (b) typical structure of ANN.

Meta-ensembles are machine learning paradigms that combine various models, such as those described above, to make predictions. The basic idea is to exploit the strength of each model in capturing different patterns in the data. This can improve prediction accuracy through either majority or weighted voting, or stacked generalisation, depending on the classification problem envisaged.

Fig.4.Structure of a typical decision tree.

(3) Recall, also known as sensitivity, measures the machine learning model's ability to identify the relevant data points, i.e. the proportion of actual positive instances that are correctly predicted as positive. Recall is mathematically expressed as

$$\text{Recall} = \frac{TP}{TP + FN}$$
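The three metrics can be computed from true and predicted class labels as sketched below. The study reports 'average' precision and recall for a multiclass problem, so this sketch assumes one-vs-rest counts per class followed by an unweighted (macro) average; the averaging scheme actually used is not stated. For accuracy, the sketch uses the overall fraction of correct predictions, which reduces to (TP + TN)/(TP + TN + FP + FN) in the binary case.

```python
def per_class_counts(y_true, y_pred, label):
    """One-vs-rest TP, FP, FN, TN for a single class label."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def multiclass_metrics(y_true, y_pred):
    """Overall accuracy plus macro-averaged precision and recall."""
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls = [], []
    for label in labels:
        tp, fp, fn, _ = per_class_counts(y_true, y_pred, label)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return {
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / len(labels),
        "macro_recall": sum(recalls) / len(labels),
    }


print(multiclass_metrics(["CL", "CL", "CH", "MH"], ["CL", "CH", "CH", "MH"])["accuracy"])  # 0.75
```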

3.7.Receiver operating characteristic (ROC) curve

A typical class-prediction sensitivity analysis for determining whether the best machine learning models are acceptable in identifying the TPs belonging to different class labels or categories uses the ROC curve. The ROC curve is a useful probabilistic assessment tool for evaluating multiclass classification models: it shows the relationship between the true positive rate (TPR) and the false positive rate (FPR) as the decision threshold changes, and hence how well a machine learning model can differentiate between classes. Used along with the ROC curve, the area under the curve (AUC) measures the degree of separability between each class in machine learning prediction. For the ROC, a perfect machine learning classification corresponds to the point (0, 1) at the top left-hand corner of a plot of TPR versus FPR, i.e. a sensitivity (recall) of 100% (no false negatives) together with an FPR of 0 (no false positives), as shown in Fig. 7. On the other hand, randomly guessing the plasticity class of a natural soil would generate points along the diagonal (the line of no discrimination) extending from the origin (0, 0) to the top right-hand corner (1, 1), regardless of the positive and negative base rates; points below this line indicate performance worse than random guessing. In this case, the AUC would be approximately 0.5, meaning that the machine learning model has no ability to discriminate between positive and negative class labels.
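For the multiclass case, a ROC curve is usually drawn per class by treating that class as positive and all others as negative (one-vs-rest). The sketch below assumes this scheme, which the text does not state explicitly. It sweeps the threshold over a model's predicted probabilities for one class, collects (FPR, TPR) points, and integrates them with the trapezoidal rule to obtain the AUC. Tied scores are treated as crossing the threshold together, and the scores are invented.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs as the decision threshold is lowered; labels are 1 (positive) or 0."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def one_vs_rest_auc(y_true, class_scores, label):
    """AUC for one class treated as positive and all other classes as negative."""
    binary = [1 if y == label else 0 for y in y_true]
    return auc(roc_points(class_scores, binary))


# Predicted probabilities of "CL" for four soils (hypothetical values).
print(one_vs_rest_auc(["CL", "CH", "CL", "MH"], [0.9, 0.2, 0.8, 0.1], "CL"))  # 1.0
```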

Fig.5.Typical meta-ensemble architecture for voting and stacking models: (a) VE structure, and (b) SE structure.

4.Results and discussions

4.1.Assessment of machine learning models’ performance

Fig. 8 depicts the performance of the multiclass machine learning algorithms utilised for predicting the soils' plasticity categories according to the USCS, and also compares the numbers of variables selected in the prediction. When all four variables (SSA, CF, CEC, and CC) are used in the classification problem, both stand-alone machine learning models (LR and ANN) produce relatively less accurate performance than the tree-based and meta-ensemble models. Overall, the worst-performing model is LR (accuracy of 0.44, precision of 0.47, and recall of 0.43), while ANN has slightly higher accuracy than LR (accuracy of 0.46, average precision of 0.52, and average recall of 0.45). One of the main reasons for LR's poor performance is its intrinsic assumption of linearity (the log-odds are modelled as a linear function of the inputs), which limits its ability to learn the relationships between the input features and the class labels in this multiclass prediction. Including more 'interaction' (product) terms among the original independent variables and their covariates could improve the performance of the LR model by making it more nonlinear and robust in the multiclass prediction. However, this approach must be used cautiously, because the greater flexibility could be accompanied by a much higher risk of overfitting, which would further reduce the accuracy of the model (Dreiseitl and Ohno-Machado, 2002). On the other hand, owing to the architecture of the ANN model defined previously, the outputs of its hidden neurons (often regarded as a 'black box') are inherently nonlinear. Hence, the output of the ANN is a nonlinear function of its inputs, which in a multiclass problem means that the decision boundary can be nonlinear, making this algorithm relatively more flexible than LR in the multiclass prediction.

It is pertinent to add that LR and ANN models are similar in many respects, given that both have common roots in statistics. Moreover, these models differ from the tree-based ensembles because, as explained in Section 3, they supply both a function f and a parameter vector α that together express the probability distribution P(y|x) as

$$P(y \mid x) = f(x, \alpha)$$

However, what distinguishes LR from ANN is the functional form used. Whereas LR is often referred to as a parametric method, ANN is called non-parametric or semi-parametric. This is an important distinction because the contribution of each parameter in an LR model can be interpreted, whereas this is often not the case for the parameters (weights) of a neural network.
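The interpretability of LR parameters mentioned above can be made concrete with a short derivation from the binary log-odds model of Section 3.1. Increasing a single input $x_k$ by one unit, with all other inputs held fixed, multiplies the odds of the event by $e^{\beta_k}$, so each coefficient has a direct reading as a log odds ratio. An individual ANN weight generally has no comparable closed-form reading.

```latex
\ln\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m
\quad\Longrightarrow\quad
\operatorname{odds}(x) = \frac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m}

\frac{\operatorname{odds}(x_1, \ldots, x_k + 1, \ldots, x_m)}{\operatorname{odds}(x_1, \ldots, x_k, \ldots, x_m)}
= \frac{e^{\beta_0 + \cdots + \beta_k (x_k + 1) + \cdots + \beta_m x_m}}{e^{\beta_0 + \cdots + \beta_k x_k + \cdots + \beta_m x_m}}
= e^{\beta_k}
```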

The unique structure of the tree-based models makes them inherently nonlinear, and according to Fig. 8, they produce higher performance metrics than the stand-alone models. Decision trees are 'white-box' models, so called because, compared with neural networks, they allow model parameters to be interpreted as a set of rules. As explained earlier, tree-based models work by continuously splitting the input data according to sets of criteria or rules that maximise the separation between the classes, with a corresponding decrease in entropy.

The need to classify soils appropriately, especially from the point of view of geotechnical field practice, cannot be overemphasised. This is largely because inaccurately classifying a silty soil as a plastic clay could result in undesirable cost overruns associated with remedial measures such as stabilisation, or even in soils being rejected outright on the incorrect assumption that they do not meet the required specifications. Consequently, this study has used an intensive data-driven decision-making approach based on machine learning to rigorously classify soils. The procedures adopted in this study are believed to provide a succinct basis for categorising soils because, unlike previous techniques, which relied mainly on first remoulding the soil and then testing it, or on laborious physical measurements, to indirectly infer soil classes, the method followed herein is based on properties that intrinsically affect natural soil behaviour under in-situ conditions, without having to physically work the soil. Granted, not every conceivable physico-chemical property of the soils is considered in this study. Nevertheless, a fundamental basis for future research has been provided. Accordingly, it is recommended that inherent soil properties not covered in this study be used to further evaluate and forecast soil classes. Statistical comparisons are also suggested between class predictions based entirely on physical, electrical, or mechanical properties and those relying on the soils' chemical components.

Some of the setbacks mentioned above, especially those relating to the decision tree ensembles, could be substantially curtailed by aggregating multiple learners (both the stand-alone models and the tree ensembles) into what are referred to herein as meta-ensembles. As can be observed in Fig.8, the performance metrics of the meta-ensembles, especially the VE model, are high (accuracy of 0.78, average precision of 0.78, and average recall of 0.76). As stated previously, the VE and SE models combine the predictions of the base algorithms (by voting in VE and through a trained meta-learner in SE) so that the combined predictions are more accurate than those of the individual learners. It is notable that the performance of the stacking ensemble (SE) is almost on par with that of the tree ensembles. This behaviour is investigated further through the sensitivity analysis in Section 4.4.
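The difference between the two meta-ensembles can be sketched in plain Python. In the hypothetical example below, soft voting averages the class probabilities of the base models, while stacking trains a second-level (meta) learner on the base models' probabilities. A nearest-centroid meta-learner is assumed purely for brevity; all names and numbers are illustrative rather than taken from this study.

```python
from collections import defaultdict

CLASSES = ["CH", "CL", "MH", "ML"]

def soft_vote(prob_vectors):
    """Voting ensemble: average the base models' class probabilities, then take the arg-max."""
    avg = [sum(p[k] for p in prob_vectors) / len(prob_vectors) for k in range(len(CLASSES))]
    best = max(range(len(CLASSES)), key=lambda k: avg[k])
    return CLASSES[best], avg

def fit_stacker(meta_X, y):
    """Stacking meta-learner (nearest centroid, assumed for brevity) fitted on base-model outputs."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(meta_X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_stacker(centroids, x):
    """Assign the class whose centroid is closest (squared Euclidean distance) to x."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Hypothetical outputs of two base models for one soil sample
model_a = [0.6, 0.2, 0.1, 0.1]
model_b = [0.2, 0.5, 0.2, 0.1]
print(soft_vote([model_a, model_b])[0])   # CH (averaged probability 0.40 vs 0.35 for CL)

# Meta-features = concatenated probabilities from both base models (hypothetical training rows)
train_X = [
    [0.7, 0.1, 0.1, 0.1, 0.6, 0.2, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1, 0.2, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1, 0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7, 0.1, 0.2, 0.1, 0.6],
]
train_y = ["CH", "CL", "MH", "ML"]
stacker = fit_stacker(train_X, train_y)
print(predict_stacker(stacker, model_a + model_b))   # CH
```

In practice, the meta-features for stacking are usually generated from out-of-fold predictions, so that the meta-learner is not trained on outputs for samples the base models have already fitted.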

4.2.Multiclass prediction uncertainties

Multiclass prediction probability distributions across each of the classes for the machine learning models are plotted in Fig.9. The ensemble models exhibit greater symmetry (i.e. less skewness) in their distributions than the stand-alone models, whose mean and median scores generally do not coincide across the class labels. Moreover, LR is much more strongly biased towards the CH and CL classes, whereas ANN shows a higher probability of predicting the ML class than the other classes. Overall, the ensemble models show relatively higher and more balanced average probabilities of correctly predicting the classes.
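The asymmetry described above can be quantified by comparing the mean, median and skewness of the predicted probabilities for each class label. The sketch below uses the moment coefficient of skewness; the probability values are hypothetical, not those plotted in Fig.9.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population central moments)."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5

def summarise(probs_by_class):
    """Mean, median and skewness of the predicted probabilities for each class label."""
    return {
        label: {
            "mean": statistics.fmean(p),
            "median": statistics.median(p),
            "skew": skewness(p),
        }
        for label, p in probs_by_class.items()
    }

# Hypothetical predicted probabilities of the true class for a handful of samples
summary = summarise({
    "CH": [0.2, 0.3, 0.3, 0.4, 0.9],   # long right tail: mean > median, positive skew
    "MH": [0.4, 0.5, 0.5, 0.6],        # symmetric: mean equals median, zero skew
})
for label, stats in summary.items():
    print(label, {k: round(v, 3) for k, v in stats.items()})
```

A mean that departs from the median, together with a non-zero skewness, is the numerical counterpart of the lopsided distributions described for the stand-alone models.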

4.3.Evaluation of machine learning feature importance

Fig.6.Machine learning workflow diagram.

Fig.7.A typical ROC curve.

Indicators of the importance of the independent (input) features can give insights into the dataset used while also contributing to improved model efficiency. Feature importance is a technique that assigns scores to the input variables of a dataset, indicating the relative significance or usefulness of each variable to the model's predictions. For the sake of brevity, the influence of each input feature across each of the classes is given in Fig.10 for the best meta-ensemble model only. CF appears to have the greatest overall influence on the multiclass prediction of the natural soils’ plasticity classes in accordance with the USCS. Previous studies have established that CF, among other physico-chemical soil properties, bears a strong relationship with the physical and mineral properties of soil, which in turn are known to affect soil plasticity (Mitchell and Soga, 2005; Nelson et al., 2015). The strong influence of CEC on the MH soil class is also easily noticeable, as is the importance of SSA (although slightly lower than that of CF) in predicting the CL soil class. Moreover, it was recently confirmed by probabilistic analyses that a soil’s LL is most likely to be directly proportional to SSA and CEC (Spagnoli and Shimobe, 2019). The overall importance of SSA and CC in deciding the soil classes is not easily distinguishable, even though the influence of the latter (CC) appears to be the least when evaluated cumulatively against the other features.
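Permutation importance is one common, model-agnostic way of obtaining such scores; it is assumed here for illustration and is not confirmed as the method behind Fig.10. Each feature column is shuffled in turn and the resulting drop in accuracy is recorded. The toy model, data and function names below are hypothetical.

```python
import random

FEATURES = ["CF", "CEC", "SSA", "CC"]   # column order assumed for this sketch

def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importance = {}
    for j, name in enumerate(FEATURES):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importance[name] = sum(drops) / n_repeats
    return importance

def toy_model(row):
    """Hypothetical classifier that looks only at clay fraction (CF, %)."""
    return "CH" if row[0] > 40 else "ML"

# Hypothetical rows: [CF, CEC, SSA, CC]
X = [[10, 5, 20, 1], [20, 8, 30, 2], [30, 12, 40, 3],
     [50, 25, 80, 2], [60, 30, 90, 1], [70, 35, 100, 3]]
y = ["ML", "ML", "ML", "CH", "CH", "CH"]
print(permutation_importance(toy_model, X, y))
```

Because the toy model ignores CEC, SSA and CC, shuffling those columns leaves its accuracy unchanged and their scores are zero, while CF receives a positive score.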

Fig.11 depicts the performance of the VE model by comparing the predicted values of the classes with their true (actual) values. As can be observed, VE slightly over-predicted the CH (by approximately 10%) and CL (by approximately 17%) plasticity classes while under-predicting the ML plasticity class (by approximately 30%). It is also clear from Fig.11 that the MH class was predicted most accurately by the VE model. Fig.12 depicts the density plots of feature importance in terms of the cumulative average of absolute values. The means of the distributions are highest for CF, followed by CEC, CC and then SSA, thereby confirming the order of importance of these features. Also, the density plots of the features with the greatest influence on the multiclass prediction are more symmetrically distributed about their mean (average) importance value.
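Over- and under-prediction percentages of the kind quoted above can be obtained by comparing predicted and true class counts. The sketch below assumes that the percentages are relative differences in counts per class; this is an interpretation, not a definition given in the text, and the labels are hypothetical.

```python
from collections import Counter

def prediction_bias(y_true, y_pred):
    """Relative difference (%) between predicted and true counts for each class.

    Positive values indicate over-prediction, negative values under-prediction.
    """
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    return {c: 100.0 * (pred_counts[c] - n) / n for c, n in true_counts.items()}

# Hypothetical labels: 10 CH and 10 ML samples, one ML sample misclassified as CH
y_true = ["CH"] * 10 + ["ML"] * 10
y_pred = ["CH"] * 10 + ["CH"] + ["ML"] * 9
print(prediction_bias(y_true, y_pred))   # {'CH': 10.0, 'ML': -10.0}
```

Because errors in opposite directions cancel in the counts, this measure complements rather than replaces a confusion matrix.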

4.4.Sensitivity analysis of the best performing models

As observed from the foregoing, the meta-ensemble models produced the best predictions. However, given that a multiclass classification problem is being addressed in this study, it is imperative that the classifier boundaries between the plasticity categories of the natural soils be assessed across a range of threshold values using the ROC curve.
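For a multiclass problem, ROC analysis is usually carried out one class against the rest: the predicted probability of each class is thresholded at every distinct value, and the area under the resulting curve is then averaged over the classes. The dependency-free sketch below illustrates that one-vs-rest approach with macro-averaging; whether the multiclass AUC in this study was averaged in exactly this way is an assumption, and the probabilities are hypothetical.

```python
def roc_points(scores, is_positive):
    """(FPR, TPR) pairs from sweeping the threshold over every distinct score (both classes required)."""
    P = sum(is_positive)
    N = len(is_positive) - P
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, pos in zip(scores, is_positive) if s >= t and pos)
        fp = sum(1 for s, pos in zip(scores, is_positive) if s >= t and not pos)
        points.append((fp / N, tp / P))
    return points

def auc_ovr(scores, is_positive):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def macro_auc(prob_rows, y_true, classes):
    """Unweighted mean of the one-vs-rest AUCs, one per class label."""
    aucs = [auc_ovr([row[k] for row in prob_rows], [y == c for y in y_true])
            for k, c in enumerate(classes)]
    return sum(aucs) / len(aucs)

# Hypothetical two-class example in which both classes are perfectly ranked
probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]
labels = ["CH", "CH", "ML", "ML"]
print(roc_points([row[0] for row in probs], [y == "CH" for y in labels]))
print(macro_auc(probs, labels, ["CH", "ML"]))   # 1.0
```

An AUC of 0.5 corresponds to a classifier that cannot rank the class above the rest at all, which is the diagonal of the ROC plot.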


Fig.8.Machine learning performance scores for all models.

Fig.9.Prediction probability scores for all models: (a) LR; (b) ANN; (c) DF; (d) DJ; (e) VE.

Fig.10.Feature importance of multiclass prediction for VE model.

Fig.11.True versus predicted values for each soil class for the VE model.

A sensitivity analysis of the meta-ensemble models (VE and SE), considering all four input variables (SSA, CF, CEC, and CC) and using the TVS method, is presented in Fig.13. Fig.13 indicates that the meta-ensemble models are able to distinguish between the soils’ class labels. The ROC curve of the VE model is almost indistinguishable from that of the SE model, especially when the multiclass AUC of both models is considered. In terms of accuracy, however, the VE model appeared to outperform the SE model by about 10%, as observed previously. The ROC curves in Fig.13 therefore reveal that relying on a single metric alone to determine the performance of the ensembles may not be entirely sufficient. Further diagnostic analyses of the performance of the meta-ensembles across the different CV methods used in training and subsequent validation are discussed below.

Fig.12.Average absolute values of feature importance: (a) clay, (b) CEC, (c) CC, and (d) SSA.

Fig.13.ROC for VE and SE models using TVS method.

4.5.Comparison between different CV techniques

As mentioned previously, both the KFCV and MCCV methods can be applied to improve the prediction of machine learning analyses while also serving as fine-tuning mechanisms for the more conventional TVS method. In terms of the bias-variance trade-off, MCCV is generally deemed to have greater bias than KFCV; however, MCCV appears to provide slightly more confidence in machine learning predictions because it is more repeatable, producing results with lower variance. As indicated in Fig.14, the three methods of training, testing, and validating the dataset with the meta-ensemble models do not appear to produce any difference in the resulting ROC curves of the VE model. This indicates that the multiclass models trained and tested predominantly under TVS did not overfit the data, although the resulting metrics for this technique are slightly higher according to Table 5. On the other hand, when the SE model is used for the multiclass prediction, Fig.14 and Table 5 demonstrate that the outcome is only slightly better when the dataset is trained, tested, and validated using the MCCV and KFCV methods, with MCCV showing a greater ability to distinguish between the positive classes. Table 5 also shows that assigning ‘weights’ to compensate for class imbalance (i.e. under-represented class instances) improves the multiclass accuracy, though only slightly.
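The difference between the two resampling schemes lies in how the validation indices are drawn. In KFCV, the data are partitioned once into k folds and every sample is validated exactly once; in MCCV, a fresh random split is drawn on each repetition, so a sample may be validated several times or not at all. The generator functions below are a hypothetical sketch of both schemes, with split sizes and seeds chosen arbitrarily.

```python
import random

def kfold_splits(n, k, seed=0):
    """K-fold CV: shuffle once, cut into k folds; each sample is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]

def monte_carlo_splits(n, n_splits, test_fraction=0.2, seed=0):
    """Monte Carlo CV: repeated, independent random train/validation partitions."""
    rng = random.Random(seed)
    n_test = max(1, round(test_fraction * n))
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]

for train, val in kfold_splits(10, 5):
    print("KFCV validation fold:", sorted(val))
for train, val in monte_carlo_splits(10, 3):
    print("MCCV validation set:", sorted(val))
```

The number of MCCV repetitions can be raised independently of the split ratio, which is one reason its averaged results tend to vary less between runs.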

Table 5. Machine learning metric scores for the meta-ensemble models.

Fig.14.ROC curves for (a) VE and (b) SE models.

4.6.Lift curve

The lift curve (LC) provides another means of checking the sensitivity of the meta-ensemble models, lending more credence to their effectiveness in the multiclass prediction of the soil’s plasticity class. This is because a good model should capture a greater proportion of the correct class instances within any given percentage of the dataset than a ‘random’ model, whose predictions amount to guessing. The LC plots, for each percentile of the dataset ranked by predicted probability, the ratio of the cumulative gain achieved by the model to that expected from a random model. Fig.15 indicates no difference in the LC of the VE model when the KFCV, MCCV, and TVS techniques are applied to predict the multiclass soil labels, confirming that no overfitting occurred during training. Overall, although the highest lift across the entire percentile range occurred when training, testing, and validation were carried out with the SE model using the MCCV method, the same maximum lift (of four) was observed for all three validation methods.
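The cumulative lift for a given class can be computed by ranking the samples by their predicted probability for that class and comparing, at each top fraction q of the ranking, the share of that class captured with the share q expected from a random ordering. A minimal sketch with hypothetical one-vs-rest scores:

```python
def cumulative_lift(scores, is_positive, fractions=(0.1, 0.2, 0.5, 1.0)):
    """Cumulative lift of a ranked list at each top fraction q of the samples.

    lift(q) = (share of all positives found in the top q of the ranking) / q,
    i.e. the model's cumulative gain divided by the gain of a random ordering.
    """
    ranked = [pos for _, pos in sorted(zip(scores, is_positive), key=lambda t: t[0], reverse=True)]
    n, total_pos = len(ranked), sum(ranked)
    if total_pos == 0:
        raise ValueError("the target class must occur at least once")
    lifts = {}
    for q in fractions:
        top = max(1, round(q * n))          # number of highest-scoring samples inspected
        gain = sum(ranked[:top]) / total_pos
        lifts[q] = gain / (top / n)
    return lifts

# Hypothetical scores for one soil class (2 of the 10 samples belong to it)
scores = [0.95, 0.90, 0.60, 0.55, 0.40, 0.35, 0.30, 0.20, 0.10, 0.05]
is_target = [True, True, False, False, False, False, False, False, False, False]
print(cumulative_lift(scores, is_target))
```

For a class with prevalence p, the lift can never exceed 1/p, which bounds the height of the curve at the lowest percentiles; at q = 1 the lift is always 1.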

This study utilises the ‘discrimination’ criterion to assess the quality of the classification algorithms.Discrimination simply measures how well multiple categories or class labels in a dataset are separated.The following discrimination-based performance evaluation metrics are used in this study:
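Accuracy, average precision and average recall, the scores quoted for the models throughout this study, can be computed from true and predicted labels as sketched below. Macro-averaging (an unweighted mean over the class labels) is assumed for the ‘average’ scores; the function name and labels are hypothetical.

```python
def classification_scores(y_true, y_pred, classes):
    """Accuracy plus macro-averaged precision and recall for a multiclass problem."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precisions, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        predicted = sum(p == c for p in y_pred)   # samples assigned to class c
        actual = sum(t == c for t in y_true)      # samples truly in class c
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return accuracy, sum(precisions) / len(classes), sum(recalls) / len(classes)

# Hypothetical labels for four soil samples
acc, prec, rec = classification_scores(
    ["CH", "CH", "CL", "ML"], ["CH", "CL", "CL", "ML"], ["CH", "CL", "ML"])
print(acc, round(prec, 3), round(rec, 3))
```

Macro-averaging gives each class equal weight, so a poorly predicted minority class lowers the average precision and recall even when overall accuracy is high.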

5.Research significance and recommendations for future study

However, Fig.8 also shows the relatively lower performance of the tree-based ensembles (DJ and DF) compared to the meta-ensembles, especially the VE model (which outperforms the rest of the models). One of the main drawbacks of decision tree algorithms is their ‘greedy’ behaviour at each successive step of construction during training: at each step, the best variable and its optimal split are chosen locally. However, because each split is optimised only locally, the resulting tree is not guaranteed to be optimal overall; a step that uses a different combination of the variables may yield a much better or different result, even if it compromises some of the other steps. Another serious disadvantage of decision tree algorithms is that continuous variables are implicitly discretised in the splitting process, so they may lose some of their importance and some of the needed predictive information as the construction of the leaves progresses.
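Both drawbacks can be seen in a minimal sketch of the greedy split search performed at a single node. Only the best cut on one continuous feature is chosen, using information gain as the criterion, so the feature is reduced to a binary test at that node and the choice is never revisited. The clay-fraction values and labels below are hypothetical, and real DF/DJ implementations differ in detail.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a collection of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Greedy search for the single cut on one continuous feature that maximises information gain.

    Only midpoints between consecutive distinct sorted values are tried, so the
    continuous feature is reduced to a binary 'value <= t' test at this node.
    """
    pairs = sorted(zip(values, labels))
    parent = entropy(labels)
    n = len(pairs)
    best_gain, best_t = 0.0, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut possible between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = parent - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# Hypothetical clay fractions (%) and USCS labels at one node
cf = [12, 18, 25, 41, 47, 55]
uscs = ["ML", "ML", "CL", "CH", "CH", "CH"]
print(best_threshold(cf, uscs))
```

Here the search settles on a single cut at CF = 33%; all finer information about where within each side a sample lies is discarded at this node.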

Fig.15.Machine learning cumulative LCs for (a) VE and (b) SE models.

Meanwhile, for practical deployment and application of the concepts and ideas advanced in the present research, the resources, models, and predictions, including the machine learning code developed, can be saved on an organisation’s servers or other software and hardware assets; the best models can then be reloaded and used to classify new soil data. This procedure is best implemented during the preliminary phases of a geotechnical site investigation and design.
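A minimal sketch of the save-and-reload workflow is given below, assuming for simplicity a hypothetical model whose learned parameters fit in a JSON document. A trained ensemble would instead be persisted with its framework's own serialisation facilities; the file name, parameters and helper functions here are illustrative only.

```python
import json
import tempfile
from pathlib import Path

def predict(model, rows):
    """Hypothetical stand-in model: a single clay-fraction cut stored as plain parameters."""
    return [model["above"] if row["CF"] > model["cf_threshold"] else model["below"] for row in rows]

def save_model(model, path):
    Path(path).write_text(json.dumps(model))

def load_model(path):
    return json.loads(Path(path).read_text())

trained = {"cf_threshold": 33.0, "above": "CH", "below": "ML"}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "soil_classifier.json"
    save_model(trained, path)        # e.g. at the end of model development
    reloaded = load_model(path)      # e.g. at the start of a new site investigation

new_samples = [{"CF": 45.0}, {"CF": 12.0}]
print(predict(reloaded, new_samples))   # ['CH', 'ML']
```

Storing parameters in a plain-text format keeps them inspectable and avoids executing code on load; binary formats such as pickle should only be loaded from trusted locations.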

6.Conclusions

In this study, a novel approach to classifying soils based on their physico-chemical characteristics was developed using the concept of machine learning. Multiclass elements of stand-alone machine learning algorithms, decision tree ensembles and meta-ensemble models were employed to classify soils according to their plastic behaviour. Machine learning predictions were also performed with the best models across multiple CV methods. The important highlights of this study are as follows:


(1) CF had the highest influence on the multiclass prediction of the natural soils’ plasticity in accordance with the USCS, while SSA and CC possessed the least significance within the context of this study and the nature of the dataset used.


(2) Stand-alone machine learning models (LR and ANN) produced less accurate predictive performance than the tree-based and meta-ensemble models. Overall, the worst-performing model is LR (accuracy of approximately 0.436), with ANN having the higher accuracy of the two stand-alone models (accuracy of approximately 0.462).

(3) Tree-based ensemble models (DJ and DF) were less accurate (accuracy of 0.68, average precision of 0.71, and recall rate of 0.68) than the meta-ensemble models (SE and VE). Overall, the meta-ensemble models outperformed the rest of the models (accuracy of 0.75, average precision of 0.74, and average recall rate of 0.72).

(4) Sensitivity analysis of the meta-ensemble models proved their capacity to discriminate between soil classes across the different CV methods considered. Machine learning training and validation using the MCCV and KFCV methods enabled better prediction while also ensuring that the dataset was not overfitted by the machine learning models.

(5) Further confirmation of this was provided by the continuous rise of the cumulative LC of the best performing models when using the MCCV technique. Overall, this study demonstrated that soil’s physico-chemical properties have a direct influence on plastic behaviour and, therefore, can be relied upon to classify soils.

Declaration of competing interest


The authors wish to confirm that there are no known conflicts of interest associated with this publication, and there has been no significant financial support for this work that could have influenced its outcome.

Acknowledgments

The authors hereby wish to offer their thanks and deep appreciation to the reviewers for their thoughtful comments, suggestions and efforts towards improving our article.
