Metaheuristics with Deep Learning Empowered Biomedical Atherosclerosis Disease Diagnosis and Classification

2022-08-24 06:59 Areej Malibari, Siwar Ben Haj Hassine, Abdelwahed Motwakel and Manar Ahmed Hamza

Computers, Materials & Continua, 2022, Issue 8

Areej A. Malibari, Siwar Ben Haj Hassine, Abdelwahed Motwakel and Manar Ahmed Hamza*

1 Department of Industrial and Systems Engineering, College of Engineering, Princess Nourah Bint Abdulrahman University, Riyadh, 11671, Saudi Arabia

2 Department of Computer Science, College of Science and Arts, King Khalid University, Mahayil, Asir, Saudi Arabia

3 Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam bin Abdulaziz University, Al-Kharj, 16278, Saudi Arabia

Abstract: Atherosclerosis diagnosis is an inarticulate and complicated cognitive process. Research on medical diagnosis requires maximum accuracy and performance to support optimal clinical decisions. Since medical diagnostic outcomes need to be prompt and accurate, recently developed artificial intelligence (AI) and deep learning (DL) models have received considerable attention in the research community. This study develops a novel Metaheuristics with Deep Learning Empowered Biomedical Atherosclerosis Disease Diagnosis and Classification (MDL-BADDC) model. The proposed MDL-BADDC technique encompasses several stages of operation: pre-processing, feature selection, classification, and parameter tuning. The technique designs a novel Quasi-Oppositional Barnacles Mating Optimizer (QOBMO) based feature selection technique. Moreover, a deep stacked autoencoder (DSAE) based classification model is designed for the detection and classification of atherosclerosis disease. Furthermore, a krill herd algorithm (KHA) based parameter tuning technique is applied to adjust the parameter values. To showcase the classification performance of the MDL-BADDC technique, a wide range of simulations is carried out on three benchmark biomedical datasets. The comparative result analysis reports better performance of the MDL-BADDC technique over the compared methods.

Keywords: Atherosclerosis disease; biomedical data; data classification; machine learning; disease diagnosis; deep learning

1 Introduction

Cardiovascular disease (CVD) is a common term for a multitude of heart conditions and disorders. One form of CVD is coronary artery disease (CAD), also called atherosclerosis [1]. A large number of people are affected by heart disease, particularly atherosclerosis. According to the World Health Organization (WHO), this disease is the major cause of mortality in many nations. For clinical diagnosis, automatic extraction of data from individual records is problematic [2]. Hence the significance of developing a Medical Diagnostic Support System (MDSS) for automating the prediction and classification of CVD in patients. However, healthcare diagnosis research needs high efficiency and accuracy to support better medical decisions. Although classical MDSS have demonstrated the ability to cover many diagnosis problems, they provide a low precision rate and cannot offer accurate diagnoses [3]. In the past few decades, clinical therapy and diagnosis schemes using machine learning (ML) and artificial intelligence (AI) techniques have received much recognition. This research topic has therefore influenced academic fields such as applied sciences, finance, medicine, and biology. Subsequently, various studies were introduced for developing MDSS to classify or predict patients with CVD to enhance healthcare [4]. The abovementioned methods predict the existence of disease using statistical models, such as logistic regression and time series models, which require the tested objects to meet the preconditions of the models in order to evaluate the occurrence of disease [5].

Current research has employed ML methods for diagnosing distinct CVD issues and making predictions. Fig. 1 illustrates the applications involved in computer aided healthcare. A main challenge of ML is the high dimensionality of the dataset [6]. Analysing a large number of features requires a massive amount of storage and results in over-fitting; hence feature weighting decreases processing time and unwanted information, thereby enhancing the efficiency of the model [7]. Finding a smaller set of features that describes distinct diseases is relevant to medical images, health management, IoT, and genome expression. Dimensionality reduction employs feature extraction to simplify and transform information, whereas feature selection shrinks the dataset by eliminating unwanted features.

Figure 1: Applications of computer aided healthcare

He et al. [8] presented an evolutionary classification method. The core of the predictive method is a kernel extreme learning machine (KELM) enhanced using the salp swarm algorithm (SSA). To obtain a good set of features and parameters, a space transformation method is introduced into the optimization to improve SSA and obtain an optimum KELM model. Terrada et al. [9] developed an MDSS for CAD. This method is capable of giving heart disease predictions from patient medical information. The MDSS relies on ML methods such as k-means and K-medoids clustering, K-Nearest Neighbor (KNN) classification, and Artificial Neural Network (ANN) for forecasting the absence or presence of atherosclerosis.

Terrada et al. [10] applied KNN and ANN for predicting patients with or without CVD. The method is validated on the Hungarian, Cleveland heart disease, Long Beach VA, and Switzerland datasets. This MDSS is based on supervised ML models. Munger et al. [11] focused on current applications of ML for providing insights into atherosclerotic plaque formation and a better understanding of atherosclerotic plaque evolution in patients with CVD.

Zhao et al. [12] presented an automated multiclass coronary atherosclerosis plaque detection and classification architecture. First, the transverse cross sections and centrelines are recovered from the CT angiography. Next, the region of interest (ROI) is extracted according to a coarse segmentation. Then, a random radius symmetry (RRS) feature vector is extracted, which integrates multiple descriptions into a random strategy and significantly increases the training data. Lastly, the RRS feature vectors are fed to the multiclass coronary plaque classifiers.

Parameswari et al. [13] aimed at decreasing disease-independent variation without damaging the data-based variance between images of atherosclerotic and healthy eyes. The presented approach improves the illumination of the blood vessels by restoring them. Lastly, an Enhanced Bayesian Arithmetic Classifier (EBAC) is applied for efficient classification. Cherradi et al. [14] presented a CAD scheme based on KNN and ANN models. K-fold cross-validation is then employed for splitting the datasets and attaining the optimal method with greater precision and lesser results.

This study develops a novel Metaheuristics with Deep Learning Empowered Biomedical Atherosclerosis Disease Diagnosis and Classification (MDL-BADDC) model. The proposed MDL-BADDC technique designs a novel Quasi-Oppositional Barnacles Mating Optimizer (QOBMO) based feature selection technique. In addition, a deep stacked autoencoder (DSAE) based classification model is designed for the detection and classification of atherosclerosis disease. Finally, a krill herd algorithm (KHA) based parameter tuning technique is applied to adjust the parameter values. To showcase the classification performance of the MDL-BADDC approach, a wide range of simulations is carried out on three benchmark biomedical datasets.

2 The Proposed Model

Figure 2: Overall process of MDL-BADDC technique

2.1 Data Pre-Processing

At first, a preprocessing step converts the non-traditional dataset into a traditional dataset to improve the performance of the presented method. For this, min-max normalization is performed. The effectiveness of NN training depends on how well the preprocessing phase prepares the network inputs and targets. Normalizing the raw input makes the data more suitable for training [15]. Generally, each feature is rescaled to lie in the interval [0, 1] or [-1, 1].

where $(y_{\max}-y_{\min})=0$; when $(x_{\max}-x_{\min})=0$ for a feature, it indicates a constant value for that feature in the data. A feature with a constant value should be ignored, since it does not carry any information to the NN. After the min-max normalization has been performed, all features lie in the new range of values, which remains fixed.
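The rescaling described above can be illustrated with a minimal sketch. It assumes the common per-feature form $y=(y_{\max}-y_{\min})\,(x-x_{\min})/(x_{\max}-x_{\min})+y_{\min}$; the helper name `min_max_normalize`, the toy matrix, and the choice to map a constant feature to `y_min` and flag it for removal are illustrative assumptions rather than details given in the text.

```python
import numpy as np

def min_max_normalize(X, y_min=0.0, y_max=1.0):
    """Rescale each column of X to [y_min, y_max] (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    span = x_max - x_min
    constant = span == 0                       # features with x_max - x_min = 0
    safe_span = np.where(constant, 1.0, span)  # avoid division by zero
    Y = (y_max - y_min) * (X - x_min) / safe_span + y_min
    Y[:, constant] = y_min                     # assumption: constant features carry no information
    return Y, constant

# Toy data: the middle feature is constant across all samples.
X = np.array([[1.0, 5.0, 10.0],
              [2.0, 5.0, 20.0],
              [3.0, 5.0, 40.0]])
Y, constant = min_max_normalize(X)
X_kept = Y[:, ~constant]                       # drop the flagged constant feature
```

Returning the `constant` mask alongside the rescaled data lets the caller decide whether to drop such features before training, as the text suggests.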

2.2 Design of QOBMO Based Feature Selection

At this stage, the pre-processed biomedical data is passed to the QOBMO algorithm to choose an optimum subset of features. A barnacle is a small crustacean that attaches itself to objects in the water. Its mating group comprises every neighbor and competitor within reach of its penis. BMO is inspired by this mating procedure. By simulating the initialization, selection, and reproduction processes, the optimization problem is solved [16]. First, each candidate solution is treated as a barnacle, and the population matrix is formulated by Eq. (2). The population is then evaluated and sorted so that the best solution found so far is located at the top of $X$.

where $N$ represents the barnacle population count, $n$ indicates the number of control parameters, and $barnacle\_d$ and $barnacle\_m$ represent the parents to be mated. As there is no specific equation describing the reproduction of barnacles, BMO relies on the genotype frequencies of the parents to produce offspring according to the Hardy-Weinberg principle. It is noteworthy that the length of the penis ($pl$) plays a significant part in determining the exploration and exploitation processes [17].

where $p$ represents a random number in the range of zero to one, $q=(1-p)$, and $x^{N}_{barnacle\_d}$ and $x^{N}_{barnacle\_m}$ represent the variables of the Dad and Mum barnacles. $p$ and $q$ denote the genotype frequencies of the Dad and Mum barnacles. For example, when barnacle #1 chooses barnacle #8, the choice is over the limit set by $pl$. Therefore, the mating procedure does not take place, and the offspring is instead generated by the sperm cast method.

where rand() denotes a random value in the range of zero to one. The new offspring is generated from the Mum barnacle alone, because it receives sperm released into the water by other barnacles. In each iteration, the locations of the barnacles are updated. In this way, BMO approximates the global optimum of the optimization problem.
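The two offspring-generation branches described above can be sketched as follows. This is a minimal illustration, assuming the usual BMO form in which an in-reach pair produces `p * x_dad + q * x_mum` with `q = 1 - p`, and an out-of-reach pair falls back to sperm casting, `rand() * x_mum`. Measuring reach as the index distance between the parents in the sorted population, the function name `bmo_offspring`, and the value `pl=6` are assumptions made for the example.

```python
import numpy as np

def bmo_offspring(population, dad_idx, mum_idx, pl, rng):
    """Generate one offspring from two parent indices (illustrative sketch)."""
    x_dad = population[dad_idx]
    x_mum = population[mum_idx]
    if abs(dad_idx - mum_idx) <= pl:
        # Normal mating: mix the parents using genotype frequencies p and q.
        p = rng.random()
        q = 1.0 - p
        return p * x_dad + q * x_mum
    # Sperm casting: the offspring is derived from the Mum barnacle only.
    return rng.random() * x_mum

rng = np.random.default_rng(0)
population = rng.random((10, 4))  # 10 barnacles, 4 control variables (toy sizes)

near = bmo_offspring(population, 0, 3, pl=6, rng=rng)  # within reach: mating
far = bmo_offspring(population, 0, 7, pl=6, rng=rng)   # barnacle #1 with #8: sperm cast
```

With the assumed `pl=6`, barnacle #1 (index 0) and barnacle #8 (index 7) are seven positions apart, so the second call takes the sperm-cast branch. A smaller `pl` sends more pairs to sperm casting, which shifts the search toward exploration.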

Opposition-based learning (OBL) was originally introduced to reduce the computational time and enhance the convergence capability of evolutionary algorithms (EA) [18]. By considering both the present population and its opposite population under OBL, the candidate solutions are improved. The method is simple to implement, which makes it appropriate for enhancing the efficiency of the BMO technique. Accordingly, the initial population of the technique is created using quasi-oppositional based learning (QOBL). Each initial solution is compared with its quasi-opposite, and the better of the two is retained in the initial population. This improves the diversity and exploration of the initial population, so the technique typically converges to the global optimum at a faster rate. The definitions of opposite number, opposite point, quasi-opposite number, and quasi-opposite point are provided as follows [19]:

For an arbitrary number $x\in[a,b]$, its opposite number $x^{o}$ is given as:

The opposite point in a multi-dimensional ($d$-dimensional) search space is defined as:

and the quasi-opposite number $x^{qo}$ of an arbitrary number $x\in[a,b]$ is given as:

Likewise, the quasi-opposite point in a multi-dimensional ($d$-dimensional) search space is defined as:
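A minimal sketch of quasi-oppositional initialization follows. It assumes the standard definitions from the OBL literature: the opposite number $x^{o}=a+b-x$, and the quasi-opposite number $x^{qo}$ drawn uniformly between the interval centre $(a+b)/2$ and $x^{o}$, applied per dimension for points. The pairwise "keep the better one" rule, the function names, and the sphere function used as a stand-in fitness are assumptions made for illustration.

```python
import numpy as np

def opposite(x, a, b):
    """Opposite number or point: x_o = a + b - x (elementwise)."""
    return a + b - x

def quasi_opposite(x, a, b, rng):
    """Quasi-opposite: uniform sample between the centre (a+b)/2 and x_o."""
    center = (a + b) / 2.0
    x_o = opposite(x, a, b)
    low = np.minimum(center, x_o)
    high = np.maximum(center, x_o)
    return rng.uniform(low, high)

def qobl_init(fitness, a, b, n_pop, rng):
    """Keep the fitter of each random solution and its quasi-opposite (minimization)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pop = rng.uniform(a, b, size=(n_pop, a.size))
    q_pop = quasi_opposite(pop, a, b, rng)
    f_pop = np.apply_along_axis(fitness, 1, pop)
    f_q = np.apply_along_axis(fitness, 1, q_pop)
    keep_q = f_q < f_pop
    return np.where(keep_q[:, None], q_pop, pop)

def sphere(v):
    """Placeholder fitness; the feature selection objective would go here."""
    return float(np.sum(v ** 2))

rng = np.random.default_rng(1)
a = -5.0 * np.ones(3)
b = 5.0 * np.ones(3)
init = qobl_init(sphere, a, b, n_pop=6, rng=rng)
q = quasi_opposite(np.array([2.0]), 0.0, 10.0, rng)  # lies between 5 and 8
```

Because both the centre and the opposite point lie inside $[a,b]$, every quasi-opposite solution stays within the search bounds without extra clipping.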

2.3 Design of Optimal DSAE Based Classification

During the classification process, the optimal DSAE model is used for the detection and classification of atherosclerosis disease. The AE is an axisymmetric single hidden-layer neural network (SLNN). The AE encodes the input data using the hidden layer, minimizing the reconstruction error and attaining the optimal hidden-state features [20]. However, an AE does not learn useful features merely by copying the input into the hidden layer, even though it can then reconstruct the input data with high accuracy. For the adhesion state of a locomotive, $k$ sets of monitored information $\{x_1,x_2,x_3,\ldots,x_n\}$ exist, which are recast into an $N\times M$ dataset $\{x(1),x(2),x(3),\ldots,x(N)\}$, $x(i)\in R^{M}$. This data is used as the input matrix $X$. In this work, the activation function of the AE is the sigmoid, which is used to obtain a good representation of the input: $h(X,W,b)=\sigma(WX+b)$. The purpose of enforcing sparsity is to reduce unwanted activation. $a_j(x)$ is taken as the $j$th activation value. During the feature learning procedure, the activation values of the hidden neurons are given by $a=\mathrm{sigmoid}(WX+b)$, in which $b$ denotes the bias matrix and $W$ represents the weight matrix.
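The hidden-layer computation $a=\mathrm{sigmoid}(WX+b)$ can be sketched in a few lines. The layout below, with one sample per column of $X$ so that $WX$ is well defined, along with the toy sizes and the random initialization, are assumptions made for the example; they are not specified in the text.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def encode(X, W, b):
    """Hidden activations a = sigmoid(W X + b); X holds one sample per column."""
    return sigmoid(W @ X + b)

rng = np.random.default_rng(0)
M, N, hidden = 13, 5, 4                       # toy sizes: 13 features, 5 samples, 4 hidden units
X = rng.random((M, N))                        # input matrix, columns are samples x(i) in R^M
W = rng.normal(scale=0.1, size=(hidden, M))   # weight matrix
b = np.zeros((hidden, 1))                     # bias, broadcast across samples
A = encode(X, W, b)                           # shape (hidden, N)
```

Each row of `A` holds the activations $a_j(x)$ of one hidden neuron over all samples, which is the quantity the sparsity constraint acts on.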

The hidden activations are kept at a low value so that the average activation $\rho_j$ of each hidden neuron stays close to the sparsity parameter $\rho$, and a penalty term is used to prevent $\rho_j$ from deviating from $\rho$ [21]. The KL divergence is employed as the basis of the penalty.

If $\rho_j$ does not deviate from $\rho$, the KL divergence is zero; otherwise, the KL divergence increases progressively with the deviation.

where $S_2$ indicates the number of neurons and $\beta$ denotes the weight of the sparsity penalty. Once the sparsity penalty is determined, the sparse representation is obtained by minimizing the sparse cost function.
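The behaviour of the penalty can be checked with a short sketch. It assumes the usual sparse autoencoder form, $\beta\sum_{j=1}^{S_2}\mathrm{KL}(\rho\,\|\,\rho_j)$ with $\mathrm{KL}(\rho\,\|\,\rho_j)=\rho\log\frac{\rho}{\rho_j}+(1-\rho)\log\frac{1-\rho}{1-\rho_j}$; the default values `rho=0.05` and `beta=3.0` and the function name are illustrative assumptions.

```python
import numpy as np

def kl_sparsity_penalty(A, rho=0.05, beta=3.0):
    """Return beta * sum_j KL(rho || rho_j).

    A has shape (S2, N): one row per hidden neuron, one column per sample,
    and rho_j is the mean activation of neuron j over the samples.
    """
    rho_j = A.mean(axis=1)
    rho_j = np.clip(rho_j, 1e-8, 1 - 1e-8)  # keep the logarithms finite
    kl = rho * np.log(rho / rho_j) + (1 - rho) * np.log((1 - rho) / (1 - rho_j))
    return beta * kl.sum()

on_target = np.full((4, 10), 0.05)   # mean activation equals rho
mild = np.full((4, 10), 0.20)        # moderate deviation from rho
strong = np.full((4, 10), 0.50)      # larger deviation from rho

p_on = kl_sparsity_penalty(on_target)
p_mild = kl_sparsity_penalty(mild)
p_strong = kl_sparsity_penalty(strong)
```

The three cases mirror the statement above: the penalty vanishes when $\rho_j=\rho$ and grows as the average activation moves away from $\rho$.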

Antarctic krill is one of the leading animal species on Earth. The ability to form huge swarms is the most important feature of this species. Individual krill are removed from the herd when predators such as whales, seals, and other species attack it. Such an attack decreases the krill density. The re-formation of the krill herd (KH) after predation is governed by several parameters. The main purposes of the herding behaviour of krill individuals are increasing krill density and reaching food. The KH technique uses this multi-objective herding to solve global optimization problems [22]. The density-dependent attraction of the krill and finding food (areas of maximum food concentration) are used as objectives. As a result, a krill individual moves toward the best solution as it searches for the highest herd density and food. This behaviour makes the krill herd gather around the global minimum of the optimization problem.

The time-dependent position of an individual krill on a 2D surface is governed by the following three main actions:

1. Movement induced by other krill individuals;

2. Foraging motion;

3. Physical or random diffusion.

The following Lagrangian model is generalized to an $n$-dimensional decision space:

where $N_i$ refers to the motion induced by other krill individuals; $F_i$ stands for the foraging motion; and $D_i$ signifies the physical diffusion of the $i$th krill individual.

The induced movement of each krill individual is determined as:

where $N^{\max}$ stands for the maximum induced speed, which, based on measured values, is taken as 0.01 (m/s). $\omega_n$ defines the inertia weight of the motion induced, in the range of zero to one. $\alpha_i^{local}$ represents the local effect provided by the neighbors, $\alpha_i^{target}$ is the target direction effect provided by the best krill individual, and $N_i^{old}$ stands for the last motion induced. The inertia weight $\omega_n$ is set to 0.9 at the start of the optimization and is then linearly reduced to 0.1.

The effect of the neighbors can be regarded as an attractive or repulsive tendency between the individuals for a local search. $\alpha_i^{target}$, the target direction effect provided by the best krill individual, is determined as [23]:

where $C^{best}$ refers to the coefficient of influence and is determined as follows.

where rand is a random number between zero and one, $I$ refers to the current iteration number, and $I_{\max}$ signifies the maximum number of iterations.
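The quantities defined above can be tied together in a minimal sketch. It assumes the standard krill herd formulation, which matches the symbols used in the text: $\frac{dX_i}{dt}=N_i+F_i+D_i$, $N_i^{new}=N^{\max}\alpha_i+\omega_n N_i^{old}$ with $\alpha_i=\alpha_i^{local}+\alpha_i^{target}$, and $C^{best}=2\,(\mathrm{rand}+I/I_{\max})$. The function names, the unit time step, and the explicit Euler position update are assumptions made for illustration.

```python
import numpy as np

N_MAX = 0.01  # maximum induced speed (m/s), as stated in the text

def inertia_weight(iteration, max_iter, w_start=0.9, w_end=0.1):
    """Linearly decrease the inertia weight from w_start to w_end."""
    return w_start - (w_start - w_end) * iteration / max_iter

def c_best(iteration, max_iter, rng):
    """Assumed form C_best = 2 * (rand + I / I_max)."""
    return 2.0 * (rng.random() + iteration / max_iter)

def induced_motion(alpha_local, alpha_target, n_old, omega_n):
    """Assumed update N_new = N_max * (alpha_local + alpha_target) + omega_n * N_old."""
    return N_MAX * (alpha_local + alpha_target) + omega_n * n_old

def step_position(x, n_i, f_i, d_i, dt=1.0):
    """Euler step of dX/dt = N + F + D."""
    return x + dt * (n_i + f_i + d_i)

rng = np.random.default_rng(0)
w0 = inertia_weight(0, 100)      # start of the run
w_end = inertia_weight(100, 100) # end of the run
c = c_best(50, 100, rng)         # halfway through the run
n_new = induced_motion(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                       np.zeros(2), omega_n=w0)
x_new = step_position(np.zeros(2), n_new, np.zeros(2), np.zeros(2))
```

Because $I/I_{\max}$ grows over the run, $C^{best}$ increases with the iteration count, so the pull toward the best krill strengthens while the shrinking inertia weight damps the previous motion.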

3 Performance Validation

The experimental result analysis of the proposed technique is carried out using three medical datasets, namely the Cleveland, Hungarian, and Z-Alizadeh Sani datasets.

The Cleveland dataset has 76 attributes, of which only 14 are generally used in most published research: 13 inputs and 1 output. In this case, 303 instances are used, with 164 healthy subjects and 139 CAD (coronary artery disease) patients.

The Hungarian dataset [24] has 14 features: 13 inputs and 1 output. In this case, 294 instances are used, with 188 healthy subjects and 106 CAD patients.

The Z-Alizadeh Sani dataset [25] was gathered randomly from heart disease patients at Tehran's Shaheed Rajaei Cardiovascular, Medical and Research Center. This dataset was constructed for CAD diagnosis and has 303 samples with 56 features for each patient. Classes: 71% of the patients have CAD and 29% are healthy.

The FS results of the QOBMO technique are obtained on the three datasets [26]. The results show that the QOBMO technique has chosen 9, 8, and 12 features from the Cleveland, Hungarian, and Z-Alizadeh datasets respectively.

3.1 Result Analysis on Cleveland Dataset

The confusion matrix of the MDL-BADDC technique on the test Cleveland dataset is shown in Fig. 3. The figure shows that the MDL-BADDC technique has effectively identified the class labels under all epochs. For instance, with 200 epochs, the MDL-BADDC technique has identified 162 samples under the Absent class and 135 samples under the Present class. In addition, with 600 epochs, the MDL-BADDC method has identified 160 samples under the Absent class and 136 samples under the Present class. Along with that, with 1000 epochs, the MDL-BADDC approach has identified 163 samples under the Absent class and 137 samples under the Present class.

Figure 3: Confusion matrix of MDL-BADDC technique on Cleveland dataset

A detailed result analysis of the MDL-BADDC technique on the test Cleveland dataset is given in Tab. 1. The experimental results show that the MDL-BADDC technique has achieved effective outcomes under every epoch count.

Table 1: Result analysis of MDL-BADDC technique on Cleveland dataset

For instance, under 200 epochs, the MDL-BADDC technique has obtained $sens_y$, $spec_y$, $accu_y$, $F_{score}$, and MCC of 99.39%, 97.84%, 98.68%, 98.79%, and 97.35% respectively. Next, under 600 epochs, the MDL-BADDC methodology has achieved $sens_y$, $spec_y$, $accu_y$, $F_{score}$, and MCC of 98.17%, 98.56%, 98.35%, 98.47%, and 96.68% respectively. Meanwhile, under 1000 epochs, the MDL-BADDC approach has reached $sens_y$, $spec_y$, $accu_y$, $F_{score}$, and MCC of 99.39%, 97.12%, 98.35%, 98.49%, and 96.69% respectively.

A comparative result analysis of the MDL-BADDC technique with recent methods is given in Tab. 2. The table shows that the weighted fuzzy rules (WFR), C4.5, and Fast Detection Tree (FDT) techniques have obtained lower $accu_y$ of 64.25%, 79.54%, and 78.75% respectively. Along with that, the Hybrid Neural Network-Genetic (HNNG) and NN models have resulted in moderate $accu_y$ of 89.60% and 85.95% respectively. In line with this, the ANN, SVM, and C4.5 techniques have obtained reasonable $accu_y$ of 98.10% and 93.56% respectively. However, the MDL-BADDC technique has outperformed the existing methods with the maximum $accu_y$ of 98.28%.

Table 2: Accuracy analysis of MDL-BADDC technique on Cleveland dataset

3.2 Result Analysis on Hungarian Dataset

The confusion matrix of the MDL-BADDC method on the test Hungarian dataset is illustrated in Fig. 4. The figure shows that the MDL-BADDC methodology has effectively identified the class labels under all epochs.

Figure 4: Confusion matrix of MDL-BADDC technique on Hungarian dataset

For instance, with 200 epochs, the MDL-BADDC approach has identified 181 samples under the Absent class and 98 samples under the Present class. Besides, with 600 epochs, the MDL-BADDC system has identified 183 samples under the Absent class and 101 samples under the Present class. Finally, with 1000 epochs, the MDL-BADDC algorithm has identified 181 samples under the Absent class and 98 samples under the Present class.

A comprehensive outcome analysis of the MDL-BADDC approach on the test Hungarian dataset is given in Tab. 3. The experimental outcomes show that the MDL-BADDC method has achieved effective outcomes under every epoch count. For instance, under 200 epochs, the MDL-BADDC methodology has achieved $sens_y$, $spec_y$, $accu_y$, $F_{score}$, and MCC of 96.28%, 92.45%, 94.90%, 96.02%, and 88.91% respectively. In addition, under 600 epochs, the MDL-BADDC system has achieved $sens_y$, $spec_y$, $accu_y$, $F_{score}$, and MCC of 97.34%, 95.28%, 96.60%, 97.34%, and 92.62% respectively. In the meantime, under 1000 epochs, the MDL-BADDC algorithm has obtained $sens_y$, $spec_y$, $accu_y$, $F_{score}$, and MCC of 96.28%, 92.45%, 94.90%, 96.02%, and 88.91% respectively.
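The reported measures can be recomputed from the confusion-matrix counts. The sketch below uses the standard definitions of sensitivity, specificity, accuracy, F-score, and MCC; treating the Absent class as the positive class is an inference from the numbers (it is the choice that reproduces the reported Hungarian values) rather than something stated in the text, and the function name is hypothetical.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification measures from confusion-matrix counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    f_score = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"sens": sens, "spec": spec, "acc": acc, "f_score": f_score, "mcc": mcc}

# Hungarian dataset, 200 epochs: 181 of 188 Absent and 98 of 106 Present samples correct.
# Assumption: Absent is the positive class.
m = binary_metrics(tp=181, fn=188 - 181, tn=98, fp=106 - 98)
percent = {name: round(100 * value, 2) for name, value in m.items()}
# percent == {'sens': 96.28, 'spec': 92.45, 'acc': 94.9, 'f_score': 96.02, 'mcc': 88.91}
```

The same function applied to the 600-epoch counts (183 and 101 correct) gives 97.34%, 95.28%, 96.60%, 97.34%, and 92.62%, in agreement with the values quoted above.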

Table 3: Result analysis of MDL-BADDC technique on Hungarian dataset

A brief result analysis of the MDL-BADDC method against recent algorithms is given in Tab. 4. The table shows that the WFR, C4.5, and FDT systems have obtained lower $accu_y$ of 56.93%, 79.61%, and 77.53% respectively. Likewise, the HNNG and NN methods have resulted in moderate $accu_y$ of 88.60% and 83.84% respectively. Besides, the ANN, SVM, and C4.5 techniques have obtained reasonable $accu_y$ of 93.20% and 88.60% respectively. Lastly, the MDL-BADDC method has outperformed the existing methods with the maximum $accu_y$ of 95.51%.

Table 4: Accuracy analysis of MDL-BADDC technique on Hungarian dataset

3.3 Result Analysis on Z-Alizadeh Dataset

The confusion matrix of the MDL-BADDC system on the test Z-Alizadeh dataset is depicted in Fig. 5. The figure shows that the MDL-BADDC approach has effectively identified the class labels under all epochs. For instance, with 200 epochs, the MDL-BADDC algorithm has identified 214 samples under the Absent class and 86 samples under the Present class. Furthermore, with 600 epochs, the MDL-BADDC system has identified 214 samples under the Absent class and 84 samples under the Present class. Moreover, with 1000 epochs, the MDL-BADDC method has identified 213 samples under the Absent class and 85 samples under the Present class.

Figure 5: Confusion matrix of MDL-BADDC technique on Z-Alizadeh dataset

A detailed outcome analysis of the MDL-BADDC technique on the test Z-Alizadeh dataset is given in Tab. 5. The experimental outcomes show that the MDL-BADDC system has achieved effective outcomes under every epoch count. For instance, under 200 epochs, the MDL-BADDC algorithm has achieved $sens_y$, $spec_y$, $accu_y$, $F_{score}$, and MCC of 99.07%, 98.85%, 99.01%, 99.30%, and 97.59% respectively. Likewise, under 600 epochs, the MDL-BADDC technique has attained $sens_y$, $spec_y$, $accu_y$, $F_{score}$, and MCC of 99.07%, 96.55%, 98.35%, 98.85%, and 95.96% respectively. In addition, under 1000 epochs, the MDL-BADDC approach has reached $sens_y$, $spec_y$, $accu_y$, $F_{score}$, and MCC of 98.61%, 97.70%, 98.35%, 98.84%, and 95.99% respectively.

Table 5: Result analysis of MDL-BADDC technique on Z-Alizadeh dataset

A comparative outcome analysis of the MDL-BADDC technique with recent methods is given in Tab. 6. The table shows that the NN Model, 2 Hybrid Feature Selection (HFS), and SVC methods have reached lower $accu_y$ of 86.72%, 92.18%, and 91.95% respectively. Along with that, the HNNG and nu-SVM techniques have resulted in moderate $accu_y$ of 94.75% and 93.34% respectively. Similarly, the ANN, SVM, and C4.5 techniques have attained reasonable $accu_y$ of 97.32% and 96.60% respectively. Finally, the MDL-BADDC approach has outperformed the existing algorithms with a higher $accu_y$ of 98.75%.

Table 6: Accuracy analysis of MDL-BADDC technique on Z-Alizadeh dataset

Fig. 6 portrays the accuracy and loss analysis of the MDL-BADDC technique on the three datasets. The results demonstrate that the MDL-BADDC system achieves improved performance, with high training and validation accuracy. It can be seen that the validation accuracy of the MDL-BADDC method exceeds its training accuracy. The figure also shows the loss analysis of the MDL-BADDC technique on the three datasets. The outcomes establish that the MDL-BADDC approach attains minimal training and validation loss, and that its validation loss is lower than its training loss.

Figure 6: Accuracy and loss graph analysis of MDL-BADDC technique on three datasets

4 Conclusion

In this study, a novel MDL-BADDC technique has been developed for atherosclerosis disease diagnosis and classification. The MDL-BADDC technique incorporates pre-processing, QOBMO based feature selection, DSAE based classification, and KHA based parameter tuning. The application of KHA helps to tune the parameters involved in the DSAE model and thereby enhances the detection outcomes. To showcase the classification performance of the MDL-BADDC approach, a wide range of simulations was carried out on three benchmark biomedical datasets. The comparative result analysis reported better performance of the MDL-BADDC technique over the compared methods. In future, the MDL-BADDC technique can be extended to the diagnosis of other diseases such as lung cancer and brain tumor.

Funding Statement: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under Grant Number (RGP 2/279/43), and to the Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2022R151), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
