Deep Learning and Improved Particle Swarm Optimization Based Multimodal Brain Tumor Classifcation

2021-12-14 10:29AyeshaBinTahirMuhamamdAttiqueKhanMajedAlhaisoniJunaidAliKhanYunyoungNamShuiHuaWangandKashifJaved
Computers Materials&Continua 2021年7期

Ayesha Bin T.Tahir,Muhamamd Attique Khan,Majed Alhaisoni,Junaid Ali Khan,Yunyoung Nam,Shui-Hua Wang and Kashif Javed

1Department of Computer Science,HITEC University,Taxila,47040,Pakistan

2College of Computer Science and Engineering,University of Ha’il,Ha’il,Saudi Arabia

3Department of Computer Science and Engineering,Soonchunhyang University,Asan,Korea

4School of Architecture Building and Civil Engineering,Loughborough University,Loughborough,LE11 3TU,UK

5Department of Robotics,SMME NUST,Islamabad,Pakistan

Abstract: Background:A brain tumor refects abnormal cell growth.Challenges:Surgery, radiation therapy,and chemotherapy are used to treat brain tumors,but these procedures are painful and costly.Magnetic resonance imaging (MRI)is a non-invasive modality for diagnosing tumors, but scans must be interpretated by an expert radiologist.Methodology:We used deep learning and improved particle swarm optimization (IPSO) to automate brain tumor classifcation.MRI scan contrast is enhanced by ant colony optimization(ACO); the scans are then used to further train a pretrained deep learning model, via transfer learning (TL), and to extract features from two dense layers.We fused the features of both layers into a single, more informative vector.An IPSO algorithm selected the optimal features,which were classifed using a support vector machine.Results:We analyzed high- and low-grade glioma images from the BRATS 2018 dataset; the identifcation accuracies were 99.9% and 99.3%, respectively.Impact:The accuracy of our method is signifcantly higher than existing techniques;thus,it will help radiologists to make diagnoses,by providing a“second opinion.”

Keywords: Brain tumor; contrast enhancement; deep learning; feature selection; classifcation

1 Introduction

Brain tumors are the 10th most common type of cancer worldwide [1,2], and glioma is the most prevalent brain tumor.A low-grade glioma (LGG) can be cured if diagnosed early;high-grade gliomas (HGGs) are malignant.Generally, an LGG does not spread [3].The World Health Organization grades benign and malignant tumors as I, II and III, IV, respectively [4].Symptoms include diffculty speaking, short-term memory loss, frequent headaches, blurred vision,and seizures; these vary by tumor size and location.Magnetic resonance imaging (MRI) is used to visualize brain tumors.However, accurate classifcation is not possible with a single MRI sequence;multiple MRI sequences (T1, T1 with contrast enhancement, T2, and FLAIR [3] are required).In the United States alone, approximately 22,850 patients are diagnosed with brain tumors annually [5]; the number in 2019 was 23,890 (13,590 males and 10,300 females), including 18,020 deaths(10,190 males and 7,830 females).MRI is much more effcient than computed tomography; the amount of radiation is lower, while the contrast is higher.Analysis of MRI scans is diffcult [6];an automated approach is required [7].The typical analytical steps include preprocessing, feature extraction and reduction, and classifcation.Some researchers have used image segmentation for tumor detection, while others have focused on feature extraction for classifcation based on tumor intensity and shape [8,9].Features extraction is an essential step in disease classifcation [9].Based on the features, the tumor is identifed by feature properties including intensity, shape,etc.More recently, deep learning gives more impressive results for medical infection classifcation.Deep learning is invaluable for detecting and classifying tumors [10].There are several pretrained models [11] that classify extracted features using supervised learning algorithms such as Softmax,support vector machine (SVM), naïve Bayes, and K-nearest neighbor (KNN) [12].

In medical imaging, deep learning shows huge performance for both disease detections and classifcation.The major medical diseases are brain tumors [13], skin cancers [14], lung cancers [15], stomach conditions [16], retinal injuries [17], and blood diseases [18], among other conditions [19–21].Brain tumor analysis remains challenging [22]; several techniques are available but none of them are 100% accurate [23,24].Most techniques are based on machine learning [25],which facilitates early tumor detection [26].Convolutional neural networks (CNNs) [27], K-means algorithms [28], decision-level fusion [29], machine learning-based evaluation [30], and deep learning [31] approaches have all been used.Tanzila et al.[32] accurately detected tumors using feature fusion and deep learning.A grab-cut method was used for segmentation.The geometry of a transfer learning (TL) model was fne-tuned to identify features, and a serial-based method was used to fuse them.All features were optimized by entropy.The tumor detection accuracy was 98.78% for BRATS 2015, 99.63% for BRATS 2016, and 99.67% for BRATS.Schadeva et al.[33]improved segmentation and brain tumor classifcation accuracy using an active contour model that focused on the area of interest; features were extracted, reduced by principal component analysis,and classifed using an automated neural network.The classifcation accuracy was 91%.Mohsen et al.[34] used deep learning for brain tumor classifcation.MRI scans were segmented using the fuzzy c-means approach and discrete wavelet transformation was applied to extract features.A deep neural network performed the classifcation with an accuracy of 96.97%.The linear discriminant analysis (LDA) accuracy was 95.45% and that of sequential minimal optimization(SMO) was 93.94%.The deep learning network resembled a CNN, but required less hardware and was much faster.

Problem Statement:The major challenges in brain tumor classifcation are as follows:(i) manual evaluation is diffcult and time-consuming; (ii) tumor resolution is low and irrelevant features may be highlighted; (iii) redundant features cause classifcation errors; and; (iv) tumors grades I–IV look relatively similar.To resolve these issues, we present an automated classifcation method using deep learning and an improved particle swarm optimization (IPSO) algorithm.

Contributions:The major contributions of this study are as follows:(i) MRI scan contrast is improved using an evolutionary approach, i.e., ant colony optimization (ACO); (ii) a pretrained VGG-19 model is fne-tuned via TL; (iii) features are extracted from two different dense layers and fused into one matrix; and, (iv) the IPSO is combined with a bisection method for optimal feature selection.

The remainder of this manuscript is organized as follows.The ACO, improvement of the original image contrast, TL -based fne-tuning, serial feature fusion, and IPSO are discussed in Section 2, the HGG and LGG results are presented in Section 3, and the conclusions are provided in Section 4.

2 Proposed Methodology

We used deep learning for multimodal classifcation of brain tumors.The contrast of the original images was improved by ACO, and the images were used to train a CNN.TL of brain images was used to enhance a pretrained model.Features computed by different layers were aggregated, and the IPSO was used to select optimal features that were then classifed using a one-against-all multiclass SVM (MSVM) classifer.The overall architecture is shown in Fig.1.

Figure 1:Proposed architecture diagram of multimodal brain tumor classifcation using deep learning

2.1 Contrast Enhancement

Contrast enhancement is very important because unenhanced images exhibit low contrast,noise, and very poor illumination [29].Several enhancement techniques are available; we used an ACO-based approach.

Initial Ant Distribution—The number of ants is calculated as:

wherelis the length of the image,wis the width, andANis the number of ants randomly placed in the image (one pixel=one ant).

Decision-based on Probability—The probability that antnmoves from pixel (e, f) to pixel (g, h)is given by:

Whene,f∈Ω

Here, all pixel locations are writtene,f∈Ω.ρefis the pheromone level.ωef the visibility,and is calculated as follows:

The probability equation shows thatΔ-plus refects the stepwise directional fuctuation:

whereu(Δ) is the weight function.Together with the function above, the weight function ensures that sharp turns by ants are less frequent than gentle ones, which we refer to as “probabilistic forward bias.”

Rule of Transition—Mathematically, the rule of transition is expressed as:

whereijis the pixel location, from which ants can travel to pixel(k,l).If q>q°, an ant can visit the next pixel [see Eq.(2)].

Updating Pheromone Levels—An ant can move from pixelijto pixel(k,l), as stated above,and the pheromone trajectory is given by:

A new trajectory is obtained after each iteration, as follows:

whereΘ(0< Θ <1)is the proportion of pheromone that evaporates andρ°is the initial pheromone concentration [35].Applying the above steps to all image pixels yields an enhanced image (Fig.2).

Figure 2:Visual description of contrast stretching results on original images

2.2 Convolutional Neural Network

A CNN is a type of deep neural network that can be used for image recognition and classifcation, and object detection [36].A CNN requires minimal preprocessing.During training and testing, images pass through kernel layers, and are pooled and then fully connected; this is followed by Softmax classifcation.Probability values range from 0 to 1.Several pretrained CNN models are available, including VggNet and AlexNet [37].VggNet has valuable medical applications [38].We used a pretrained VGG-19 model [39] which includes 16 convolutional layers(local features), 3 fully connected layers, and max-pooling and ReLu layers (Fig.3).

Figure 3:VGG-19 architecture

2.3 VGG-19

VGG-19 containsNfully connected layers, whereN=1–3.ThePNunits of theNth layers areNRW=224,Nc=224 andNch=3.The dataset is represented byα, and the training sample byα.Eachis a real numberR:

where ω(1)is the frst weight matrix,r()is the Relu activation function,RWthe number of rows,cthe number of columns, andchthe number of channels.γ(1)is the bias vector andn(1)is the weight of the frst layer, defned as:

The output of the frst layer becomes the input of the second layer; this step is repeated as follows:

Here, by way of example, ω(2)and ω(3)are the second and third weight matrices, respectively.n(2)∊RN(2)×N(1)andn(2)∊RN(2)×N(1).ω(Z)represents the last fully connected layer used for highlevel feature extraction.Mathematically:

whereA(e)is the cross-entropy function,Bis the total number of classescl, andobandpthe predicted probabilities.

2.4 Transfer Learning

TL occurs when a system acquires knowledge and skills by solving a specifc problem, and then uses that knowledge to solve another problem [40].We used TL to further train, and improve the performance, of a pretrained model.The input wasand the original learning task can be described as:The target wasTo=a nd the new learning task can be written aswhere n«m andare the training data labels (Fig.4).

Figure 4:Transfer learning based retraining a model for multimodal brain tumor classifcation

Feature Extraction and Fusion:After TL, activation is required for feature extraction.We extracted features from FC layers 6 and 7.The feature vector of FC layer 6 had dimensions ofN×4,096, and that of FC layer 7 4,096.Mathematically, the vectors are expressed asandbothand∈R.We then fused the vectors into a single matrix to derive optimal tumor data.This can be done using serial, parallel, and correlational techniques.We used the lengths of extracted features and no features were discarded.Mathematically, the fused matrix can be expressed as:

whereis a fused matrix with dimensions ofk1×k2.Nis the number of images used for training and testing.k1 andk2 both have a value of 4,096.The fused vector includes a few irrelevant/redundant features, which were removed by IPSO.

2.5 Features Selection and Classifcation

It is important to select appropriate features for classifcation, because irrelevant features reduce classifcation accuracy and increase the computational time [41].However, it is not easy to identify the most important features because of their complex interactions.A good feature vector is required; in this study we used the IPSO algorithm.The original PSO [42] was a global search algorithm using evolutionary computation.PSO, as a population-based algorithm inspired by focks of birds and schools of fsh, is more effective than a general algorithm [43] in terms of convergence speed.Particles are initially placed randomly, and their velocities and positions are iteratively updated.The current and updated particle locations are referred to aspbestandgbest,respectively.The IPSO reduces the number of iterations required by including a “stop” condition based on a bisection method (BsM).The selected values are approximated and the algorithm is then terminated; the accuracy of each iteration is approximately the same as the previous one.Assuming that the position of thenth particle isYi=yi1,yi2,...,yiMand the velocity isVi=Vi1,Vi2,...,ViM, the local best particle isLi=li1,li2,...,linand the global best particle isGb=gb1,gb2,...,gbM.The updated position of theith particle is calculated as:

where −=1,2,3,...,N,m=1,2,3,...M,Sis the number of iterations,Nis the size of the swarm,R1 andR2 are random numbers [0, 1],a1 anda2 are acceleration coeffcients, andxis the inertial weight.A linear value ofxthat varies with time is calculated as:

Here,Tis the maximum iteration time,xmaxis the upper limit, andxminis the lower limit.During feature selection, every solution is a subset of features.Each set of particles is denoted as a binary vector, and every particle has a specifc position.TheMth feature is defned by theMth position.Features are selected by the IPSO, which begins with a random solution and then moves toward the best global solution (represented by a new subset of features).Each feature is linked to a dataset that occupies a search space.If theMth position is 1, theMth feature is considered informative, while if theMth position is 0, theMth feature is not informative.If theMthposition is −1, theMth feature is not added to the set.

Fitness Function:Each solution yielded by the selection algorithm was tested in terms of ftness within every generation.If accuracy improved, the current solution was the best one.The solution with maximum ftness is the best one overall.We used the fne KNN classifer and BsM.The starting accuracy was 90.0 (˜t), and the fnal accuracy is expressed ast.The midpoint of ˜tandtwas computed and the root was found.If the root was equal to zero, the algorithm terminated;otherwise, the next iteration started and the root betweentandt+1 was found.If the interval was not zero, the midpoint oftandt+1 was determined, and the following criteria were checked:

Thus, the values were updated until two successive iterations became very similar.We initially selected 100 iterations, but the algorithm stopped between 10 and 20 iterations, yielding aN×1,875 vector containing approximately 40% of all features that were fnally classifed using the one-against-all SVM.

Consider anN-class problem withBtraining samples,(s1,t1),...,(sn,tn), wheresi∊Rais an n-dimensional feature vector andti∊{1,2,...,N}.The method buildsNbinary SVM classifers,and each classifer separates all classes.Training of thei-th SVM uses all samples withi−thpositive labels and the remaining negative labelsdi(S)=

t,j=1 iftj=i, andt,j=−1 otherwise.

Samplesis classifed into the classi∗, thed∗of which is the highest during classifcation:

3 Experimental Results and Comparison

We analyzed the BRATS 2018 dataset [44], which contains HGG and LGG data.In total,70% of the data were used for training and 30% for testing (Fig.5).We evaluated multiple classifers in terms of accuracy, sensitivity, precision, the F1-score, the area under the curve (AUC)the false-positive rate (FPR), and computational time.All simulations were run on Matlab 2019a(MathWorks, Natick, MA, USA) using a Core i7 processor, 16 GB of RAM, and an 8 GB graphics card.

Figure 5:Proposed testing process

3.1 Testing Results of HGG Images Data

We frst classifed HGG images (30% of all test images).The results obtained via fusion of the original feature vectors are shown in Tab.1.The highest accuracy was 99.9%, for the MSVM, with a sensitivity of 99.25%, precision of 99.50%, F1-score of 99.3%, FPR of 0.00, and AUC of 1.00.The other accuracies were as follows:fne tree, 89.20%; linear SVM, 98.70%; coarse Gaussian,95.80%; fne KNN, 99.70%; medium KNN, 97.70%; cubic KNN, 97.0%; weighted KNN, 99.20%;ensemble-boosted tree, 96.40%; and ensemble-bagged tree, 98.0%.Thus, the MSVM performed best.The confusion matrix is shown in Fig.6; the accuracy rate always exceeded 99%.The computational times are listed in Tab.1.The medium KNN had the shortest computational time,at 28.52 s but the accuracy was only 97.75%.The receiver operator characteristic (ROC) curves are shown in Fig.7.

Table 1:Classifcation results for the proposed method using original fused feature vectors

Figure 6:Confusion matrix for MSVM using original fused feature vectors

Figure 7:ROC plots of MSVM using original fused feature vectors

The optimized HGG features are listed in Tab.2 (HGG).The highest accuracy was 99.9%,for the MVSM, followed by 85.20% for the fne tree classifer, 98.75% for the linear SVM, 95.50%for the course Gaussian, 99.60% for the fne KNN, 97.30% for the medium KNN, 97.50% for the cubic KNN, 99.20% for the weighted KNN, 93.30% for the ensemble-boosted tree, and 97.60% for the ensemble-bagged tree.Thus, the MSVM showed the best performance; the confusion matrix is shown in Fig.8.The computational times are listed in Tab.2.The coarse Gaussian SVM had the shortest computational time (6.17 s), but the accuracy was only 95.70%, i.e., lower than that of the MSVM.The ROC curves are shown in Fig.9.

Table 2:Classifcation results after employing proposed optimal features

Figure 8:Confusion matric of MSVM after employing optimal feature selection

Figure 9:ROC plots of MSVM for the verifcation of AUC

3.2 Testing Results of LGG Images Data

The original feature vectors for the LGG images were fused (Tab.3).The highest accuracy(99.1%) was achieved by the MSVM, with a sensitivity of 99.00%, precision of 99.00%, F1-score of 99.00%, FPR of 0.002, and AUC of 1.00.The other accuracies were as follows:fne tree,78.30%; SVM, 93.40%; coarse Gaussian, 82.60%; fne KNN, 98.00%; medium KNN, 91.90%;cubic KNN, 91.90%; weighted KNN, 96.50%; ensemble-boosted tree, 87.10%; and ensemblebagged tree, 94.10%.In the confusion matrix shown in Fig.10; the accuracy rate always exceeded 99%.The computational times are listed in Tab.3 (last column).The fne KNN had the shortest computational time (27.56 s), but the accuracy was only 98.00%, i.e., less than that of the MSVM.The longest computational time was 356.66 s.The MSVM ROC curves are provided in Fig.11.

Figure 10:Confusion matric of MSVM using fused feature vector

Figure 11:ROC plots of MSVM using fused feature vector

The optimized LGG features are listed in Tab.4.The MSVM showed the best classifcation performance, with an accuracy of 99.3%, sensitivity of 99.25%, precision of 99.25%, F1-score of 99.25%, FPR of 0.000, and AUC of 1.00.The computational time required was 11.92 s;however, the best time was in fact 6.25 s.The other accuracies were as follows:fne tree, 78.00%;linear SVM, 93.30%; coarse Gaussian, 85.40%; fne KNN, 98.20%; medium KNN, 93.30%; cubic KNN, 93.20%; weighted KNN, 97.30%; ensemble-boosted tree, 83.90%; and ensemble-bagged tree,93.90%.The confusion matrix is illustrated in Fig.12; the accuracy rate always exceeded 99%.The MSVM ROC curves are shown in Fig.13.The use of optimal selected features improved classifcation accuracy and signifcantly reduced computational times.

Table 3:Classifcation results of LGG by employing a fused feature vector

Table 4:Classifcation results of the proposed method of LGG data after optimal feature selection

Figure 12:Confusion matrix of MSVM on LGG data after employing optimal features

Figure 13:Confusion matrix of MSVM on LGG data after employing optimal features

3.3 Comparison with Existing Techniques

Comparison with the existing techniques is also conducted to validate the proposed method(can be seen in Tab.5).This table shows that the best accuracy previously achieved on the Brats2018 dataset was 98% [44].In that approach, the authors used the LSTM approach.Amin et al.[45] achieved the second-best accuracy of 93.85%.In more recent work, Khan et al.[46]achieved an accuracy of 92.5% using a deep learning framework.Our proposed method is also deep learning-based.We have tested on both HGG and LGG brain images and achieved an accuracy of 99.9% and 99.3%, respectively.The main strength of this work is the selection of the optimal features using an improved PSO algorithm.Moreover, the proposed labeled results are also given in Fig.14.

Table 5:Comparison of the proposed method results with existing techniques for the BRATS2018 dataset

Figure 14:Prediction results of the proposed method in the form of corresponding labels

4 Conclusion

A new automated technique is proposed in this article for brain tumor classifcation using deep learning and the IPSO algorithm.The contrast of original MRI scans is enhanced using the ACO approach to learn a better CNN model.This step not only enhances the tumor region but also extracts more relevant features.Later, fusion of two-layer features improves the original accuracy of classifcation.A few redundant features are also added in the fusion process for classifcation, which does not yield the target accuracy.Therefore, another algorithm called the IPSO is proposed to improve the system’s accuracy and minimize computational time.Hence,we conclude that the most optimum features give better classifcation accuracy and decrease the system prediction time.The major limitation of this work is the proposed stopping criterion.There is a chance that the features after the stopping condition may perform well.In future, we aim to try to enhance this stopping criterion and will perform experiments on the BraTs2019 dataset as well.

Funding Statement:This research was supported by Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0012724, The Competency Development Program for Industry Specialist) and the Soonchunhyang University Research Fund.

Conficts of Interest:The authors declare that they have no conficts of interest to report regarding the present study.