Classification of COVID-19 CT Scans via Extreme Learning Machine

2021-12-14 10:29
Computers, Materials & Continua, 2021, Issue 7

Muhammad Attique Khan, Abdul Majid, Tallha Akram, Nazar Hussain, Yunyoung Nam, Seifedine Kadry, Shui-Hua Wang and Majed Alhaisoni

1 Department of Computer Science, HITEC University, Taxila, 47040, Pakistan

2 Department of Electrical Engineering, COMSATS University Islamabad, Wah Campus, Wah Cantt, Pakistan

3 Department of Computer Science and Engineering, Soonchunhyang University, Asan, Korea

4 Department of Mathematics and Computer Science, Faculty of Science, Beirut Arab University, Lebanon

5 Department of Mathematics, University of Leicester, Leicester, UK

6 College of Computer Science and Engineering, University of Ha’il, Ha’il, Saudi Arabia

Abstract: Here, we use multi-type feature fusion and selection to predict COVID-19 infections on chest computed tomography (CT) scans. The scheme operates in four steps. Initially, we prepared a database containing COVID-19 pneumonia and normal CT scans. These images were retrieved from the Radiopaedia COVID-19 website. The images were divided into training and test sets in a ratio of 70:30. Then, multiple features were extracted from the training data. We used canonical correlation analysis to fuse the features into single vectors; this enhanced the predictive capacity. We next implemented a genetic algorithm (GA) in which an Extreme Learning Machine (ELM) served to assess GA fitness. Based on the ELM losses, the most discriminatory features were selected and saved as an ELM Model. Test images were sent to the model, and the best-selected features were compared to those of the trained model to allow final predictions. Validation employed the collected chest CT scans. The best predictive accuracy of the ELM classifier was 93.9%; the scheme was effective.

Keywords: Coronavirus; classical features; feature fusion; feature optimization; prediction

1 Introduction

The novel coronavirus disease that first appeared in China has rapidly spread worldwide [1]. The World Health Organization (WHO) named the disease caused by the virus COVID-19 on February 11, 2020 [2]. COVID-19 spread from Wuhan, China, to become a major global health problem [3]. The WHO has recorded 86,806 confirmed cases of COVID-19 in China, and 4634 deaths, to date (16th December 2020) [4]. Many members of the coronavirus family cause disease. The virus that causes COVID-19 is termed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [5]. The common symptoms of COVID-19 are fever and cough, and sometimes headache and fatigue [6]. The virus was first discovered in humans in 2019 and spreads rapidly via respiratory droplets [7].

To date (19th December 2020), there have been 76,186,444 confirmed COVID-19 cases worldwide, with 1,684,864 deaths, according to the WHO. This corresponds to a global case fatality rate of about 2.2%. The USA tops the list with 17,899,267 confirmed cases and 321,025 deaths. Confirmed cases number 10,013,478 in India, 7,163,912 in Brazil, 2,819,429 in Russia, 2,442,990 in France, 1,982,090 in Turkey, and 1,977,167 in the UK; the corresponding deaths are 145,298, 185,687, 50,347, 60,229, 17,610, and 66,541, respectively, to date (19th December 2020). These numbers are increasing day by day. Italy is another highly affected country, with 1,921,778 reported positive cases and 67,894 deaths. Among Asian countries, India is the most affected. In Pakistan, the spread has been much slower than in other Asian countries.

COVID-19 poses a major healthcare problem worldwide [8,9]. Early detection of infected patients would be helpful [10]. Many computer-aided diagnostic (CAD) systems allow physicians to recognize stomach [11], lung [12], and brain [13] cancers, as well as COVID-19 infections [14]. The existing COVID-19 detection methods are slow and expensive. It is possible to use machine-learning algorithms to identify diseased lungs on computed tomography (CT) images. Most such algorithms rely on supervised machine learning. CAD systems automatically extract textural [15], geometric [16], and deep [17] features of chest CT images. However, not all extracted features are useful; higher-dimensionality data render diagnoses slow and compromise the results. Many medical researchers seek to develop feature selection methods [18] that transform high-dimensional features into simpler components. The algorithms select robust features and remove irrelevant information [19]. A robust feature set enhances CAD system performance in terms of both accuracy and time. The many methods of optimal feature selection include entropy-based approaches [20], Particle Swarm Optimization (PSO), the so-called Grasshopper algorithm, and genetic algorithms (GAs) [21], to name a few [22]. In this work, we present an automated technique for COVID-19 classification using CT images. Our major contributions are as follows: (i) multi-property features are extracted in different directions from the CT images; (ii) a parallel canonical correlation analysis approach is employed for feature fusion; (iii) a GA is implemented to select the best features; and (iv) an ELM classifies the selected features.

The rest of the manuscript is organized as follows. The existing relevant studies are discussed in Section 2 (related work). The proposed methodology is described in Section 3, which includes dataset preparation, feature fusion, and selection. Section 4 presents the experimental results, and finally, the analysis and conclusion are presented in Section 5.

2 Related Work

Recently, COVID-19 patients have been diagnosed from their X-ray and CT images using computer vision (CV)-based machine-learning algorithms, principally supervised learning and deep learning (DL) techniques. Apostolopoulos et al. [23] subjected X-ray image datasets to transfer learning using different convolutional neural network (CNN) models and evaluated their technique on two datasets that contained confirmed COVID-19, bacterial pneumonia, and normal cases. The maximum accuracies achieved by MobileNet V2 [24] were 96.78% (two-class) and 94.72% (three-class). Li et al. [25] developed a novel DL model (COVNet) to detect COVID-19 in chest CT images; the sensitivity was 90% and the specificity 96%. Shan et al. [26] implemented a neural network-based VB-Net model for segmentation of COVID-19-infected regions in CT images. The Dice similarity was 91.6 ± 10.0%. Tsiknakis et al. [27] explored the uncertainties of DL models used to detect COVID-19 in chest X-ray images. Uncertainty was estimated via transfer learning using a Bayesian DL classifier. Narin et al. [28] used three CNN models (ResNet50, Inception-ResNetV2, and InceptionV3) to identify COVID-19 patients from chest X-ray images. ResNet50 performed best (98% classification accuracy; the figures for InceptionV3 and Inception-ResNetV2 were 97% and 87%, respectively). In [29], a DL-based COVIDX-Net was used for automatic detection of COVID-19 in X-ray images. The model employed seven CNN models including VGG19, DenseNet201, InceptionV3, ResNetV2, InceptionResNetV2, Xception, and MobileNetV2. The F1 scores of VGG19 and DenseNet201 were 0.91. A segmentation technique [30] has been used to identify lung regions infected by COVID-19. During pre-processing, CT images were enhanced using the Firefly Algorithm (FA) applying the guided Shannon Entropy (SE) thresholding method. The enhanced images were segmented employing the Markov Random Field (MRF) method. The segmentation accuracy exceeded 92%. Wang et al. [31] proposed a DL-based method to predict COVID-19 disease in CT images. The cited authors fine-tuned a modified Inception architecture and extracted CNN features for classification. The accuracy was 82.9% and the area under the curve (AUC) 0.90. Jaiswal et al. [32] extracted various textural features including the Local Directional Pattern (LDP), the Grey Level Co-occurrence Matrix (GLCM), the Gray Level Run Length Matrix (GLRLM), the Discrete Wavelet Transform (DWT), and the Grey Level Size Zone Matrix (GLSZM). The GLSZM afforded the highest accuracy of 99.68% when a support vector machine (SVM) classifier was employed. Hybrid feature selection [33], a fuzzy approach with DL [34], a transfer learning-based method [35], and other approaches [36] have also been described. In summary, these methods seek to improve classification accuracy, but the main issue is the availability of data. Many are trained using classical techniques because the available datasets are small. In this work, we used data from 58 patients and trained an ELM-based model on fused classical features.

3 Proposed Methodology

We develop automated prediction of positive COVID-19 pneumonia cases using CT scans. The positive cases are labeled via RT-PCR testing. The scheme features four steps. First, a database was prepared by collecting COVID-19 pneumonia-positive chest CT scans and normal scans from the Radiopaedia COVID-19 website (https://radiopaedia.org/cases). Next, the images were divided into training and test sets. Several features (the Dynamic Average LBP, the SFTA, the DWT, and the Renyi Entropy) were extracted from the training data. We used canonical correlation analysis (CCA) to fuse the features into single vectors; this enhanced the predictive power. We next implemented a GA using the fused feature vectors. The ELM classifier served as the GA fitness function. We used the ELM loss to select the most discriminatory features and saved them as an ELM Model. Next, the test images were evaluated and the best-selected features were compared to those of the trained model to produce the final predictions. A flow chart is shown in Fig. 1; 60% of the images were used to train the ELM Model and the remainder for testing.

Figure 1: Proposed architecture for prediction of COVID-19-positive chest CT scans using improved genetic algorithm-based feature selection

3.1 Database Assembly

All images (58 patients) were collected from the Radiopaedia COVID-19 website. We downloaded data on 30 patients with COVID-19 pneumonia confirmed via RT-PCR. We gathered 3,000 COVID-19 pneumonia-positive images and 2,500 control images (Fig. 2). All images were resized to 512×512 pixels. Case 2 was contributed by Dr. Chung et al. (https://radiopaedia.org/cases/covid-19-pneumonia-2), Case 3 by Dr. Bahman (https://radiopaedia.org/cases/covid-19-pneumonia-3), Case 4 by Dr. Bahman (https://radiopaedia.org/cases/covid-19-pneumonia-4), Case 7 by Dr. Domineco (https://radiopaedia.org/cases/covid-19-pneumonia-7), and Case 12 by Dr. Fabio (https://radiopaedia.org/cases/covid-19-pneumonia-12), among others.

Figure 2: Sample images of COVID-19-positive and normal chest CT scans

3.2 Extraction of Classical Features

In pattern recognition and machine learning, features play pivotal roles in object representation. Many feature extraction techniques for medical images are available [37]. However, some methods are better than others. Classical features are favored when data are lacking. We could not train an advanced machine-learning model such as a CNN; therefore, we extracted three features (the Dynamic Average LBP, the SFTA, and the DWT), and Entropy, as described below.

3.2.1 Dynamic Average LBP

Dynamic Average LBP (DALBP) features are modifications of the original LBP features; averages are used rather than central values. In the original LBP [38], each neighboring value is thresholded against the central value; in the improved version, the threshold is instead an average value computed over a dynamic window. First, we examined the original LBP features. Assume that ξ(x, y) is an original chest CT image of dimensions 512×512, and ξ(x, y) ∈ R^d. Consider the input images ξ = {ξ1, ξ2, ξ3, ..., ξN} ∈ R and their labels l = {l1, l2, ..., lN}. For image ξ(x, y), the LBP binary code is defined as follows:
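For reference, the standard LBP operator, written with the symbols of this section, reads as follows (the name s(·) for the thresholding function is an assumption made here):

```latex
\mathrm{LBP}_{X,r}(p_c) = \sum_{n=0}^{X-1} s(p_n - p_c)\, 2^{n},
\qquad
s(z) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases}
```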

where pc denotes the central pixel value, pn the neighbor pixel values of pc, r the radius, and X the total number of neighbors. Suppose the central coordinates are pc = (0, 0); those of pn are then:
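In the usual circular-neighborhood formulation of LBP, which is assumed here to be the one intended, the n-th neighbor lies on a circle of radius r around the center:

```latex
p_n = \left( -r \sin\frac{2\pi n}{X},\; r \cos\frac{2\pi n}{X} \right),
\qquad n = 0, 1, \ldots, X-1
```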

As the image dimensions are 512×512, an LBP code histogram is computed as follows:
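A common way to write this histogram, given here as a sketch in which the indicator δ(·,·) is an assumed helper symbol, is:

```latex
H(k) = \sum_{x=1}^{512} \sum_{y=1}^{512} \delta\big(\mathrm{LBP}_{X,r}(x, y),\, k\big),
\qquad
\delta(a, b) = \begin{cases} 1, & a = b \\ 0, & \text{otherwise} \end{cases}
```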

where k ∈ [1, 2, 3, ..., K] and K represents the maximal LBP code value. As for the DALBP, we employ the averages of two 3×3 windows, not the central pixel value. For each 3×3 block, the average value takes the place of the central pixel as the threshold for the neighboring pixels.
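The original LBP code and its histogram can be sketched in a few lines. This is a minimal illustration rather than the authors' MATLAB implementation: it assumes X = 8 neighbors at radius r = 1, a fixed clockwise bit order, and skips border pixels; the function names are hypothetical.

```python
from typing import List

# Offsets of the 8 neighbors at radius 1, clockwise from the top-left pixel.
# The bit order is a convention assumed here; any fixed order works.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: List[List[int]], y: int, x: int) -> int:
    """8-bit LBP code of the interior pixel (y, x)."""
    center = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        # Thresholding step: the bit is set when the neighbor is >= the center.
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img: List[List[int]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_code(img, y, x)] += 1
    return hist
```

For a 512×512 scan this yields one 256-bin histogram per image, which matches the histogram length quoted for each DALBP block.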

Consider an image of dimensions 512×512 and divide it into 18 overlapping blocks [39] that are each then further divided into two equal blocks (9-bit vectors in each block). Next, compute the averages and extract the binary bits as follows:

where φ is the “Signum” function and (x1, t) are the parameters of φ. This calculates the binary features. From each block, 8 bits are computed because execution runs from 1 to 9, where 9 denotes the bit vectors of a block. Later, these binary bits (both blocks) are converted into decimals, as follows:

where B denotes the bit vector length (8 in this work), b1 denotes block 1, b2 denotes block 2, the Db1(i) are the decimal values of the block 1 bits, and the Db2(i) the decimal values of the block 2 bits. Finally, a histogram is built employing Db1(i) and Db2(i) with the aid of Eqs. (4) and (5). The length of each histogram is 256. Then, the values of both histograms are concatenated into single vectors to yield the final DALBP features:

where the length of each final feature vector [H(h1, h2)] is N×512 (N is the number of training images used for feature extraction).

3.2.2 SFTA Features

SFTA (Segmentation-based Fractal Texture Analysis) features, a type of textural feature, are also used to extract discriminatory information. In the medical context, SFTA features are often used to describe organs. The principal textural descriptor is the GLCM, but the use thereof is very time-consuming. We used accurate SFTA features that can be rapidly extracted [40]. An image is initially decomposed into a binary image via two-threshold segmentation:
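In the standard SFTA formulation, two-threshold binary decomposition can be written as below. The symbol names I (input image), I_b (binary image), t_l and t_u (lower and upper thresholds) are assumptions made for this sketch:

```latex
I_b(x, y) = \begin{cases} 1, & t_l < I(x, y) \le t_u \\ 0, & \text{otherwise} \end{cases}
```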

Here, the two thresholds give the lower and upper bounds of the gray-level range, and the result is the binary image. The SFTA feature vector is then constructed from fractal dimensions as follows:

where Ψ8 denotes the set of pixels that are 8-connected to (u, v). Hence, after binary box counting, we obtain a textural feature vector of dimensions N×21.
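Box counting, the step that turns a binary border image into a fractal dimension, can be sketched as follows. This is an illustrative version, not the authors' code: the box sizes, the least-squares fit, and the function names are assumptions, and the mask must contain at least one foreground pixel.

```python
import math
from typing import List, Sequence


def box_count(mask: List[List[int]], size: int) -> int:
    """Number of size x size boxes that contain at least one foreground pixel."""
    h, w = len(mask), len(mask[0])
    count = 0
    for y0 in range(0, h, size):
        for x0 in range(0, w, size):
            if any(mask[y][x]
                   for y in range(y0, min(y0 + size, h))
                   for x in range(x0, min(x0 + size, w))):
                count += 1
    return count


def fractal_dimension(mask: List[List[int]],
                      sizes: Sequence[int] = (1, 2, 4, 8)) -> float:
    """Least-squares slope of log N(s) against log(1/s)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(mask, s)) for s in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

A filled square gives a dimension of 2 and a straight line gives 1; lung-border masks fall somewhere in between, which is what makes the value usable as a texture descriptor.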

3.2.3 DWT and Entropy

The Discrete Wavelet Transform (DWT) is a well-known feature extraction method that analyzes images at various scales and resolutions [41]. Let g(x) be a continuous square-integrable function. Mathematically, this is:

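$$W_{\varphi}(s_1, s_2) = \frac{1}{\sqrt{s_1}} \int_{-\infty}^{\infty} g(x)\, \varphi\!\left(\frac{x - s_2}{s_1}\right) dx, \qquad s_1 \in \mathbb{R}^{+},\ s_2 \in \mathbb{R} \tag{13}$$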

where s1 and s2 are the scale and translation parameters of a real-valued φ. A discrete variation of Eq. (13) can be obtained by limiting s1 and s2 to a discrete lattice with s1 = 2^j and s2 = 2^j k. Mathematically, this is:
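Assuming the usual 1/√s1 normalization of the continuous transform (the normalization is not stated in the text), substituting s1 = 2^j and s2 = 2^j k gives the dyadic form:

```latex
W_{\varphi}(j, k) = 2^{-j/2} \int_{-\infty}^{\infty} g(x)\, \varphi\!\left(2^{-j} x - k\right) dx
```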

The Renyi entropy feature vector is computed from the row coefficients; the output vector is of dimensions N×512. Entropy replaces the zero and negative features with positive values.
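The row-wise DWT-plus-entropy step can be sketched as follows. The wavelet family, the number of levels, and the entropy order are not given in the text, so a one-level Haar transform and order α = 2 are assumed purely for illustration; the function names are hypothetical.

```python
import math
from typing import List, Tuple


def haar_step(row: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT of an even-length row."""
    s = math.sqrt(2.0)
    approx = [(row[i] + row[i + 1]) / s for i in range(0, len(row), 2)]
    detail = [(row[i] - row[i + 1]) / s for i in range(0, len(row), 2)]
    return approx, detail


def renyi_entropy(values: List[float], alpha: float = 2.0) -> float:
    """Renyi entropy (base 2) of the distribution given by normalized |values|."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = sum(abs(v) for v in values)
    if total == 0:
        return 0.0
    probs = [abs(v) / total for v in values if v != 0]
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def dwt_entropy_features(img: List[List[float]]) -> List[float]:
    """One entropy value per image row, computed on that row's Haar coefficients."""
    feats = []
    for row in img:
        approx, detail = haar_step(row)
        feats.append(renyi_entropy(approx + detail))
    return feats
```

With a 512×512 image this produces 512 values per image, in line with the N×512 dimension stated above, and every value is non-negative even when individual wavelet coefficients are zero or negative.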

3.3 Feature Fusion

All extracted features were fused employing canonical correlation analysis (CCA) [42]. The purpose was to enhance the learning capacity of each feature vector in terms of correct predictions of COVID-19-positive chest scans and normal scans. Suppose that Υ1 ∈ R^(r×n), Υ2 ∈ R^(q×n), and Υ3 ∈ R^(s×n) are three feature vectors extracted using the DALBP, SFTA, and DWT-plus-Entropy approaches, respectively. The dimensions are N×512, N×21, and N×512, respectively. Let Δxx ∈ R^(r×r), Δyy ∈ R^(q×q), and Δzz ∈ R^(s×s) represent the within-set covariance matrices of Υ1, Υ2, and Υ3, respectively. Also, let Δxy ∈ R^(r×q), Δyz ∈ R^(q×s), and Δxz ∈ R^(r×s) represent the between-set covariance matrices. We consider Υ1 and Υ3 when computing the between-set covariance:

Next, Lagrange multipliers are used to solve the maximization problem between the two projected feature sets such that the following condition is satisfied:

Finally, the transformed features are combined as follows:

where Φ denotes the CCA fused vector. The process is repeated for Υ2. Finally, we obtain a fused CCA vector of dimensions N×776 (in this work; the dimensions vary by the dataset).
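A two-set CCA fusion can be sketched with NumPy as below. This is a generic textbook construction (whitening followed by an SVD), not the authors' parallel three-set procedure: the small ridge term `reg`, fusion by concatenation, and the function names are assumptions, and the sketch makes no attempt to reproduce the N×776 output size.

```python
import numpy as np


def inv_sqrt(mat: np.ndarray) -> np.ndarray:
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca_fuse(x: np.ndarray, y: np.ndarray, reg: float = 1e-8):
    """x: (N, p) and y: (N, q) feature matrices, one row per image.

    Returns the canonical correlations and the concatenated canonical variates.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    # Within-set and between-set covariance matrices.
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    kx, ky = inv_sqrt(cxx), inv_sqrt(cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, s, vt = np.linalg.svd(kx @ cxy @ ky, full_matrices=False)
    wx, wy = kx @ u, ky @ vt.T
    fused = np.hstack([x @ wx, y @ wy])
    return s, fused
```

Calling `cca_fuse` on two feature matrices returns `min(p, q)` canonical correlations and a fused matrix with `2 * min(p, q)` columns; a third feature set could then be fused with this result in the same way.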

3.4 Feature Selection

Feature selection involves selection of the best subset of input feature vectors based on a defined criterion. Use of a best subset improves learning and predictive accuracy, and reduces the computational time. We implemented a GA with an ELM fitness function. Most researchers use the Fine KNN and SVM for fitness calculations; however, we believe that the ELM is more efficient. The GA is given in Algorithm 1. The initial population size is 100, the number of iterations 1,000, the crossover rate (ψcr) 0.4, the mutation rate (ψmr) 0.01, and the selection pressure (β) 5. In Step 2, ELM-based fitness is calculated for K-Fold = 5 and Eloss = |Network Output − Original Label|. In Step 3, selection is performed via a roulette wheel, followed by the crossover and mutation steps. Step 2 is repeated until the desired accuracy is attained. The final robust vector is denoted Φfs(i) and has dimensions N×K (here, K = 426, which is the length of the final feature vector).

Algorithm 1
Input: ξfd(i) ← Fused Vector
Output: Φfs(i) ← Robust Vector
Step 1: Parameter initialization
  - Population ← N = 100
  - Iterations ← T = 1000
  - ψcr ← 0.4
  - ψmr ← 0.01
  - β ← 5
Start
Step 2: Fitness function
  - Extreme Learning Machine
  - K-fold ← 5
  - Eloss ← |Network Output − Original Label|
Step 3: Selection
  - S ← si / Σ(si), where si is an exponentially decaying weight governed by the selection pressure β
Step 4: Crossover (uniform crossover)
  - ψcr ← CrossOver(L1, L2)
Step 5: Mutation
  - Type ← Uniform
Step 6: Repeat Step 2
Step 7: Φfs(i) ← Best Features
End
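The steps of Algorithm 1 can be sketched as a small binary-mask GA. Several details are assumptions rather than the authors' implementation: the roulette weights use exp(−β·cost/worst cost), survivors are chosen by elitist truncation, and `cost` is a hypothetical callable that stands in for the 5-fold ELM loss of a feature mask.

```python
import math
import random
from typing import Callable, List

Mask = List[int]  # 1 keeps a feature, 0 drops it


def roulette_probs(costs: List[float], beta: float) -> List[float]:
    """Selection probabilities; a lower (non-negative) cost gets a higher probability."""
    worst = max(costs) or 1.0
    weights = [math.exp(-beta * c / worst) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


def uniform_crossover(a: Mask, b: Mask, rng: random.Random) -> Mask:
    """Each gene is taken from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(mask: Mask, rate: float, rng: random.Random) -> Mask:
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


def ga_select(cost: Callable[[Mask], float], n_features: int,
              pop_size: int = 100, iterations: int = 1000,
              crossover_rate: float = 0.4, mutation_rate: float = 0.01,
              beta: float = 5.0, seed: int = 0) -> Mask:
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    n_children = max(1, round(crossover_rate * pop_size))
    for _iteration in range(iterations):
        costs = [cost(m) for m in pop]
        probs = roulette_probs(costs, beta)
        children = []
        for _child in range(n_children):
            p1, p2 = rng.choices(pop, weights=probs, k=2)
            child = uniform_crossover(p1, p2, rng)
            children.append(mutate(child, mutation_rate, rng))
        # Elitist truncation: keep the pop_size lowest-cost individuals.
        pop = sorted(pop + children, key=cost)[:pop_size]
    return pop[0]
```

The defaults mirror the parameters listed in Step 1. In practice `cost` would train and evaluate an ELM on the columns selected by the mask, which dominates the running time, so caching costs per mask is worth considering.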

3.5 The Extreme Learning Machine (ELM)

Given a robust feature vector after GA and appropriate labeling {(Φi, li) | i = 1, 2, 3, ..., N} (where Φi denotes the selected features and li the target labels), the input features (weights) and target outputs that minimize the error [43] are defined as:

where Hl denotes the hidden-layer output matrix, and the error is measured as the Frobenius norm of its mismatch with the target output; L denotes the number of hidden nodes; β the output weight matrix; O(.) the activation function; and wj, bj the weight and bias of the jth node. The output weight matrix β is solved as follows:
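In the standard regularized ELM formulation, with T standing for the target output matrix (a symbol assumed here), the objective and its closed-form minimizer are:

```latex
\min_{\beta}\; \tfrac{1}{2}\lVert \beta \rVert_F^2 + \tfrac{C}{2}\lVert H_l\,\beta - T \rVert_F^2,
\qquad
\beta = \left( \frac{I}{C} + H_l^{\top} H_l \right)^{-1} H_l^{\top}\, T
```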

where C is the tradeoff between the training error and the norm of the output weight. The error between the network output and a target label l is:

Based on this error, l = 1 and l = −1 are the outputs; l = 1 indicates COVID-19 pneumonia and l = −1 a healthy lung. The model is trained using 60% of the data, and the test images are then passed to the scheme to select the best features. The chosen features are compared to those of the trained ELM classifier, and the predictive outputs are both labels and numerical values. A few labeled images are shown in Fig. 3; these are predictions made during ELM testing.
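A minimal ELM classifier along these lines can be sketched with NumPy. The sigmoid activation, the Gaussian random input weights, the hidden-layer size, and the class name are assumptions for illustration; only the ridge-regularized closed-form solve for β and the ±1 labeling follow the description above.

```python
import numpy as np


class ELM:
    """Single-hidden-layer ELM: random input weights, closed-form output weights."""

    def __init__(self, n_hidden: int = 50, c: float = 1.0, seed: int = 0):
        self.n_hidden = n_hidden
        self.c = c
        self.rng = np.random.default_rng(seed)

    def _hidden(self, x: np.ndarray) -> np.ndarray:
        # Sigmoid activation applied to w_j . x + b_j (activation choice is assumed).
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))

    def fit(self, x: np.ndarray, labels: np.ndarray) -> "ELM":
        # Input weights and biases are drawn once and never trained.
        self.w = self.rng.normal(size=(x.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        h = self._hidden(x)
        # beta = (I / C + H^T H)^{-1} H^T T, solved without forming the inverse.
        a = np.eye(self.n_hidden) / self.c + h.T @ h
        self.beta = np.linalg.solve(a, h.T @ labels)
        return self

    def predict(self, x: np.ndarray) -> np.ndarray:
        # The sign of the network output gives the label: +1 COVID-19, -1 healthy.
        return np.where(self._hidden(x) @ self.beta >= 0, 1, -1)
```

Because only β is learned, training reduces to one linear solve, which is what makes the ELM cheap enough to be called repeatedly as a GA fitness function.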

Figure 3: Proposed prediction results of ELM during the testing step

4 Experimental Setup and Results

We used publicly available chest CT images (Section 3.1). We extracted DALBP, SFTA, and DWT-plus-Entropy features, fused them using the CCA approach, employed a GA to select robust features, and delivered these to the ELM classifier for final predictions. We compared ELM performance to those of Logistic Regression, Q-SVM, Fine Gaussian, Fine Tree, and Cubic KNN in terms of Accuracy, Precision, Recall, Specificity, and the F1 score. We used an Intel Core i7 8th generation CPU equipped with 16 GB of RAM and an 8 GB GPU, running MATLAB 2019b software. We took a 60–40 approach with 10-fold cross-validation during both training and testing.
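The five reported measures all follow from the entries of a binary confusion matrix. The helper below is a generic sketch with a hypothetical function name; it assumes COVID-19 is the positive class and that no denominator is zero.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, specificity, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }
```

For example, `binary_metrics(tp=90, fp=5, tn=95, fn=10)` uses made-up counts, not values from this study, and gives an accuracy of 0.925 with a specificity of 0.95.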

4.1 Results and Discussion

The predictions are presented in numerical form and as bar graphs. We explored the accuracies afforded by DALBP, SFTA, and DWT-plus-Entropy features; CCA fusion accuracy; and GA accuracy. For the DALBP features (Tab. 1), the highest accuracy was 84.52% using the Quadratic SVM classifier; the figure for the ELM classifier was 82.30%. The linear and Naïve Bayes accuracies were 82.42% and 81.49%. The accuracies of the remaining classifiers were 80.20%, 81.21%, 80.14%, 81.63%, 79.38%, 81.52%, 78.12%, 79.68%, 80.72%, and 76.71%, respectively. For the SFTA textural features (Tab. 1), the accuracies were lower than the DALBP figures. The highest accuracy was 78.95%, and the accuracies afforded by the other listed classifiers were 75.27%, 73.78%, 71.62%, 71.96%, 77.52%, 75.06%, 72.94%, 77.52%, 74.22%, 74.36%, 79.42%, 73.21%, and 76.13%, respectively. For the DWT-plus-Entropy features, the maximum accuracy was 82.60% (Naïve Bayes), higher than any SFTA figure. Overall, the DALBP features performed best among the individual feature types.

Table 1: Prediction accuracy of COVID-19 pneumonia and normal cases using separate features without fusion and selection

The CCA-based fusion approach was then employed for prediction. The ELM classifier was best: 92.8% accuracy, 93.81% precision, 94% specificity, and an F1 score of 0.93. The worst classifier was the EBT (86.1% accuracy). All results are shown in Tab. 2. The accuracies of the other classifiers were 92.7%, 92.6%, 92.2%, 92.2%, 91.5%, 90.8%, 90.6%, 90.3%, 90.2%, 90.2%, 89.8%, 89.7%, and 86.1%. Thus, CCA-based fusion improved accuracy by about 10%. The fusion accuracies of ELM and LSVM are shown in Fig. 4 (a confusion matrix).

Table 2: Proposed prediction results of multi-type feature fusion

Figure 4: Confusion matrix of LSVM and ELM after multi-type feature fusion

We used an improved GA to select the best features for final prediction (Section 3.4 and Algorithm 1). The predictive performances of several classifiers are shown in Tab. 3. All improved after implementation of feature selection. The ELM classifier was robust (accuracy 93.9%, precision 93.14%, specificity 95%, recall 94%, and F1 score 0.94). The next best classifier was the linear SVM (LSVM) (93.4% accurate). Fig. 5 shows the ELM/linear SVM confusion matrix; feature fusion and selection improved performance. Fig. 6 shows the receiver operating characteristic (ROC) curves for prediction of healthy and COVID-19 pneumonia CT images; the AUCs were verified. In summary, selection of the best features affords excellent prediction of COVID-19-positive and normal chest CT scans.

Table 3: Proposed prediction results after best feature selection using improved genetic algorithm and ELM

Figure 5: Confusion matrix of LSVM and ELM after best feature selection using improved genetic algorithm

Figure 6: Representation of ELM performance based on ROC plots

5 Analysis and Conclusion

Tab. 1 shows the predictive accuracies of various features prior to fusion and selection. The highest accuracies were 84.52%, 78.95%, and 82.60% for the DALBP, SFTA, and DWT-plus-Entropy features, respectively. After CCA-mediated fusion, the figures rose by about 10%. The highest accuracy after fusion was 92.5% (Tab. 2). Accuracy was further improved by selection (Tab. 3 and Figs. 5 and 6). We analyzed the utility of selection by calculating standard errors of the mean (SEMs) (Tab. 4) for the ELM and the other three top classifiers. The minimum accuracy of ELM after 100 iterations was 92.76% and the highest accuracy 93.90%; the SEM was 0.4040. Thus, accuracy varied only slightly across runs. The error bars (Fig. 7) are based on the SEMs and confidence levels; the scheme is scalable.
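The SEM and the corresponding error bars can be computed from a list of per-run accuracies as sketched below. The sample standard deviation and the normal-approximation multiplier z = 1.96 (a 95% level) are assumptions; the text does not state which variant or confidence level was used, and the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def sem(values: Sequence[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def confidence_interval(values: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Mean +/- z * SEM, the range drawn as an error bar."""
    mean = statistics.mean(values)
    half_width = z * sem(values)
    return mean - half_width, mean + half_width
```

Passing the accuracies recorded over repeated runs to `confidence_interval` gives the lower and upper ends of the error bar for one classifier.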

Table 4: Analysis of prediction accuracy based on the proposed feature selection for ELM and three other top classifiers

It is clear that fusion of multi-type features is valuable. This increases the number of predictors and enhances predictive accuracy. However, a few irrelevant features were added; if these are removed, accuracy is not compromised. Removal was effected via feature selection. We used an improved GA and an ELM to select the best features and improve predictive accuracy. In future work, we will seek a more efficient feature selection algorithm to improve accuracy further. Moreover, we will seek to build a larger image dataset that we will use to train a CNN.

Figure 7: Confidence interval of ELM after best feature selection using improved GA

Funding Statement: This research was supported by Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE) (P0012724, The Competency Development Program for Industry Specialist) and the Soonchunhyang University Research Fund.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.