A Secure Intrusion Detection System in Cyberphysical Systems Using a Parameter-Tuned Deep-Stacked Autoencoder

2021-12-14 06:06 Nojood Aljehane
Computers, Materials & Continua, 2021, Issue 9

Nojood O. Aljehane

College of Computer Science and Information Technology, University of Tabuk, Tabuk, Saudi Arabia

Abstract: Cyber physical systems (CPSs) are networked systems of cyber (computation, communication) and physical (sensors, actuators) elements that interact in a feedback loop, with possible human intervention. Generally, CPSs support critical infrastructures and are considered important in daily life because they form the basis of future smart devices. The increased utilization of CPSs, however, poses many threats, which may be of major significance for users. Such security issues in CPSs are a global concern; therefore, developing a robust, secure, and effective CPS is currently a hot research topic. To address this issue, an intrusion detection system (IDS) can be designed to protect CPSs. When the IDS detects an anomaly, it instantly takes the necessary actions to avoid harm to the system. In this study, we introduce a new parameter-tuned deep-stacked autoencoder based on deep learning (DL), called PT-DSAE, for the IDS in CPSs. The proposed model involves preprocessing, feature extraction, parameter tuning, and classification. First, the data are preprocessed to eliminate noise. Next, a DL-based DSAE model is applied to detect anomalies in the CPS. In addition, a search-and-rescue optimization algorithm is used to tune the hyperparameters of the DSAE, such as the number of hidden layers, batch size, epoch count, and learning rate. To assess the PT-DSAE model, a series of experiments were performed using data from a sensor-based CPS. Moreover, a detailed comparative analysis was performed to confirm the detection performance of the PT-DSAE technique. The experimental results verified its superior performance on the applied data over the compared methods.

Keywords: Cyberphysical system; intrusion detection system; autoencoder; cybersecurity

1 Introduction

In general, sensors are embedded in cyberphysical systems (CPSs) to monitor anomalies and manage intrusions and hazards. To predict and prevent abnormalities, anomaly detection systems (ADSs) have been applied. However, ADSs suffer from false positives (FPs, false alarms) and false negatives (FNs, missed detections), which limit their performance in CPS domains. In particular, FPs tend to recover unwanted information, whereas FNs tend to recover essential data only. These prediction errors produce imbalanced values, which are forwarded to the controller and lead to nonoptimal, destabilizing control actions that compromise system performance. For instance, prediction errors can result in catastrophic events, such as reactor dispersion in process control systems, pollution in water distribution systems, and traffic control failures in smart transportation networks [1].

CPS networks comprise sensors, actuators, and networking modules and are used in the fields of power, automation, development, civil structure, and medicine, among others. Generally, a CPS is a complex system in which external operations and cyber applications are supported in a combined fashion. Although information and communications technology (ICT) is highly advanced in CPSs, cybersecurity is still considered a vital issue in several sectors. One of the most serious vulnerabilities of CPSs is the risk of intrusion. In the past few decades, close attention has been paid to the enhancement of CPS security. Intrusion detection (ID) is one of the important applications for maximizing the integrity of CPSs, and intrusion detection systems (IDSs) are usually applied to prevent attacks. In 1980, Anderson presented the basic notion of ID, which was later followed by a large number of studies on IDSs.

In general, IDS approaches are categorized into two major classes: misuse and anomaly prediction. In misuse prediction, features of well-known attacks are used: the audited data are compared against a database of such features, and matches are reported as intrusions. Although misuse detectors generate the fewest FPs, they have serious limitations. For example, developing and maintaining a comprehensive database is a tedious operation, and only well-known attacks can be detected. Many models have been developed for misuse prediction. For example, Abbes et al. established a new protocol analysis to enhance the performance of pattern matching. In [2], the authors examined ID based on pattern matching. A rule-based expert method has been used for misuse prediction. Moreover, a genetic algorithm (GA) has been employed for misuse detection. Recently, data mining (DM) schemes have been used to develop misuse prediction approaches. An extensive review of ID by GAs and DM is available in [3]. However, only a few efforts have been made to classify and predict system intrusions using colored Petri nets. Anomaly detectors model the normal behavior of a network, and an intrusion is defined as a considerable deviation from normal system operation. One of the major benefits of these detectors is their ability to identify previously unknown attacks. Unlike misuse detectors, however, anomaly detectors yield more FPs, so their accuracy is lower.

Some prediction approaches depend on clustering models. Recently, several learning-based methods have been extensively applied in anomaly prediction. Commonly used anomaly detection (AD) techniques include neural networks (NNs), GAs, and wavelets. Previous works on IDSs have considered either misuse detection or anomaly prediction, each of which has major advantages and disadvantages. Earlier IDSs were applied to identify only misuse or only anomaly attacks, whereas IDSs that perform misuse and anomaly detection concurrently have been developed to address these limitations.

In this study, we introduce a new parameter-tuned deep-stacked autoencoder based on deep learning (DL), called PT-DSAE, for the IDS in CPSs. The proposed model comprises preprocessing to eliminate the noise present in the data. Next, a DL-based DSAE model is applied to detect anomalies in the CPS. In addition, hyperparameter tuning of the DSAE is performed by a search-and-rescue (SAR) optimization algorithm to tune the number of hidden layers, batch size, epoch count, and learning rate. To evaluate the PT-DSAE model, a series of experiments were performed on data from a sensor-based CPS.

2 Literature Review

Different types of detectors have been introduced with machine learning (ML) and NNs. Goh et al. [4] established an unsupervised method for anomaly prediction in CPSs based on recurrent neural networks (RNNs) and a cumulative sum approach. Kosek [5] proposed a contextual AD technique for smart grids based on NNs. Krishnamurthy et al. [6] used Bayesian networks, which provide a means for learning causal correlations and temporal relations in cyber and external parameters from unlabeled data. Such models are employed to predict abnormalities and isolate root causes. Jones et al. [7] developed an approach based on formal methods for AD in CPSs. This approach relies on model-free, unsupervised learning, which constructs signal temporal logic (STL) formulas from the outputs collected during normal operation. Anomalies are then predicted by flagging signals that do not satisfy the learned formulas. Kong et al. [8] described a scheme based on formal methods for supervised anomaly learning.

Chibani et al. [9] investigated the problems faced while designing fault detection filters for fuzzy systems, considering errors and failures in discrete-time polynomial fuzzy systems. AD is also employed to detect security intrusions in CPSs. Urbina et al. [10] studied the physics-based detection of stealthy intrusions in industrial control systems. Conventional works define detection principles that do not restrict the influence of stealthy attacks; the authors therefore used a new measure to quantify the impact of such attacks and demonstrated that this impact can be limited with a better configuration. Unlike former schemes, Kleinmann et al. [11] considered the detection of attacks on industrial control networks on the basis of cyber anomalies. Various approaches have also been considered for detecting sensor errors in traffic networks.

Lu et al. [12] reviewed earlier work on AD for traffic sensors and, according to the level of data used, categorized detection methods into three levels: macroscopic, mesoscopic, and microscopic. In general, several data correction approaches have provided practical guidelines for AD in traffic networks. Zygouras et al. [13] developed three methods based on Pearson's correlation, cross-correlation, and multivariate ARIMA to examine faulty traffic values. They also employed crowdsourcing to resolve uncertain values from faulty sensors. Finally, Robinson [14] applied a method based on the correlation between flows at nearby sensors to detect faulty loop detectors.

3 The Proposed Parameter-Tuned Deep-Stacked Autoencoder Model

Fig. 1 shows the process involved in the proposed PT-DSAE model. As depicted, the input data are first preprocessed to remove noise. Then, DSAE-based classification is performed, in which the parameters are optimized using an SAR optimization algorithm.

3.1 Stacked Autoencoder

The stacked autoencoder (SAE) applied in this study was built from several autoencoder (AE) layers and a logistic regression (LR) layer, as depicted in Fig. 2. The AE is the fundamental unit of the SAE classification method. It is composed of an encoding step (Layers 1 to 2) and a decoding or reconstruction step (Layers 2 to 3). This process is given in Eqs. (1) and (2), where $W$ and $W^{T}$ (the transpose of $W$) are the weight matrices of the model; $b$ and $b'$ are two different bias vectors of the model; $s$ is a nonlinearity function, such as the sigmoid function applied here; $y$ denotes the latent representation of the input layer $x$; and $z$ is viewed as a prediction of $x$ given $y$, which has the same shape as $x$:

$$y = s(Wx + b) \tag{1}$$

$$z = s(W^{T}y + b') \tag{2}$$
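The encoding and decoding steps described above can be sketched in a few lines. This is a minimal illustration, assuming tied weights (the decoder reuses the transpose of W) and a sigmoid nonlinearity; the function names and the toy values of `W`, `b`, and `b_prime` are hypothetical and are not taken from the paper.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def encode(x, W, b):
    # y = s(W x + b); W holds one row per hidden unit
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]

def decode(y, W, b_prime):
    # z = s(W^T y + b'); tied weights: the decoder reuses W transposed
    n_visible = len(W[0])
    return [sigmoid(sum(W[i][j] * y[i] for i in range(len(W))) + b_prime[j])
            for j in range(n_visible)]

# Toy values (hypothetical): 3 inputs, 2 hidden units
x = [1.0, 0.0, 1.0]
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b = [0.0, 0.1]
b_prime = [0.0, 0.0, 0.0]

y = encode(x, W, b)         # latent representation, length 2
z = decode(y, W, b_prime)   # reconstruction, same shape as x
```

The reconstruction `z` has the same length as `x`, which is what allows an element-wise reconstruction error to be computed between them.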

Several AE layers are stacked in the unsupervised pretraining phase (Layers 1 to 4). The latent representation $y$ produced by one AE is applied as the input to the next AE layer. Each layer is then trained as an AE by minimizing its reconstruction error [15]. The reconstruction error (loss function $L(x,z)$) is evaluated over many iterations. Here, cross-entropy is applied to measure the reconstruction error, as given in formula (3), where $x_k$ and $z_k$ represent the $k$th elements of $x$ and $z$, respectively:

$$L(x,z) = -\sum_{k}\left[x_k \log z_k + (1 - x_k)\log(1 - z_k)\right] \tag{3}$$
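As a small worked example of the cross-entropy reconstruction error, the sketch below evaluates the loss for a good and a poor reconstruction of the same input. The clamping constant `eps` is an added assumption to avoid taking the logarithm of zero; it is not part of the original formulation.

```python
import math

def reconstruction_loss(x, z, eps=1e-12):
    # L(x, z) = -sum_k [x_k log z_k + (1 - x_k) log(1 - z_k)]
    total = 0.0
    for x_k, z_k in zip(x, z):
        z_k = min(max(z_k, eps), 1.0 - eps)  # clamp (assumption) to avoid log(0)
        total -= x_k * math.log(z_k) + (1.0 - x_k) * math.log(1.0 - z_k)
    return total

# A reconstruction close to the input gives a smaller loss than a distant one
good = reconstruction_loss([1.0, 0.0], [0.9, 0.1])
bad = reconstruction_loss([1.0, 0.0], [0.4, 0.6])
```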

The reconstruction error is minimized by applying a gradient descent (GD) model. Hence, the weights and biases in Eqs. (1) and (2) are updated on the basis of Eqs. (4)-(6), in which each update is scaled by a learning rate:
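For orientation, a standard gradient descent update for the three parameter groups of a tied-weight AE can be written as follows. This is a sketch under the assumption that Eqs. (4)-(6) take the usual form; the symbol $\eta$ for the learning rate is chosen here for illustration and may differ from the symbol used in the original equations.

```latex
\begin{aligned}
W  &\leftarrow W  - \eta \,\frac{\partial L(x,z)}{\partial W},\\
b  &\leftarrow b  - \eta \,\frac{\partial L(x,z)}{\partial b},\\
b' &\leftarrow b' - \eta \,\frac{\partial L(x,z)}{\partial b'}.
\end{aligned}
```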

Figure 1: Steps followed in the proposed model

Once the layers are pretrained, the network enters a supervised fine-tuning stage, in which an LR layer is added as the output layer on top of the layers obtained in the unsupervised pretraining phase. In this work, the probability that the input vector $x$ (Layer 4) belongs to class $i$ is given in formula (7), where $Y$ is the predicted class of the input vector $x$; $W$ and $b$ represent a weight matrix and bias vector; $W_i$ and $W_j$ represent the $i$th and $j$th rows of matrix $W$, respectively; $b_i$ and $b_j$ are the $i$th and $j$th elements of vector $b$, respectively; and softmax is the nonlinearity function applied in this work. The class with the maximum probability is considered the predicted label ($y_{pred}$) of the input vector $x$, as given in formula (8). The prediction error on a sample data set $D$ ($Loss(D)$) is estimated on the basis of the true labels, as given in formula (9), where $y_j$ denotes the true label of $x_j$. $Loss(D)$ is minimized by applying a GD scheme, in the same way as the reconstruction error:

$$P(Y = i \mid x, W, b) = \mathrm{softmax}_i(Wx + b) = \frac{e^{W_i x + b_i}}{\sum_j e^{W_j x + b_j}} \tag{7}$$

$$y_{pred} = \arg\max_i P(Y = i \mid x, W, b) \tag{8}$$

$$Loss(D) = -\sum_{j=1}^{|D|} \log P(Y = y_j \mid x_j, W, b) \tag{9}$$
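The LR output layer described above (class probabilities by softmax, prediction by maximum probability, and a negative log-likelihood loss over the data set) can be sketched as follows. The weights, samples, and labels are hypothetical toy values, and subtracting the maximum score inside `softmax` is a common numerical safeguard added here, not something stated in the paper.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability (assumption)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_probabilities(x, W, b):
    # P(Y = i | x, W, b) = softmax_i(W x + b)
    scores = [sum(w * v for w, v in zip(row, x)) + b_i for row, b_i in zip(W, b)]
    return softmax(scores)

def predict(x, W, b):
    # the class with the maximum probability is the predicted label
    probs = class_probabilities(x, W, b)
    return max(range(len(probs)), key=probs.__getitem__)

def negative_log_likelihood(samples, labels, W, b):
    # Loss(D) = -sum_j log P(Y = y_j | x_j, W, b)
    return -sum(math.log(class_probabilities(x, W, b)[y])
                for x, y in zip(samples, labels))

# Toy values (hypothetical): 2 features, 3 classes
W = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
samples = [[1.0, 0.0], [0.0, 1.0]]
labels = [0, 1]

probs = class_probabilities(samples[0], W, b)
loss = negative_log_likelihood(samples, labels, W, b)
```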

Figure 2: Stacked autoencoder (SAE) structure

3.2 Parameter Optimization of a Deep-Stacked Autoencoder

In SAE networks, the pretraining phase is essential for obtaining good weights with the help of an optimization model, and these weights are applied as the initial values for deep AE systems. The optimized parameters are then used to achieve the best detection accuracy. One of the effective models applied in this approach is backpropagation (BP), which depends on GD. However, this model has some deficiencies on large data sets, such as a low convergence speed and a tendency to fall into a local extremum. Here, the L-BFGS method is applied to determine the initial parameters. It is one of the most important limited-memory quasi-Newton methods, can be applied to large-scale optimization problems, and typically converges quickly. The procedure of L-BFGS is defined in Algorithm 1. The main objective is to identify the optimal parameters $\theta$ by minimizing a function $f(\theta)$, where $f$ is a nonlinear, continuously differentiable objective function. An objective function is illustrated in Eq. (4). Here, $H_k$ represents an inverse Hessian approximation, which is updated at each iteration to obtain $H_{k+1}$. In earlier quasi-Newton methods, $H_k$ is dense, so storing and manipulating it becomes prohibitive in memory and computation as the number of parameters grows. The L-BFGS approach does not require the storage of a full $n \times n$ inverse Hessian matrix; instead, it represents $H_k$ implicitly through the pairs $\{s_k, y_k\}$. The method keeps the $r$ most recent correction pairs and uses them to update the approximation at each iteration. The cost of every iteration is therefore low, and the L-BFGS approach exhibits a high execution speed and strong robustness.

Algorithm 1: Application of the L-BFGS model to minimize a strictly convex function
1: Initialize $\theta_0$ with $\theta^{0}_{ij} \sim [-6/(N+M),\ 6/(N+M)]$, $(\theta \in R^{N \times M})$
2: for $k = 1, 2, \ldots$ until convergence do
3:   Estimate $g_k = \nabla f(\theta_k)$
4:   $s_{k-1} = \theta_k - \theta_{k-1}$, $y_{k-1} = g_k - g_{k-1}$
5:   $H^{0}_{k} = \dfrac{s^{T}_{k-1} y_{k-1}}{y^{T}_{k-1} y_{k-1}} I$
6:   for $i = k-r, \ldots, k-1$ do
7:     $s_i = \theta_{i+1} - \theta_i$, $y_i = g_{i+1} - g_i$, $\rho_i = \dfrac{1}{y^{T}_{i} s_i}$, $V_i = I - \rho_i y_i s^{T}_{i}$
8:   end for
9:   $H_k = \left(V^{T}_{k-1} \cdots V^{T}_{k-r}\right) H^{0}_{k} \left(V_{k-r} \cdots V_{k-1}\right) + \rho_{k-r}\left(V^{T}_{k-1} \cdots V^{T}_{k-r+1}\right) s_{k-r} s^{T}_{k-r} \left(V_{k-r+1} \cdots V_{k-1}\right) + \rho_{k-r+1}\left(V^{T}_{k-1} \cdots V^{T}_{k-r+2}\right) s_{k-r+1} s^{T}_{k-r+1} \left(V_{k-r+2} \cdots V_{k-1}\right) + \cdots + \rho_{k-1} s_{k-1} s^{T}_{k-1}$
10:  Choose the step size $\alpha_k$; $\theta_{k+1} = \theta_k - \alpha_k H_k g_k$; $k \leftarrow k+1$
11: end for
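In practice, the product of the inverse Hessian approximation with the gradient in Algorithm 1 is usually computed without ever forming the matrix, using the well-known two-loop recursion over the stored correction pairs. The sketch below illustrates this on a small convex quadratic. It is an illustrative assumption, not the implementation used in the paper: the backtracking (Armijo) line search, the curvature check, and the test function `f` are choices made here for the example.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def lbfgs_direction(grad, pairs):
    # Two-loop recursion: returns H_k * grad without forming H_k.
    q = list(grad)
    alphas = []
    for s, y in reversed(pairs):                 # newest pair first
        rho = 1.0 / dot(y, s)
        a = rho * dot(s, q)
        alphas.append((a, rho))
        q = [qi - a * yi for qi, yi in zip(q, y)]
    if pairs:                                    # H_k^0 = (s^T y / y^T y) I
        s, y = pairs[-1]
        gamma = dot(s, y) / dot(y, y)
        q = [gamma * qi for qi in q]
    for (s, y), (a, rho) in zip(pairs, reversed(alphas)):  # oldest pair first
        beta = rho * dot(y, q)
        q = [qi + (a - beta) * si for qi, si in zip(q, s)]
    return q

def lbfgs(f, grad_f, theta, r=5, iters=100, tol=1e-10):
    pairs = []                                   # the r most recent (s, y) pairs
    g = grad_f(theta)
    for _ in range(iters):
        if dot(g, g) < tol:
            break
        d = lbfgs_direction(g, pairs)
        alpha = 1.0
        # backtracking line search along the descent direction -d
        while f([t - alpha * di for t, di in zip(theta, d)]) > \
                f(theta) - 1e-4 * alpha * dot(g, d):
            alpha *= 0.5
        new_theta = [t - alpha * di for t, di in zip(theta, d)]
        new_g = grad_f(new_theta)
        s = [a - b for a, b in zip(new_theta, theta)]
        y = [a - b for a, b in zip(new_g, g)]
        if dot(y, s) > 1e-12:                    # keep curvature-positive pairs only
            pairs = (pairs + [(s, y)])[-r:]
        theta, g = new_theta, new_g
    return theta

# Hypothetical test problem: a convex quadratic with minimum at (1, -2)
f = lambda t: (t[0] - 1.0) ** 2 + 10.0 * (t[1] + 2.0) ** 2
grad_f = lambda t: [2.0 * (t[0] - 1.0), 20.0 * (t[1] + 2.0)]
theta_star = lbfgs(f, grad_f, [0.0, 0.0])
```

Only the `r` most recent pairs are kept, so the memory cost grows linearly with the number of parameters rather than quadratically.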

3.3 A Deep-Stacked Autoencoder Model Based on Search and Rescue

To enhance the training process of the L-BFGS model, an SAR optimization algorithm is employed. In SAR, the positions of the humans correspond to candidate solutions of the optimization problem, and the amount of clues found at these positions corresponds to the objective function value of these solutions [16]. Fig. 3 shows the flowchart of SAROA.

Figure 3: Process of SAROA

Group members collect clue information during the search, and some clues are abandoned once better clues are found at other positions; however, the information on these clues is still used to improve the search. In this approach, the positions of the abandoned clues are recorded in a memory matrix (matrix $M$), whereas the positions of the intrusions are stored in a position matrix (matrix $X$). The dimensions of matrix $M$ are the same as those of $X$: both are $N \times D$ matrices, where $D$ represents the dimension of the problem and $N$ refers to the number of intrusions. The clues matrix (matrix $C$) contains the positions of the clues and is composed of the two matrices $X$ and $M$; Eq. (10) gives the construction of $C$. New solutions in the social and individual phases are generated on the basis of the clues matrix, which is a central component of SAR. The matrices $M$ and $C$ are updated in each human search phase:

where $M$ and $X$ refer to the memory and intrusion matrices of a CPS, respectively, and $X_{N1}$ denotes the position in the first dimension of the $N$th intrusion. Additionally, $M_{1D}$ represents the position in the $D$th dimension of the first memory entry. The search has two phases, a social phase and an individual phase, as described in the following.

In the social phase, a random clue is selected, and the search direction is obtained using the following expression:

where $X_i$, $C_k$, and $SD_i$ denote the position of the $i$th intrusion, the position of the $k$th clue, and the search direction of the $i$th intrusion, respectively, and $k$ denotes a random integer between 1 and $2N$ (selected such that $k \neq i$).
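The construction of the clues matrix and the choice of a search direction can be sketched as follows. The sketch assumes that C is formed by stacking X on top of M (giving 2N rows) and that the search direction is the difference X_i - C_k, as in the commonly published form of the SAR algorithm; both the sign convention and the helper names are assumptions, and indices are 0-based here.

```python
import random

def clue_matrix(X, M):
    # C stacks the N current positions on top of the N memorised positions (2N x D)
    return [list(r) for r in X] + [list(r) for r in M]

def pick_clue_index(i, n_clues, rng):
    # random clue index k with k != i (0-based)
    k = rng.randrange(n_clues - 1)
    return k if k < i else k + 1

def search_direction(X, C, i, k):
    # SD_i = X_i - C_k (sign convention assumed)
    return [x - c for x, c in zip(X[i], C[k])]

rng = random.Random(0)
X = [[0.0, 1.0], [2.0, 3.0]]   # hypothetical positions, N = 2, D = 2
M = [[4.0, 5.0], [6.0, 7.0]]   # hypothetical memory matrix
C = clue_matrix(X, M)
k = pick_clue_index(0, len(C), rng)
sd = search_direction(X, C, 0, k)
```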

Importantly, the search should be organized so that the group members do not all examine the same locations. For this reason, not all dimensions of $X_i$ are changed by the move along the direction in Eq. (11); this condition is enforced using a binomial crossover operator. Moreover, if the selected clue is better than the clue at the current position, the area around the position of that clue is searched along the direction $SD_i$ (Area 1); otherwise, the search continues around the current position along the direction $SD_i$ (Area 2). Finally, the following function is applied in the social phase:

where $X'_{i,j}$ denotes the new position in the $j$th dimension of the $i$th intrusion; $C_{k,j}$ represents the position in the $j$th dimension of the $k$th clue; $f(C_k)$ and $f(X_i)$ are the objective function values for the solutions $C_k$ and $X_i$, respectively; $r_1$ denotes a random value with a uniform distribution in $[-1,1]$; $r_2$ is a uniformly distributed random number in $[0,1]$ that differs for each dimension, whereas $r_1$ is fixed for all dimensions; $j_{rand}$ represents a random integer between 1 and $D$, which ensures that at least one dimension of $X'_{i,j}$ differs from $X_{i,j}$; and SE represents an algorithm parameter between 0 and 1. Eq. (12) is applied to obtain the new position of the $i$th intrusion in all dimensions.
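The social-phase update described above can be sketched as below. The structure follows the commonly published form of the SAR social phase (a binomial crossover controlled by SE and j_rand, with the move taken around the clue or around the current position), which is assumed here because the original equation is not reproduced in the text; a larger objective value is treated as better, and the function name and arguments are hypothetical.

```python
import random

def social_phase(X_i, C_k, f_Xi, f_Ck, SE, rng):
    D = len(X_i)
    r1 = rng.uniform(-1.0, 1.0)   # one draw shared by all dimensions
    j_rand = rng.randrange(D)     # this dimension is always moved
    new = list(X_i)
    for j in range(D):
        r2 = rng.random()         # fresh draw for every dimension
        if r2 < SE or j == j_rand:
            step = r1 * (X_i[j] - C_k[j])            # r1 * SD_i[j]
            # Area 1: search around the better clue; Area 2: around X_i
            base = C_k[j] if f_Ck > f_Xi else X_i[j]
            new[j] = base + step
    return new

rng = random.Random(42)
X_i = [1.0, 2.0, 3.0]             # hypothetical current position
C_k = [0.0, 0.0, 0.0]             # hypothetical clue position
only_one = social_phase(X_i, C_k, f_Xi=5.0, f_Ck=1.0, SE=0.0, rng=rng)
all_dims = social_phase(X_i, C_k, f_Xi=5.0, f_Ck=1.0, SE=1.0, rng=rng)
```

With SE = 0 only the dimension chosen by `j_rand` moves, and with SE = 1 every dimension moves, which shows how SE trades off local refinement against broader changes.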

In the individual phase, each intrusion searches around its present position, using the connections between clues, as in the social phase, to guide the search. Unlike in the social phase, all dimensions of $X_i$ are modified in the individual phase. Hence, the new position of the $i$th intrusion is obtained by the following expression:

where $k$ and $m$ represent random integers ranging from 1 to $2N$. To avoid movement along other clues, $k$ and $m$ are selected such that $i \neq k \neq m$. $r_3$ is a random value with a uniform distribution between 0 and 1.
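A minimal sketch of the individual phase is given below. It assumes the commonly published form in which the current position is shifted along the line connecting two randomly chosen clues, X_i + r3 (C_k - C_m); the function name and toy data are hypothetical, and indices are 0-based.

```python
import random

def individual_phase(X_i, C, i, rng):
    # choose two distinct clue indices k and m, both different from i
    candidates = [idx for idx in range(len(C)) if idx != i]
    k, m = rng.sample(candidates, 2)
    r3 = rng.random()             # uniform in [0, 1)
    # every dimension of X_i is moved (assumed form of the update)
    return [x + r3 * (ck - cm) for x, ck, cm in zip(X_i, C[k], C[m])]

rng = random.Random(1)
C_same = [[1.0, 1.0]] * 4         # identical clues: no displacement expected
unchanged = individual_phase([0.5, 0.5], C_same, 0, rng)
C_toy = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
moved = individual_phase([0.5, 0.5], C_toy, 0, rng)
```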

In metaheuristic approaches, solutions must lie within the solution space. When a solution exceeds the considered solution space, it needs to be changed. Thus, when a new position falls outside the solution space, the following equation is applied to modify it:

where the two bounds are the upper and lower thresholds for the $j$th dimension, respectively.
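One way to implement this boundary handling is to pull an out-of-range coordinate back to the midpoint between the previous coordinate and the violated bound, which is the rule used in the commonly published form of SAR and is assumed here. The names `lower` and `upper` are hypothetical stand-ins for the per-dimension thresholds.

```python
def repair_position(new, old, lower, upper):
    # midpoint between the previous coordinate and the violated bound
    fixed = []
    for x_new, x_old, lo, hi in zip(new, old, lower, upper):
        if x_new > hi:
            fixed.append((x_old + hi) / 2.0)
        elif x_new < lo:
            fixed.append((x_old + lo) / 2.0)
        else:
            fixed.append(x_new)   # already inside the solution space
    return fixed

# Hypothetical example with bounds [0, 1] in every dimension
repaired = repair_position([1.5, -0.2, 0.4], [0.9, 0.1, 0.3],
                           [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```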

In each iteration, the group members search according to these two phases. After each phase, if the value of the objective function at the new position $X'_i$ ($f(X'_i)$) is higher than the existing one ($f(X_i)$), the previous position ($X_i$) is saved at a random position in the memory matrix ($M$) with the help of Eq. (15), and the new position is accepted with the help of Eq. (16); otherwise, the new position is discarded and the memory remains the same:

where $M_n$ denotes the position of the $n$th clue saved in the memory matrix and $n$ is a random integer from 1 to $N$. These memory updates enhance the diversity of the model and its capability to identify a global optimum.
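The acceptance rule and memory update can be sketched as follows, treating a larger objective value as better, as in the text. The function name, the in-place update of `X` and `M`, and the toy objective are assumptions made for illustration.

```python
import random

def accept_move(i, new_pos, X, M, f, rng):
    # accept the new position only if it improves the objective
    if f(new_pos) > f(X[i]):
        n = rng.randrange(len(M))   # random memory slot
        M[n] = list(X[i])           # remember the position being left
        X[i] = list(new_pos)        # move to the new position
        return True
    return False                    # new position discarded, memory unchanged

f = lambda p: -sum(v * v for v in p)   # hypothetical objective (larger is better)
X = [[1.0, 1.0]]
M = [[9.0, 9.0]]
rng = random.Random(0)
accepted = accept_move(0, [0.5, 0.5], X, M, f, rng)
rejected = accept_move(0, [2.0, 2.0], X, M, f, rng)
```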

In an SAR operation, time is a significant factor, because lost people may be wounded, and any delay by the SAR teams reduces the chance of finding them in time. Hence, the process defined above must cover a large space within a limited time. Initially, the unsuccessful search number (USN) is set to 0 for all humans. Whenever a better clue is found, the USN is reset to 0; otherwise, it is increased by 1, as shown below:

where $USN_i$ is the number of times that human $i$ has failed to find a better clue. If the USN is higher than the maximum unsuccessful search number (MU), then a random position in the search space is selected by Eq. (18), and $USN_i$ is reset to 0:

where $r_4$ refers to a random value with a uniform distribution ranging from 0 to 1, which differs from one dimension to another.
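The unsuccessful-search counter and the random restart can be sketched as below. The counter rule (reset on improvement, otherwise increment) and the restart rule (a uniform random point between the per-dimension bounds, with a separate random draw per dimension) are assumed forms based on the description above; `lower` and `upper` are hypothetical names for the bounds.

```python
import random

def update_usn(usn_i, improved):
    # reset on success, otherwise count one more unsuccessful search
    return 0 if improved else usn_i + 1

def maybe_restart(x_i, usn_i, MU, lower, upper, rng):
    if usn_i > MU:
        # a separate random draw (r4) for each dimension
        x_i = [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        usn_i = 0
    return x_i, usn_i

rng = random.Random(0)
restarted, usn_after = maybe_restart([5.0, 5.0], 4, 3, [0.0, 0.0], [1.0, 1.0], rng)
kept, usn_kept = maybe_restart([5.0, 5.0], 1, 3, [0.0, 0.0], [1.0, 1.0], rng)
```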

Generally, SAR has two control parameters: the social effect (SE) and MU. The SE is applied to manage the influence of group members on one another in the social phase and falls in the range [0,1]. Higher values of SE increase the convergence rate but limit the global search capability of the method. The MU parameter specifies the maximum number of unsuccessful searches before a clue is abandoned. It falls within the range $[0, 2 \times T_{max}]$, where $2 \times T_{max}$ is the maximum number of searches and $T_{max}$ represents the maximum number of iterations. With large values of MU, attacks or intrusions can be identified. With small values of this parameter, group members stop exploring the present clue and move on to an alternate position. Therefore, MU is set in relation to the dimension of the problem: when the search space is enlarged, the admissible number of unsuccessful searches is also increased. Hence, SE is set to 0.05, and the value of MU is obtained from Eq. (19). Analysis of the SAR parameters shows that these predefined values for SE and MU can be applied to identify CPS intrusions:

4 Performance Validations

For the experimental analysis, a series of experiments were performed on the NSL-KDD dataset, which includes samples from five classes (four attack types and normal traffic). This dataset contains 45,927 samples of denial-of-service (DoS) attacks, 995 samples of R2L attacks, 11,656 samples of probe attacks, 52 samples of U2R attacks, and 67,343 samples of normal traffic, as shown in Tab. 1. Fig. 4 presents details related to this dataset.

Table 1: Dataset description

Figure 4: Types of attacks in the NSL-KDD dataset

Both Tab. 2 and Fig. 5 show the performance of the PT-DSAE model in the identification of intrusions in a CPS. The figure shows that the PT-DSAE model detected DoS attacks with a precision of 0.9702, recall of 0.9837, F-measure of 0.9778, and accuracy of 0.9787; R2L attacks with a precision of 0.9806, recall of 0.9861, F-measure of 0.9872, and accuracy of 0.9867; probe attacks with a precision of 0.9895, recall of 0.9922, F-measure of 0.9933, and accuracy of 0.9918; U2R attacks with a precision of 0.9793, recall of 0.9863, F-measure of 0.9851, and accuracy of 0.9842; and normal traffic with a precision of 0.9758, recall of 0.9840, F-measure of 0.9851, and accuracy of 0.9842. On average, intrusions were detected with a precision of 0.9791, recall of 0.9865, F-measure of 0.9860, and accuracy of 0.9849.
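The four measures reported above are typically computed per class from confusion-matrix counts. The sketch below uses the standard one-vs-rest definitions, which are assumed here because the paper does not state its formulas; the counts are hypothetical and are not taken from the experiments.

```python
def binary_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F-measure as the harmonic mean of precision and recall
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure, accuracy

# Hypothetical counts for one class treated as "positive"
p, r, f1, acc = binary_metrics(tp=90, fp=10, fn=5, tn=95)
```

Averaging the per-class values over the five classes would give macro-averaged figures of the kind reported in the last row of such a result table.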

Table 2: Result analysis of the proposed PT-DSAE method

Figure 5: Result analysis of the PT-DSAE method with different measures: (a) Precision; (b) Recall; (c) F-measure; and (d) Accuracy

Tab. 3 and Figs. 6 and 7 show a comparative result analysis of the PT-DSAE model with existing models with respect to distinct measures [17-21]. With respect to precision, Fig. 6 shows that the IDBN model yielded the poorest classifier outcome, with the lowest precision of 0.904. The AK-NN model surpassed the IDBN model with a precision of 0.9219. Likewise, the DL model attained a precision of 0.9354, while an even better precision of 0.9512 was attained by the DPC-DBN model. Moreover, the DT model attained a moderate precision of 0.9659, while the AdaBoost, T-SID, random forest (RF), and SVM models attained close precision values of 0.9742, 0.9751, 0.9756, and 0.9774, respectively. However, the proposed PT-DSAE model attained the maximum precision of 0.9791.

Table 3: Result analysis of existing models with the proposed PT-DSAE method

Figure 6: Comparative analysis of the PT-DSAE model in terms of precision and recall

With regard to recall, Fig. 6 shows that the IDBN method attained the weakest classifier result, with a minimum recall of 0.92. The DT framework outperformed the IDBN model with a recall value of 0.9284. Similarly, the AdaBoost approach generated a recall value of 0.9321, while a moderate recall value of 0.9376 was generated by the AK-NN scheme. In line with this, the RF method attained a recall value of 0.9384, and the SVM, DL, DPC-DBN, and T-SID methods yielded close recall values of 0.9436, 0.9487, 0.9499, and 0.9517, respectively. With regard to F-measure, Fig. 7 shows that the IDBN method attained the weakest classifier outcome, with a low F-measure of 0.908. The AK-NN method surpassed the IDBN method with an F-measure of 0.9292. In line with this, the DL approach generated an F-measure of 0.9412, while an acceptable F-measure of 0.9508 was generated by the DPC-DBN framework. Likewise, the DT scheme attained a reasonable F-measure of 0.9542, followed by the AdaBoost, RF, SVM, and T-SID methods, which attained close F-measure values of 0.9568, 0.9592, 0.9655, and 0.9729, respectively. Importantly, the proposed PT-DSAE technique attained the highest F-measure of 0.986.

Figure 7: Comparative analysis of the PT-DSAE model in terms of F-measure and accuracy

With regard to accuracy, Fig. 7 shows that the AK-NN approach yielded the weakest classifier outcome, with a minimal accuracy of 0.9199. The DL scheme performed somewhat better than the AK-NN model, with an accuracy of 0.9277. Moreover, the DT method generated an accuracy of 0.9365, while a reasonable accuracy of 0.9396 was attained by the T-SID model. Similarly, the DPC-DBN scheme yielded an accuracy of 0.9498, whereas the AdaBoost, RF, IDBN, and SVM approaches exhibited close accuracy values of 0.9587, 0.9598, 0.9617, and 0.9632, respectively. Importantly, the newly proposed PT-DSAE scheme yielded a superior accuracy of 0.9849.

5 Conclusion

In this study, we developed an effective IDS using DL models for CPSs. First, input data were preprocessed to remove noise, and then a DSAE-based classification process was performed, in which the parameters were optimized using an SAR optimization algorithm. In SAE networks, the pretraining phase is essential for obtaining good weights with the help of an optimization model, and these weights are applied as initial values for deep AE systems. To improve the training process of the L-BFGS model, an SAR optimization algorithm was employed. For the experimental analysis, a series of experiments were performed on the NSL-KDD dataset, which includes samples from five classes (four attack types and normal traffic). From the experimental results, it was observed that the PT-DSAE model identified intrusions with an average precision of 0.9791, recall of 0.9865, F-measure of 0.9860, and accuracy of 0.9849. Therefore, it can be applied as an effective tool for intrusion detection in CPSs. In the future, hybrid optimization algorithms can be used to improve the performance.

Funding Statement: The author(s) received no specific funding for this study.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.