An Efficient Intrusion Detection Framework in Software-Defined Networking for Cybersecurity Applications

Computers, Materials & Continua, 2022, Issue 8 (2022-08-24)

Ghalib H. Alshammri, Amani K. Samha, Ezz El-Din Hemdan, Mohammed Amoon, and Walid El-Shafai

1 Department of Computer Science, Community College, King Saud University, Riyadh, 28095, Saudi Arabia

2 Deanship of Scientific Research, Saudi Electronic University, Riyadh, Saudi Arabia

3 Management Information System Department, College of Business Administration, King Saud University, Riyadh, 28095, Saudi Arabia

4 Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf, 32952, Egypt

5 Security Engineering Lab, Computer Science Department, Prince Sultan University, Riyadh, 11586, Saudi Arabia

6 Electronics and Electrical Communications Engineering Department, Faculty of Electronic Engineering, Menoufia University, Menouf, 32952, Egypt

Abstract: Network management and multimedia data mining techniques are of great interest for analyzing and improving network traffic. Security has recently become one of the most complex tasks in a Software Defined Network (SDN), which is based on a centralized, programmable controller. Monitoring network traffic is therefore essential for identifying and revealing intrusions and abnormalities in the SDN environment. Consequently, this paper provides an extensive analysis of the NSL-KDD dataset using five clustering algorithms: K-means, Farthest First, Canopy, Density-based, and Expectation-maximization (EM), which are compared using the Waikato Environment for Knowledge Analysis (WEKA) software. Furthermore, this paper presents an SDN-based intrusion detection system using a deep learning (DL) model with the KDD (Knowledge Discovery in Databases) dataset. First, the dataset is clustered into normal and four major attack categories. Then, a deep learning method is proposed for building an efficient SDN-based intrusion detection system. The results provide a comprehensive analysis and a clear comparative study of the different kinds of attacks included in the KDD dataset. The outcomes also reveal that the proposed deep learning method provides efficient intrusion detection performance compared to existing techniques. For example, the proposed method achieves a detection accuracy of 94.21% on the examined dataset.

Keywords: Deep neural network; DL; WEKA; network traffic; intrusion and anomaly detection; SDN; clustering and classification; KDD dataset

1 Introduction

Recently, SDN has emerged as one of the promising solutions for shaping the future of global networks and the Internet [1-3]. SDN separates the data plane from the control plane. The control plane can then manage the security concerns of the complete network [4-7].

Deep learning (DL) is a type of machine learning that extracts representations from big data with complicated structures [8,9]. It is valuable when learning from huge amounts of unsupervised data [10,11]. Deep learning offers several advantages, such as higher classification performance and better quality of generated samples. These benefits yield superior machine learning results in applications like natural language processing, speech recognition, computer vision, image recognition, and bioinformatics [12,13]. Furthermore, the abstract representations learned by deep learning have valuable properties; for example, relatively simple linear models can perform well on top of them [14].

An IDS (Intrusion Detection System) can be used to monitor network traffic and suspicious activities and to support network administration [15,16]. An IDS framework involves sensors, an analysis engine, and a reporting system. SDN-based security is an important model for controlling malicious flows in SDN switches [17,18]. With the increasing number of severe attacks and threats against computing systems such as computers, networks, clouds, and the Internet of Things, IDS has received a lot of attention from security researchers in recent decades [19]. Malicious attacks such as information theft can be detected using an IDS [20,21]. NIST (National Institute of Standards and Technology) [22] describes an intrusion as an attempt to compromise the CIA (Confidentiality, Integrity, and Availability) of a computer or network, or to bypass its security mechanisms.

Clustering-based IDS schemes are more efficient than traditional IDS schemes at detecting unknown attacks [23]. Furthermore, KDD (Knowledge Discovery in Databases) is employed to find and extract valuable information from huge relational databases. Data mining is therefore important for discovering significant, non-intuitive patterns and correlations and thereby obtaining better knowledge from the data [24,25].

Clustering, an intrusion detection technique for unlabeled data, groups related records into the same clusters. Common distance metrics on these clusters can then be used to identify anomalies [26]. Clustering is an unsupervised learning process that deals with unlabeled data [27]. Machine learning (ML)-based intrusion detection analysis can help detect abnormal behavior in a network [28]. Although several algorithms are available for intrusion detection, their performance still needs to improve. DL is one subset of ML methods, and it has gained a reputation for its potential in machine learning.

For this reason, DL methods have been employed for pattern recognition and classification. Furthermore, deep learning can help improve the accuracy and performance of IDS frameworks [29]. Several data mining process models are available for structuring the data analysis in this work [30]:

• KDD (Knowledge Discovery in Databases): KDD is a process by which specialists extract patterns and insights from data. It consists of five stages: Selection, Preprocessing, Transformation, Data Mining, and Interpretation/Evaluation. This model is followed in this research to construct the intrusion detection system.

• SEMMA (Sample, Explore, Modify, Model, and Assess): The SEMMA model has a similar structure to KDD. However, because it does not focus as deeply on data-specific phases, it is easier to apply to general data analysis tasks.

• CRISP-DM (Cross-Industry Standard Process for Data Mining): This model was originally established at IBM for data mining tasks and has proven useful for almost all kinds of data analytics projects.

This research paper investigates, discusses, and analyzes five clustering algorithms: K-means, Farthest First, Canopy, Density-based, and Expectation-maximization (EM), which are compared using the WEKA software. The primary objective of this work is to analyze the NSL-KDD dataset. Furthermore, this paper presents an SDN-based intrusion detection system using a DL model with the NSL-KDD dataset.

In summary, the contributions of this paper are as follows:

• Provide a comprehensive analysis of intrusion network traffic data using the NSL-KDD dataset. The dataset is clustered into normal and four major attack categories.

• Develop a deep learning model for building a proficient SDN-based IDS.

• Conduct a systematic evaluation of ML-based intrusion detection performance on the NSL-KDD dataset with the proposed model.

• Provide a comprehensive analysis and a clear comparative study of the various kinds of attacks included in the NSL-KDD dataset. The findings also reveal that the suggested deep learning technique provides efficient IDS performance compared to existing procedures.

The remainder of this paper is organized as follows. Section 2 presents a brief discussion of IDS, intrusion detection methodologies, deep learning, and SDN-based IDS, while recent related works are reviewed in Section 3. Section 4 describes the suggested intrusion data-based clustering and detection scenarios, while the simulation outcomes and comparative analysis are presented in Section 5. Finally, the conclusion is stated in Section 6.

2 Preliminary Knowledge

This section presents a brief discussion of intrusion detection systems, intrusion detection methodologies, deep learning, and SDN-based IDS.

2.1 Intrusion Detection Systems

An IDS is a software or hardware system that monitors network or system activities for malicious actions or policy violations and reports them to the management and administrative system. The key task of intrusion detection and prevention systems (IDS/IPS) is to recognize attacks and intrusions. Additionally, they are used to deter individuals from violating security procedures, to document warnings, and to recognize problems with security guidelines.

IDS/IPS has become crucial for security purposes. Numerous techniques can be employed for intrusion detection [31], and they are used to protect computer networks against a variety of attacks. An IPS may additionally discard the offending packets or terminate the connection. Intrusion detection systems can be classified as follows [32,33].

• Host-based IDS (HIDS): The HIDS protects a particular host. It consists of agents or software modules that run on network machines such as routers, switches, and servers. The HIDS agent thus operates like a regular host. More information and details about HIDS can be found in [31,32].

• Network-based IDS (NIDS): This kind attempts to protect all machines in the IoT network. The NIDS architecture comprises a group of sensors with particular functions placed at different locations within the network. A NIDS deployment can have a significant effect on the performance of the computer network. More information and details about NIDS can be found in [32,33].

2.2 Intrusion Detection Methodologies

Detection models are divided into two types: signature-based and statistical-based models. A signature model compares the traffic against a set of existing signatures. A statistical model, on the other hand, maintains profiles of users, hosts, applications, and connections. Likewise, two key detection schemes are employed by the host or network, signature-based and anomaly-based, to analyze actions and discover intrusions. Overall, there are three key intrusion detection methods: SD (Signature-based Detection), AD (Anomaly-based Detection), and SPA (Stateful Protocol Analysis). Tab. 1 summarizes the main advantages and disadvantages of the three detection methods. More information and details about them can be found in [31,34].

Table 1: Types of intrusion detection methodologies with their advantages and disadvantages
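The contrast between the signature-based and anomaly-based approaches can be illustrated with a minimal Python sketch. It is not taken from the paper: the signature strings, the `k = 3` deviation threshold, and the function names `signature_alert` and `anomaly_alert` are illustrative assumptions.

```python
import statistics
from typing import Sequence

# Hypothetical signatures: strings assumed to appear in known malicious payloads.
SIGNATURES = ("/etc/passwd", "' OR '1'='1")

def signature_alert(payload: str) -> bool:
    """Signature-based detection: flag traffic that matches a known pattern."""
    return any(sig in payload for sig in SIGNATURES)

def anomaly_alert(history: Sequence[float], observed: float, k: float = 3.0) -> bool:
    """Anomaly-based detection: flag a value that deviates from the stored
    profile (mean and standard deviation) by more than k standard deviations."""
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    return abs(observed - mean) > k * max(std, 1e-9)

# Assumed profile: connections per minute observed for one host.
baseline = [98, 102, 101, 99, 100, 97, 103]

print(signature_alert("GET /index.html"))        # no known pattern
print(signature_alert("GET /../../etc/passwd"))  # matches a signature
print(anomaly_alert(baseline, 101))              # within the profile
print(anomaly_alert(baseline, 450))              # far outside the profile
```

The signature check can only recognize patterns it already knows, whereas the profile check can flag previously unseen behavior at the cost of depending on how representative the stored profile is.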

2.3 Deep Learning

DL has been a popular research subject in recent times across different domains [35]. DL relies on supervised or unsupervised processes to learn hierarchical representations in deep structures [36]. The main aim of DL is to exploit Artificial Neural Networks (ANN) to extract successively higher-level features. DL algorithms can be categorized according to the purpose and objective of their architectures and procedures. More information and details about DL models and structures can be found in [37,38].

2.4 Software Defined Network

The SDN (Software Defined Network) is an adaptable, manageable, dynamic, and valuable infrastructure [11]. It combines numerous network tools for dynamic and centralized management of the computer network infrastructure, which lets a network administrator respond rapidly to enterprise demands. The general architecture of the SDN consists of control, infrastructure, and application layers. Notably, the control and data planes are separated, so control functionality is decoupled from the network devices. More information and details about the SDN control and data planes can be found in [39,40].

The SDN architecture strongly supports network analysis and monitoring tools through the programmability of the SDN controller. The SDN-based IDS suggested by the authors in [39] is presented in Fig. 1. It aims to inspect SDN network traffic in order to recognize malicious activity. The IDS runs on the SDN controller and applies two procedures to the traffic coming from the switch in order to identify intrusion attempts, which degrade the performance of the SDN network infrastructure. The two main IDS techniques that can be employed in SDN controllers are the packet counter and time interval techniques. More information and details about these techniques can be found in [39,40].

Figure 1: SDN-based intrusion detection system
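The text names the packet counter and time interval techniques without giving their details, so the following Python sketch shows only one plausible reading: counting the packets each source sends within a sliding time window and flagging a source that exceeds a limit. The class name `PacketRateMonitor`, the window length, and the packet limit are all hypothetical and are not taken from [39].

```python
from collections import defaultdict, deque

class PacketRateMonitor:
    """Counts packets per source inside a sliding time window (assumed design)."""

    def __init__(self, window_s: float = 1.0, max_packets: int = 100):
        self.window_s = window_s          # assumed time interval, in seconds
        self.max_packets = max_packets    # assumed packet-count threshold
        self._seen = defaultdict(deque)   # source address -> recent timestamps

    def observe(self, src: str, timestamp: float) -> bool:
        """Record one packet and return True if the source looks suspicious."""
        recent = self._seen[src]
        recent.append(timestamp)
        # Drop timestamps that have fallen out of the time window.
        while recent and timestamp - recent[0] > self.window_s:
            recent.popleft()
        return len(recent) > self.max_packets

monitor = PacketRateMonitor(window_s=1.0, max_packets=3)
flags = [monitor.observe("10.0.0.5", t) for t in (0.0, 0.1, 0.2, 0.3)]
print(flags)                              # the fourth packet exceeds the limit
print(monitor.observe("10.0.0.5", 5.0))   # old packets have expired
```

In a real controller, the timestamps would come from packet-in events or flow statistics reported by the switch; that integration is outside the scope of this sketch.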

3 Related Work

Over recent years, several research works on intrusion detection have used data mining procedures such as traffic data clustering and intrusion detection and classification. Portnoy [40] presented a clustering-based IDS scheme for anomaly intrusion detection, which trains on unlabeled data to identify new intrusions. The authors in [41] proposed the NSL-KDD data set, which consists of selected records of the complete KDD data set.

Panda et al. [42] proposed a hybrid intelligent approach that combines classifiers to make decisions intelligently, so that the overall performance of the resulting model is enhanced. Kang et al. [43] proposed an optimal feature selection algorithm that tackles the problem of choosing the optimal subset of features from several commonly used features to detect network intrusion, a task that requires extensive computing resources. Xiu-yu [44] proposed a model of online attack detection for computer forensics to collect criminal evidence of an attack. Siddiqui et al. [45] analyzed 10% of the KDD Cup '99 training dataset for intrusion detection. They also focused on establishing a relationship between the attack types and the protocols used by the hackers, using clustered data.

Subramanian et al. [46] analyzed the effect of clustering the training and test data on the classification efficiency of the Naive Bayes classifier. Kumar et al. [47] proposed a clustering approach based on a simple k-means algorithm to analyze the NSL-KDD dataset. Their work provided a complete analysis of the NSL-KDD intrusion detection dataset, clustering it into normal and four major attack categories, namely DoS, Probe, R2L, and U2R.

This paper proposes a deep learning (DL) model for building an efficient software-defined network (SDN)-based intrusion detection system. The NSL-KDD dataset is extensively analyzed using different clustering algorithms as well as the proposed model. The dataset is clustered into normal and four major attack categories. The results provide a complete analysis and a clear comparative study of the different kinds of attacks included in the NSL-KDD dataset. Likewise, the results reveal that the proposed deep learning technique provides efficient intrusion detection performance compared to existing techniques.

4 Proposed Intrusion Data-Based Clustering and Detection Scenarios

This section explains in detail the proposed work for traffic data clustering and intrusion detection in software-defined networks. The complete proposed hybrid system for SDN-based intrusion data clustering and classification is shown in Fig. 2.

4.1 Traffic Data-Based Clustering Algorithms

Clustering divides data into groups of similar objects. Its main benefit for intrusion detection is that anomalies can be detected without prior knowledge. This paper presents a comparative study of five clustering algorithms: K-means, Farthest First, Canopy, Expectation-maximization (EM), and Density-based. The compared clustering algorithms are described as follows.

1) K-means clustering algorithm. K-means is a cluster analysis method that defines K disjoint clusters based on the feature values of the objects to be grouped.

2) Farthest first clustering algorithm. It follows the same procedure as K-means in that it chooses centroids and assigns objects to clusters, but it selects the initial seeds by maximum distance: each seed is the value at the largest distance from the mean of the values. Cluster assignment therefore differs from K-means; the initial cluster receives the links with a high session count, so cluster 0 holds more than cluster 1, and so on.

3) Canopy clustering algorithm. It is an unsupervised pre-clustering scheme that may be used as a pre-processing stage for hierarchical clustering or for K-means. It is a fast, simple, and accurate clustering scheme in which each clustered object is represented as a single point in a multi-dimensional feature space. The canopy scheme uses two thresholds, T1 > T2, and a fast approximate distance metric.

4) Expectation-maximization (EM) clustering algorithm. It is considered an extension of the K-means scheme. It assigns each object to a cluster with a weight indicating the membership probability, so there are no strict boundaries between clusters. The EM scheme offers higher accuracy than the K-means scheme.

5) Density-based clustering algorithm. Given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors) and marks as outliers the points that lie alone in low-density regions (whose nearest neighbors are too far away).

Figure 2: Proposed hybrid system combining clustering and classification for intrusion detection in SDN
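To make the first two descriptions concrete, the following Python sketch combines a farthest-first seed selection with the usual K-means assign-and-update loop. It is a standard-library illustration on toy points, not the WEKA implementation used in the paper; the seeding follows the common farthest-first traversal (each new seed is the point farthest from the seeds already chosen), and the function names and the toy data are assumptions.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_first_centers(points, k, seed=0):
    """Pick one random seed, then repeatedly add the point that is
    farthest from its nearest already-chosen seed."""
    rng = random.Random(seed)
    centers = [points[rng.randrange(len(points))]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]

def kmeans(points, k, iters=20, seed=0):
    """K-means with farthest-first seeding: alternate assignment and
    centroid update until the centers stop moving."""
    centers = farthest_first_centers(points, k, seed)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

# Toy data: four well-separated groups in a normalized 2-D feature space.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (1.0, 1.0), (0.9, 1.0), (1.0, 0.9),
          (0.0, 1.0), (0.1, 1.0),
          (1.0, 0.0), (0.9, 0.0)]
centers, labels = kmeans(points, k=4)
print(labels)
```

Running farthest-first seeding alone (assignment to the chosen seeds without the update loop) corresponds to the single-pass behavior that makes Farthest First fast, while the update loop is what K-means adds on top.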

4.2 SDN-Based IDS Using Deep Learning Model

This part introduces the proposed SDN-based IDS using deep learning. The proposed system helps to identify malicious attacks as intrusion actions. The proposed SDN-based deep learning model for the IDS process is given in Fig. 3. The NSL-KDD dataset is used to evaluate the suggested SDN-based IDS. All experiments were performed in Python using Spyder under Anaconda Navigator on an Intel Core i5 processor with 12 GB of RAM and a 500 GB hard disk.

Figure 3: SDN-based deep learning model for intrusion detection

In this work, the NSL-KDD dataset is used to evaluate the results of the applied algorithms. This dataset was introduced to resolve some intrinsic problems of the KDD Cup 1999 dataset [48,49]. The original KDD'99 dataset was composed of test and train datasets that were previously used to evaluate the performance of IDS models. It combines three categories of features: basic, content-based, and traffic-based. The NSL-KDD dataset is a refined version of the KDD dataset, so it is employed in this paper to assess the efficiency of the feature subsets selected by the suggested schemes. Attacks in the dataset are grouped by their characteristics into four categories: U2R (User to Root), R2L (Remote to Local), probing, and DoS (Denial of Service) attacks, as shown in Tab. 2.

Table 2: Attack categories in the NSL-KDD dataset

The NSL-KDD dataset is therefore used in this work for training and evaluating the suggested system. It has 41 attributes describing different traffic flow features, and each record is labeled either as normal or as a particular attack type [50]. The features are of different data types. The testing dataset contains 38 attack types, while the training dataset includes 24. Tab. 3 presents the features and their data types in the NSL-KDD dataset.

Table 3: Features with various data kinds in the NSL-KDD dataset [51]
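The grouping of attack labels into the four categories can be expressed as a simple lookup. The Python sketch below uses attack label names commonly listed for the KDD'99 and NSL-KDD training data; the contents of Tab. 2 are not reproduced in this text, so the dictionary is an assumption and is deliberately partial, with unlisted labels reported as "Unknown".

```python
# Partial, assumed mapping from attack label to category; Tab. 2 of the
# paper is the authoritative list and may contain further attack types.
ATTACK_CATEGORY = {
    "normal": "Normal",
    # Denial of Service
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    # Probing
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    # Remote to Local
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    # User to Root
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}

def category(label: str) -> str:
    """Map a record's label to Normal, DoS, Probe, R2L, U2R, or Unknown."""
    return ATTACK_CATEGORY.get(label.strip().lower(), "Unknown")

for name in ("normal", "neptune", "satan", "guess_passwd", "rootkit", "new_attack"):
    print(name, "->", category(name))
```

Keeping an explicit "Unknown" result avoids silently assigning attack types that appear only in the test data to the wrong category.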

5 Results and Discussions

This section presents the simulation environment setup, the comparative results analysis, and the discussion of the proposed intrusion data-based clustering and detection scenarios.

5.1 Simulation Results of the Traffic Data-Based Clustering Algorithms Environment

The simulations are performed in the WEKA environment using the NSL-KDD dataset on a computer with a Core i5 processor and 4 GB of RAM [52]. Before clustering, all input attributes are normalized to the range 0-1, and the number of clusters is set to four.

The NSL-KDD dataset is analyzed using the K-means, Farthest First, Canopy, Expectation-maximization (EM), and Density-based clustering algorithms, where the major attack types are present in the training dataset. The performance of these algorithms is evaluated based on the number of instances per cluster, the execution time, and the number of incorrectly clustered instances. Tabs. 4 to 8 and Figs. 4 to 8 show the simulation results. Four clusters, clusters 0 to 3, are used for the clustered instances, including normal cases. The clustered instances are classified as Normal, DoS, R2L, U2R, and Probe.

Table 4: The simulation results of the K-means clustering algorithm

Table 5: The simulation results of the farthest first clustering algorithm

Table 6: The simulation results of the canopy clustering algorithm

Table 7: The simulation results of the EM clustering algorithm

Table 8: The simulation results of the density-based clustering algorithm

Figure 4: Distribution of instances to clusters using the K-means clustering algorithm
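The normalization step can be sketched as column-wise min-max scaling. The paper applies it inside WEKA; the Python function below is only an illustration of the same idea, and the function name and the handling of constant columns (mapped to 0.0) are assumptions.

```python
def min_max_normalize(rows):
    """Scale every column of a numeric table to the range 0-1."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0   # constant column -> 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Toy records with three numeric attributes (the third one is constant).
records = [[0, 10, 5],
           [5, 20, 5],
           [10, 30, 5]]
print(min_max_normalize(records))
```

Scaling all attributes to the same range keeps attributes with large raw values, such as byte counts, from dominating the distance computations used by the clustering algorithms.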
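The count of incorrectly clustered instances can be approximated as follows. This Python sketch labels each cluster with its majority class and counts every instance whose class differs from that label; it is a simplification, and WEKA's own classes-to-clusters evaluation may assign classes to clusters differently. The function name and the toy data are assumptions.

```python
from collections import Counter

def incorrectly_clustered(cluster_ids, class_labels):
    """Return (error count, cluster -> majority class) for a clustering."""
    per_cluster = {}
    for cid, label in zip(cluster_ids, class_labels):
        per_cluster.setdefault(cid, Counter())[label] += 1
    errors = 0
    mapping = {}
    for cid, counts in per_cluster.items():
        majority, hits = counts.most_common(1)[0]
        mapping[cid] = majority
        errors += sum(counts.values()) - hits   # members outside the majority
    return errors, mapping

# Toy example: eight instances spread over three clusters.
clusters = [0, 0, 0, 1, 1, 1, 2, 2]
classes = ["Normal", "Normal", "DoS", "DoS", "DoS", "Probe", "R2L", "R2L"]
errors, mapping = incorrectly_clustered(clusters, classes)
print(errors, mapping)
```

With four clusters and five classes (Normal, DoS, R2L, U2R, and Probe), at least one class cannot be the label of any cluster, so some instances are necessarily counted as incorrectly clustered.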

Fig. 4 gives the clustering results of the K-means algorithm, and Tab. 4 lists the number of clustered instances in each cluster together with the distribution of the instances of each attack. The K-means scheme takes 4.51 s to build the cluster model, and 65108 instances are incorrectly clustered. Fig. 5 shows the clustering results of the Farthest First algorithm, and Tab. 5 lists the number of clustered instances in each cluster together with the distribution of the instances of each attack. The Farthest First scheme takes 0.39 s to build the cluster model, and 47143 instances are incorrectly clustered.

Figure 5: Distribution of instances to clusters using the farthest first clustering algorithm

Fig. 6 presents the clustering results of the Canopy algorithm, and Tab. 6 lists the number of clustered instances in each cluster together with the distribution of the instances of each attack. The Canopy scheme takes 2.53 s to build the cluster model, and 46628 instances are incorrectly clustered. Fig. 7 shows the clustering results of the Expectation-maximization (EM) algorithm, and Tab. 7 lists the number of clustered instances in each cluster together with the distribution of the instances of each attack. The EM scheme takes 40.48 s to build the cluster model, and 36667 instances are incorrectly clustered.

Figure 6: Distribution of instances to clusters using the canopy clustering algorithm

Fig. 8 shows the clustering results of the Density-based clustering algorithm, and Tab. 8 lists the number of clustered instances in each cluster together with the distribution of the instances of each attack. The Density-based clustering scheme takes 5.41 s to build the cluster model, and 48184 instances are incorrectly clustered. Tab. 9 and Fig. 9 compare the five clustering algorithms based on the number of instances in the four clusters. Tab. 10 and Fig. 10 compare the five algorithms based on the execution time. Tab. 11 and Fig. 11 compare the five clustering algorithms based on the number of incorrectly clustered instances.

Figure 7: Distribution of instances to clusters using the EM clustering algorithm

Figure 8: Distribution of instances to clusters using the density-based clustering algorithm

From all the presented simulation and comparison results, it is clear that the distribution of instances varies from one cluster to another. The Farthest First clustering algorithm has the shortest execution time of the five clustering algorithms (0.39 s), while EM has the longest (40.48 s). According to the reported counts, the EM algorithm yields the fewest incorrectly clustered instances (36667) and K-means the most (65108).
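The comparison can be checked directly from the build times and incorrectly clustered counts reported in the text above. The short Python snippet below only re-ranks those reported figures; the dictionary layout and variable names are illustrative.

```python
# (build time in seconds, incorrectly clustered instances) as reported in the text.
reported = {
    "K-means":        (4.51, 65108),
    "Farthest First": (0.39, 47143),
    "Canopy":         (2.53, 46628),
    "EM":             (40.48, 36667),
    "Density-based":  (5.41, 48184),
}

fastest = min(reported, key=lambda name: reported[name][0])
slowest = max(reported, key=lambda name: reported[name][0])
fewest_errors = min(reported, key=lambda name: reported[name][1])
most_errors = max(reported, key=lambda name: reported[name][1])

print("fastest:", fastest)
print("slowest:", slowest)
print("fewest incorrectly clustered:", fewest_errors)
print("most incorrectly clustered:", most_errors)
```

On these figures, the algorithm with the fewest incorrectly clustered instances is also the slowest to build, which illustrates the trade-off between clustering quality and execution time.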

Table 9: The comparison results between the five clustering algorithms

Figure 9: Comparison between the clustering algorithms based on the number of instances

Table 10: Comparison between the clustering algorithms based on execution time

Figure 10: Comparison of results between the clustering algorithms based on execution time

Table 11: Comparison between the clustering algorithms based on the incorrectly clustered instances

Figure 11: Comparison between the five clustering algorithms based on the number of incorrectly clustered instances

5.2 Simulation Results of the SDN-Based IDS Using Deep Learning Model

The deep learning-based IDS model is implemented for binary attack classification: it classifies each input record as either normal or attack. The classification accuracy is evaluated using all 41 features. The detection accuracy of the proposed deep learning model for the different attack classes was higher than that of other machine learning techniques such as LR (Logistic Regression), LDA (Linear Discriminant Analysis), and NB (Naive Bayes), as shown in Tab. 12 and Fig. 12.

Table 12: Accuracy-based comparison of different algorithms

In the suggested model, the training percentage is 20%, the number of epochs is 200, the batch size is 10, and the number of selected attributes is 16. The accuracy and loss for different numbers of epochs are shown in Figs. 13 and 14: as the number of epochs increases, the accuracy increases and the loss decreases.

Figure 13: Proposed deep learning model accuracy
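The binary setting and the accuracy metric can be sketched in a few lines of Python. The snippet assumes that the label string "normal" marks benign records and that every other label counts as an attack; the function names and the toy predictions are illustrative and do not reproduce the paper's results.

```python
def to_binary(labels):
    """0 for normal records, 1 for any attack label (assumed convention)."""
    return [0 if label == "normal" else 1 for label in labels]

def confusion(y_true, y_pred):
    """Return (TP, TN, FP, FN) with attack as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Fraction of records classified correctly."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

labels = ["normal", "neptune", "normal", "satan", "normal", "rootkit", "normal", "smurf"]
y_true = to_binary(labels)
y_pred = [0, 1, 0, 1, 1, 0, 0, 1]        # hypothetical classifier output
print(confusion(y_true, y_pred), accuracy(y_true, y_pred))
```

Accuracy alone can hide errors on rare attack classes, so the confusion counts are worth inspecting alongside it when the classes are imbalanced.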
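The stated settings (20% of the records for training, 200 epochs, a batch size of 10, and 16 input attributes) can be illustrated with a small mini-batch training loop. The Python sketch below is not the paper's model: the network architecture is not given in this section, so the single hidden layer of 8 tanh units, the learning rate of 0.1, the pure standard-library implementation, and the synthetic data are all assumptions chosen to keep the example self-contained.

```python
import math
import random

def sigmoid(z):
    z = max(-60.0, min(60.0, z))          # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def forward(p, x):
    """Return the hidden activations (tanh) and the attack probability."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(p["W1"], p["b1"])]
    out = sigmoid(sum(w * h for w, h in zip(p["W2"], hidden)) + p["b2"])
    return hidden, out

def mean_loss(p, X, y):
    """Mean binary cross-entropy over a dataset."""
    eps = 1e-12
    total = 0.0
    for x, t in zip(X, y):
        _, out = forward(p, x)
        total -= t * math.log(out + eps) + (1 - t) * math.log(1.0 - out + eps)
    return total / len(X)

def train(X, y, n_hidden=8, epochs=200, batch_size=10, lr=0.1, seed=0):
    """Mini-batch gradient descent; returns parameters and per-epoch loss."""
    rng = random.Random(seed)
    n_in = len(X[0])
    p = {"W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
         "b1": [0.0] * n_hidden,
         "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
         "b2": 0.0}
    history = [mean_loss(p, X, y)]
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Forward pass for the whole batch before any weight changes.
            cache = [(X[i],) + forward(p, X[i]) + (y[i],) for i in batch]
            w2 = list(p["W2"])            # frozen copy used for backpropagation
            step = lr / len(batch)
            for x, hidden, out, target in cache:
                d_out = out - target      # gradient at the sigmoid input
                for j, h in enumerate(hidden):
                    d_hid = d_out * w2[j] * (1.0 - h * h)
                    p["W2"][j] -= step * d_out * h
                    p["b1"][j] -= step * d_hid
                    for k, v in enumerate(x):
                        p["W1"][j][k] -= step * d_hid * v
                p["b2"] -= step * d_out
        history.append(mean_loss(p, X, y))
    return p, history

def accuracy(p, X, y):
    hits = sum((forward(p, x)[1] >= 0.5) == bool(t) for x, t in zip(X, y))
    return hits / len(X)

# Synthetic stand-in for 16 selected, normalized attributes (not NSL-KDD data).
rng = random.Random(1)
X, y = [], []
for i in range(200):
    label = i % 2                         # 0 = normal, 1 = attack
    X.append([rng.gauss(0.2 + 0.6 * label, 0.15) for _ in range(16)])
    y.append(label)
split = int(0.2 * len(X))                 # 20% of the records used for training
params, history = train(X[:split], y[:split])
test_acc = accuracy(params, X[split:], y[split:])
print(f"loss {history[0]:.3f} -> {history[-1]:.3f}, test accuracy {test_acc:.3f}")
```

The `history` list holds the training loss before training and after each epoch, which is the kind of curve plotted against the number of epochs; on real data, a deep learning framework and a held-out validation set would normally replace this hand-written loop.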

5.3 Comparative Analysis

Tab. 13 makes the importance of the proposed scheme evident. The proposed approach is an attempt to build an efficient software-defined network (SDN)-based intrusion detection system using deep learning. In addition, it extensively analyzes the NSL-KDD dataset by using different clustering algorithms to group it into normal and four major attack categories. The results reveal that the proposed deep learning technique provides efficient intrusion detection performance compared to existing standard techniques. The results deliver a broad analysis and a rich comparative study of the different kinds of attacks in this dataset.

Figure 14: Proposed deep learning model loss

Table 13: Comparative analysis with the previously proposed systems

6 Conclusion and Future Work

Recognizing and detecting unknown threats and risks has become a significant task in providing a secure SDN system through an efficient intrusion detection system. Therefore, this paper provides a clustering-based analysis of the NSL-KDD dataset using the K-means, Farthest First, Canopy, Expectation-maximization (EM), and Density-based clustering algorithms. Likewise, it presents a deep learning system for building an efficient intrusion detection system to detect unknown malicious and illegitimate activities. The simulation results show that the Farthest First scheme gives a superior distribution of instances compared to the other four clustering schemes. The deep learning model also provides high accuracy compared with existing algorithms in detecting intrusion events. For example, the proposed DL method achieved a detection accuracy of 94.21%. In our future work, we intend to employ advanced artificial intelligence tools to detect new versions of attacks in the SDN network. In addition, a real implementation of an IoT-based SDN network for cybersecurity applications will be designed and developed.

Funding Statement: The authors received no specific funding for this study.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.