Few-Shot Learning for Discovering Anomalous Behaviors in Edge Networks

Computers, Materials & Continua, 2021, Issue 11 (posted 2021-12-15 08:12)

Merna Gamal, Hala M. Abbas, Nour Moustafa, Elena Sitnikova and Rowayda A. Sadek

1 Department of Information Technology, Helwan University, Cairo, Egypt

2 Department of Computer Science, Helwan University, Cairo, Egypt

3 School of Engineering and Information Technology, University of New South Wales at ADFA, Canberra, Australia

Abstract: Intrusion Detection Systems (IDSs) have attracted great interest as a means of discovering complex attack events and protecting the critical infrastructures of Internet of Things (IoT) networks. Existing IDSs based on shallow and deep network architectures demand high computational resources and high volumes of data to establish an adaptive detection engine that discovers new families of attacks from the edge of IoT networks. However, attackers exploit network gateways at the edge using new attack scenarios (i.e., zero-day attacks), such as ransomware and Distributed Denial of Service (DDoS) attacks. This paper proposes a new IDS based on Few-Shot Deep Learning, named CNN-IDS, which can automatically identify zero-day attacks from the edge of a network and protect its IoT systems. The proposed system comprises two methodological stages: 1) a filtered Information Gain method selects the most useful features from network data, and 2) a one-dimensional Convolutional Neural Network (CNN) algorithm recognizes new attack types from a network’s edge. The proposed model is trained and validated using two datasets, UNSW-NB15 and BoT-IoT. The experimental results showed that it improves the detection rate by about 3% and the false-positive rate by around 3%-4% on the UNSW-NB15 dataset, and the detection rate by about 8% on the BoT-IoT dataset.

Keywords: Convolutional neural network; information gain; few-shot learning; IoT; edge computing

1 Introduction

The Internet of Things (IoT) plays a significant role in constructing smart systems, including smart homes, smart cities, and healthcare, to offer automated services to users and organizations [1]. The IoT can be defined as a communication model in which any device acts as an object that senses the environment and exchanges data through the Internet [2]. It consists of many IoT peripherals, such as sensors and actuators, that connect to the Internet. With the prevalence of IoT systems, network architectures have been redesigned to include three tiers: edge/physical, fog, and cloud [3]. The edge tier includes all computer devices, IoT devices, and network appliances [4]. This tier is linked with the fog tier, which is the interface that includes virtualization platforms and gateways. Both tiers are interconnected with the cloud tier, which offers software, platform, and infrastructure services to end-users [3,5].

IoT technology provides functions that interconnect devices and applications, along with computing resources, to handle the captured data [6]. The security of IoT systems is still the main challenge in the cybersecurity domain, due to the heterogeneity of IoT devices and the large number of IoT services linked to the network [3]. Manufacturers often do not build security services into their IoT products, especially lightweight IP-enabled devices, because of their non-standard and licensed firmware [4]. This leads to various vulnerabilities, at either the firmware or the network level, through which attackers attempt to breach IoT systems and their networks. There are three security challenges in IoT networks [7]. First, physical compromise at the edge layer results from weak hardware protection. Second, the confidentiality challenge concerns the disclosure of sensitive information of IoT services passed to the fog and cloud layers. Man-In-The-Middle (MITM) and reconnaissance attacks are common hacking techniques that violate the confidentiality of IoT networks. The confidentiality threat is often greatest between gateways and IoT devices at the edge. Third, the integrity challenge concerns the alteration or manipulation of the original data of IoT systems, which breaches privacy. This often happens through spoofing, poisoning, evasion, and inference attacks that steal and/or illegally modify the telemetry data of IoT systems and their networks [8,9].

The security and privacy of IoT networks are essential for safeguarding IoT components that depend on object identification technologies. Every IoT object has its own identity that carries all its information, such as location and personal information. To monitor the services of IoT systems at the edge of a network, defensive mechanisms, such as Intrusion Detection Systems (IDSs), should be effectively deployed and configured. Discovering cyber-attacks at the edge layer would address the security issue in IoT networks. IDSs have been widely proposed to monitor and recognize cyber-attacks in network systems. However, existing network IDSs still struggle to detect new attack families (i.e., zero-day attacks), especially given the extensive amount of network traffic collected from heterogeneous IoT systems across the network [7,10]. Some IDSs in the literature utilize shallow and deep learning techniques to discover cyber threats. A shallow network is defined as an artificial neural network that consists of one or two hidden layers. Deep learning (DL) is considered an improvement on shallow learning; the difference is that deep learning has many hidden layers with different architectures [11]. Researchers have broadly used DL techniques in many fields, such as image processing, biomedicine, and security. Shallow networks have achieved reasonable detection accuracy and low false alarm rates when handling small-scale data. However, large-scale data demand a deep adaptive architecture that can learn hidden patterns and extract characteristic features of anomalous behaviors in real time, as we suggest in this study.

We propose using a Few-Shot Learning (FSL) architecture [12] to address this challenge, as it can learn from only a limited number of instances in the time-series analysis of network traffic. In more detail, this paper proposes a new IDS using few-shot deep learning models that can discover cyber-attacks from the edge of a network. The proposed system includes two methodological phases: feature selection and a decision engine. In feature selection, a filtered information gain method is employed to select the most useful features from network data. This phase reduces processing time and enhances the performance of the decision engine. In the decision engine, Few-Shot Deep Learning techniques are applied using an adaptable Convolutional Neural Network (CNN) architecture [13]. The CNN is used to recognize new attack types from the network’s edge. The proposed system is trained using two benchmark datasets, UNSW-NB15 [14] and BoT-IoT [15].

The rest of the paper is structured as follows. Section 2 presents the background and related work on IoT and IDSs. Section 3 explains the proposed approach for the intrusion detection system. Section 4 describes the empirical results and discussion. Finally, Section 5 concludes the paper.

2 Background and Related Work

This section explains the background and previous studies related to IDS and IoT networks.

2.1 Intrusion Detection System

An Intrusion Detection System (IDS) is a security solution, either hardware or software, that monitors network traffic and/or audit traces of client systems to identify cyber threats to computer and network systems [1]. Some IDSs react to intrusions in real time, while others do not because they perform in-depth analysis for forensic purposes [16]. An IDS is essential software that monitors network traffic to recognize malicious events [17]. It mainly includes three stages: 1) a data preprocessing method filters and cleans data; 2) an intrusion detection method is trained and tested on legitimate and suspicious observations; and 3) a decision-making method raises alerts for malicious events [5].

There are two popular forms of IDS deployment: Host-based IDS (HIDS) and Network-based IDS (NIDS) [18]. On the one hand, a HIDS monitors the system activities of hosts, for example, system configuration, application activity, system logs, application processes, and file access [19]. On the other hand, a NIDS monitors network activity and analyzes the collected information to identify suspicious events in network traffic [20]. A NIDS consumes fewer computational resources than a HIDS and responds more quickly because it does not require maintaining sensor software at the host level [1]. There are three detection methods in IDSs: 1) anomaly-based detection, 2) misuse-based detection, and 3) a hybrid of both. An anomaly detection method builds a profile of normal behavior and flags outliers as anomalies [21]. A misuse-based detection method depends on well-known signatures and matches observed events against a blacklist of suspicious events. A misuse-based IDS cannot discover new attack types, whereas an anomaly-based IDS can detect them, at the cost of false alarms when the variations between normal and abnormal patterns are small [22,23].
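The contrast between the two detection methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the signature strings, the single "bytes per flow" measurement, and the z-score threshold are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev

# Hypothetical signature blacklist used by the misuse-based detector.
KNOWN_BAD_SIGNATURES = {"GET /etc/passwd", "' OR 1=1 --"}


def misuse_detect(payload: str) -> bool:
    """Flag a payload only if it contains an already-known signature."""
    return any(sig in payload for sig in KNOWN_BAD_SIGNATURES)


def build_profile(normal_values):
    """Summarise legitimate traffic (e.g. bytes per flow) as mean and standard deviation."""
    return mean(normal_values), stdev(normal_values)


def anomaly_detect(value, profile, threshold=3.0):
    """Flag a value whose z-score against the normal profile exceeds the threshold."""
    mu, sigma = profile
    return abs(value - mu) / sigma > threshold


# Profile learned from (made-up) legitimate observations.
profile = build_profile([480, 500, 510, 495, 505, 490])

print(misuse_detect("GET /etc/passwd HTTP/1.1"))  # matches a known signature
print(misuse_detect("GET /index.html"))           # no known signature, never flagged
print(anomaly_detect(5000, profile))              # far outside the normal profile
print(anomaly_detect(498, profile))               # close to the normal profile
```

The misuse detector can only flag what is already on its list, while the anomaly detector flags anything sufficiently far from the learned profile, including previously unseen behavior; a looser threshold catches more attacks but raises more false alarms.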

IDSs have been designed based on machine learning and deep learning algorithms to recognize cyber threats [1,5]. Deep learning algorithms have proven their capability in different applications, such as computer vision and malware detection [5]. Deep learning can be categorized by architecture into generative and discriminative. The generative architectures are the Recurrent Neural Network (RNN), Deep Auto-Encoder, Deep Boltzmann Machine (DBM), and Deep Belief Network (DBN) [1]. An auto-encoder consists of two symmetrical components, an encoder and a decoder. The encoder extracts features from the raw data. The decoder reconstructs the data from the features extracted by the encoder. A DBM consists of stochastic units across the whole network that produce binary results. A DBN has multiple layers with connections between the layers but not between the units within a layer. The discriminative architecture has two types, the recurrent neural network and the convolutional neural network. An RNN is used for sequential data and, in most cases, for natural language processing [16]. This work focuses on the CNN, as it includes multiple layers that can distinguish small variations in the data features of different class labels, such as legitimate and malicious behaviors.

2.2 Internet of Things (IoT)

The Internet of Things (IoT) connects devices and applications to the Internet to sense and monitor systems [4,16]. IoT is defined as the seamless connection of the information network and physical objects, named ‘smart objects’, with these objects being active participants in business processes and accessed through network services, while taking security and privacy into account [24]. At the end of the 20th century, the Internet started to spread through web services. It became conceivable that objects such as a pen or a book could operate on their own. The development of IoT has spread worldwide through mobile devices, laptops, and workstations [3]. The creation of new IoT products has miniaturized computers and introduced new approaches linked with wireless networks [25]. Nowadays, IoT sensors, such as those in IP cameras, connect to the Internet. IoT devices are usually inexpensive and easy to deploy in IoT networks, as with temperature and light bulb sensors [26].

Research studies have emphasized that security in the IoT concentrates on attack detection, authorization, authentication, and access control [26,27]. Many factors change the traffic pattern when recognizing abnormal behaviors in IoT networks. It is vital to consider these factors when developing IDS techniques for IoT networks at the edge, such as inspecting network protocols [28], determining application services [29], and identifying abnormal patterns at the edge [30]. Existing IDSs have evolved to use deep learning, statistical learning, and machine learning systems that classify massive data when analyzing the threats to IoT networks [11,31].

2.3 Related Work

Several IDSs have been proposed in the literature to identify cyber-attacks in network systems. For instance, Sadek et al. [32] proposed a hybrid IDS approach using an indicator variable-enabled rough set technique for feature reduction and neural networks for classification. The empirical results revealed that the hybrid approach could achieve 96.7% accuracy and a 3% false alarm rate on the NSL-KDD dataset, with lower computational resources than other competing IDSs. The authors in [33] suggested a hybrid IDS based on the triangle area based nearest neighbors (TANN). The k-means algorithm was used to find the cluster centers of attack classes, and KNN was used to classify attack events. This experiment showed high accuracy and a low false alarm rate on the KDD-Cup 99 dataset.

Moustafa et al. [5] proposed an approach called ODM-ADS, in which a profile is designed to model normal events and attacks are detected as deviations based on an outlier function. This approach can be deployed in IoT, cloud, and fog computing, and it achieved high performance compared with other techniques on the NSL-KDD and UNSW-NB15 datasets. Essam et al. [34] proposed a hybrid algorithm based on correlation feature selection and information gain to reduce the number of features. This research was applied to the NSL-KDD dataset; the reduced dataset was validated by a naïve Bayes classifier using the adaptive boosting technique. A study by Alom et al. [35] used a DBN to build an intrusion detection system for detecting unknown attacks. Karimi et al. [36] developed a feature selection technique using information gain and a symmetric uncertainty model to select the relevant features, with naïve Bayes for classifying attacks. The results showed that the proposed techniques outperformed machine learning-based IDSs.

Tang et al. [37] developed an intrusion detection model using a deep feed-forward network that contains three hidden layers. The model used the best six features selected from the NSL-KDD dataset. Ling et al. [38] applied a convolutional neural network technique to an IDS that detects attacks. Niyaz et al. [39] used an auto-encoder to obtain a feature representation and then classified the data using soft-max regression on the NSL-KDD dataset. Hodo et al. [40] proposed an artificial neural network approach that detects DoS and DDoS attacks in IoT systems with good accuracy. Chen et al. [41] also addressed DDoS detection for IoT networks. Haddadi et al. [42] used a neural network with two hidden layers on the DARPA1999 data to overcome the problem of overfitting and detect suspicious events. Amma et al. [43] proposed an in-depth radial approach to optimize the depth of the neural network parameters, applied to different datasets to detect DoS attacks.

Recently, Moustafa et al. [1] reviewed existing IDSs and their methods and problems in network and edge systems. The authors demonstrated that the main challenge is that existing IDS approaches cannot discover new attack families in large-scale and heterogeneous data sources collected from IoT networks. It was recommended that deep learning techniques be used to improve the performance of intrusion detection systems, so as to obtain high detection accuracy and low false alarm rates [1,16]. Therefore, this study’s primary goal is to discover new attack families from heterogeneous data sources collected from the edge of a network. Deep learning is used in this work because of its ability to extract features, analyze data in depth, and detect suspicious vectors.

3 Proposed CNN-Enabled Intrusion Detection System

This section discusses the proposed Intrusion Detection System (IDS) that discovers cyber-attacks from the edge of a network. The proposed system works on the essential features of network flows. It includes three main components: data preprocessing, feature selection, and a decision engine, as depicted in Fig. 1. In data preprocessing, network data are filtered and processed by removing redundant values, converting data into a numeric format, and normalizing data to improve the feature selection and decision engine stages. In feature selection, the information gain method is applied to select the essential features and enhance the detection accuracy of the decision engine. In the decision engine, a few-shot deep learning technique based on a Convolutional Neural Network (CNN) is employed to classify anomalous behaviors. The three components of the proposed IDS are explained below.

Figure 1: Architecture of the proposed IDS

3.1 Data Preprocessing Phase

In the data preprocessing phase, network data are filtered by converting non-numerical features to numerical values because the convolutional neural network handles only numbers. Categorical values in the datasets are converted into numeric ones; for example, protocol values are mapped as (TCP=1, UDP=2, ICMP=3). Redundant values in the datasets are also excluded to enhance the detection accuracy of deep learning. To overcome the imbalance in the datasets, the data are divided into 80% training data and 20% testing data. The feature values of the UNSW-NB15 and BoT-IoT datasets are on entirely different scales because the data include nominal, float, and timestamp values. Therefore, data features are normalized into a common range, such as [0,1], to improve the decision engine’s performance.
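The preprocessing steps described above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the record layout (protocol, duration, bytes, label), the sample values, and all function names are hypothetical; only the protocol mapping and the 80%/20% split come from the text.

```python
import random

# Protocol mapping given in the text.
PROTOCOL_CODES = {"tcp": 1, "udp": 2, "icmp": 3}


def encode_protocol(rows, column):
    """Replace the categorical protocol string in `column` with its numeric code."""
    return [row[:column] + [PROTOCOL_CODES[row[column].lower()]] + row[column + 1:]
            for row in rows]


def drop_duplicates(rows):
    """Remove repeated records while keeping first-seen order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def min_max_normalize(rows, columns):
    """Scale each listed column to [0, 1]; a constant column maps to 0.0."""
    scaled = [list(row) for row in rows]
    for c in columns:
        values = [row[c] for row in rows]
        lo, span = min(values), max(values) - min(values)
        for row in scaled:
            row[c] = (row[c] - lo) / span if span else 0.0
    return scaled


def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle deterministically and hold out `test_ratio` of the rows for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: [protocol, duration, bytes, label]
records = [
    ["TCP", 0.5, 1200, 0],
    ["UDP", 0.1, 300, 0],
    ["TCP", 0.5, 1200, 0],   # exact duplicate of the first record
    ["ICMP", 2.0, 60, 1],
    ["TCP", 1.5, 9000, 1],
    ["UDP", 0.3, 450, 0],
]

encoded = encode_protocol(records, column=0)
unique = drop_duplicates(encoded)
scaled = min_max_normalize(unique, columns=[0, 1, 2])
train, test = train_test_split(scaled, test_ratio=0.2)
print(len(unique), len(train), len(test))
```

In this sketch the scaling statistics are computed over all records before splitting, which keeps the example short; computing them on the training portion only is a common alternative when the test set must remain unseen.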

3.2 Few Shot Learning Method for Intrusion Detection

Few-Shot Learning (FSL) can address new tasks that have only a few samples with supervised information. In other words, FSL is a machine learning paradigm that learns from a limited number of examples with supervised information [44,45]. FSL can help in the robotics field [46], which develops robots or machines that act like humans. Many fields need FSL; one of the most important is drug discovery, which identifies the properties of new molecules in order to develop new drugs [9] for treating diseases. FSL is now considered a hot topic because it requires only a small number of samples, and many machine learning approaches have been proposed for it, such as embedding learning [47,48], meta-learning [49], and generative modeling [44,50].

3.2.1 Feature Selection-Based Information Gain (IG)

Information Gain (IG), also known as mutual information, indicates which features of a training set of feature vectors are most useful for discriminating between the classes to be learned, and thereby finds a subset of the original variables; it is calculated as in Eq. (1). Feature selection strategies fall into three groups: filter, wrapper, and embedded approaches [16]; the IG method used here is a filter approach. It is used to improve the accuracy of the system or to reduce the mining time. Researchers have applied various data preprocessing techniques, such as data cleaning, data integration, and dimensionality reduction based on feature reduction and feature selection. The entropy determines the information value of each feature and the relation between features, estimated as in Eq. (2). Feature selection is a way of making a network more secure by reducing the false alarms and time costs of IDSs while they monitor malicious activities on a network.

The objective of feature selection is to minimize the number of attributes while keeping the resulting probability distribution of the classes as close as possible to the original distribution obtained with all attributes. Selection techniques are employed to select relevant and informative features, that is, features that are useful for building a good predictor. Information gain is based on Shannon’s mathematical theory of communication and depends on entropy, which is a measure of the unpredictability of information. It ranks the features that affect the data classification, where pi is the probability of a feature in the given set of features, as shown in Eq. (3).

According to Maher and Ulrich (2012), IG handles only discrete values; therefore, continuous values must be converted into discrete values. Given two random variables X and Y, I(X,Y) is the information gain of X with respect to the class attribute Y. Let Y and X be discrete variables that take values in {y1,...,yt} and {x1,...,xt}, with probability distribution function P(x). Then the entropy of X is given by Eq. (4); equivalently, the average information is the expected value of I(x) over the instances of X, as in Eq. (5), where I denotes the information carried by a message from X. Hence, the IG of feature F on the dataset D is given by Eq. (6),

where value(F) is the set of all possible values of F, D_attr is the subset of D in which F has the value attr, and H(D) is the entropy of the class attribute.

Based on the information gain method, we select the ten most important features from the network datasets to improve the performance of the decision engine that discovers cyber-attacks.
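For reference, the standard textbook forms of the quantities described in this subsection are given below. The symbols follow the surrounding definitions; this is a sketch of the usual definitions, and the notation and numbering of the paper's own equations may differ.

```latex
\begin{align*}
I(x)     &= -\log_2 P(x), \\
H(X)     &= \mathbb{E}\left[I(x)\right] = -\sum_{i} P(x_i)\,\log_2 P(x_i), \\
I(X,Y)   &= H(Y) - H(Y \mid X), \\
IG(D,F)  &= H(D) - \sum_{attr \,\in\, \mathrm{value}(F)} \frac{|D_{attr}|}{|D|}\; H(D_{attr}).
\end{align*}
```

A feature whose values split D into subsets that each contain a single class gives H(D_attr) = 0 for every subset, so its IG equals H(D), the maximum possible; a feature that is independent of the class gives an IG of zero.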
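The ranking step can be sketched in Python as follows. This is a minimal illustration, assuming equal-width binning to discretize continuous features (the paper does not state which discretization it uses); the feature names and values are hypothetical toy data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(D) minus the size-weighted entropy of the subsets induced by each feature value."""
    total = len(labels)
    subsets = {}
    for value, label in zip(feature_values, labels):
        subsets.setdefault(value, []).append(label)
    conditional = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - conditional


def discretize(values, bins=4):
    """Equal-width binning (an assumed choice), since IG is defined on discrete values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant column
    return [min(int((v - lo) / width), bins - 1) for v in values]


def top_k_features(columns, labels, k=10):
    """Rank features by information gain and return the names of the best k."""
    scores = {name: information_gain(discretize(vals), labels)
              for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 0 = normal, 1 = attack.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "rate": [1, 2, 1, 2, 90, 95, 99, 91],          # separates the classes
    "ttl": [64, 128, 64, 128, 64, 128, 64, 128],   # independent of the class
}

for name, values in columns.items():
    print(name, information_gain(discretize(values), labels))
print(top_k_features(columns, labels, k=1))
```

Because this is a filter approach, each feature is scored on its own against the class label, without training the CNN; the ten highest-scoring features would then be passed to the decision engine.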

3.2.2 Convolutional Neural Network (CNN) as Decision Engine

The CNN is used as the decision engine of the IDS to classify legitimate and anomalous activities at the network’s edge. A CNN is a more recent type of neural network that learns features suited to representing the input data. It differs from multilayer perceptrons (MLPs) in two respects: weight sharing and pooling. A CNN has multiple layers, and each layer comprises multiple convolution kernels that are used to produce different feature maps. Each neuron of a feature map is connected to a local region of the adjacent layer. All spatial locations of the input share the same kernel to produce a feature map. After several convolution and pooling layers, one or more fully connected layers are used for classification [13]. Because a CNN uses shared weights, the model can learn the same pattern occurring at different positions of the input without having to learn a separate detector for each position. As a result, the architecture is robust to translations of the input [51].

The pooling layers reduce the computational burden because they reduce the number of connections between convolutional layers. In addition, pooling layers increase translation invariance and enlarge the receptive field of the following convolution layers. The activation function introduces non-linearity into the convolutional neural network, which helps multiple layers detect nonlinear features. Three common activation functions are sigmoid, tanh, and ReLU. One or more fully connected layers can be added at the end of the network. A loss function is used to measure the errors during training [52]. The CNN is adapted using the hyperparameters listed in Tab. 1 to establish a decision engine that can classify legitimate and attack events in datasets collected from the edge of networks.

Table 1: Adapted hyperparameters of the CNN used as a decision engine
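The building blocks described above (a shared convolution kernel, an activation function, pooling, and a fully connected output) can be sketched as a single forward pass in plain Python. This is only an illustration of the mechanics: the kernel, the dense weights, and the ten input values are made up and untrained, and they do not correspond to the hyperparameters in Tab. 1.

```python
from math import exp, tanh


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))


# The three activation functions named in the text.
ACTIVATIONS = {"sigmoid": sigmoid, "tanh": tanh, "relu": relu}


def conv1d(signal, kernel, bias=0.0, activation=relu):
    """Valid 1D convolution (cross-correlation form): the same kernel
    weights are reused at every position of the input (weight sharing)."""
    k = len(kernel)
    return [activation(sum(signal[i + j] * kernel[j] for j in range(k)) + bias)
            for i in range(len(signal) - k + 1)]


def max_pool1d(feature_map, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(feature_map[i:i + size])
            for i in range(0, len(feature_map) - size + 1, size)]


def dense(inputs, weights, bias=0.0, activation=sigmoid):
    """Single fully connected output unit."""
    return activation(sum(x * w for x, w in zip(inputs, weights)) + bias)


# Ten normalised feature values for one flow (hypothetical).
features = [0.1, 0.9, 0.2, 0.8, 0.0, 0.7, 0.3, 0.6, 0.4, 0.5]
kernel = [1.0, -1.0, 1.0]                      # illustrative, untrained weights

feature_map = conv1d(features, kernel)         # 10 inputs, kernel of 3 -> 8 values
pooled = max_pool1d(feature_map, size=2)       # 8 values -> 4 values
score = dense(pooled, [0.5] * len(pooled), bias=-0.5)
print(len(feature_map), len(pooled), score)
```

Shifting a pattern in the input shifts the corresponding response in the feature map by the same amount, which is the property that lets one kernel detect the same pattern at different positions; pooling then halves the number of values passed to the next layer.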

4 Experimental Results

4.1 Experimental Design

We implemented the proposed IDS with TensorFlow, Google’s open-source dataflow engine, through the Python Keras package, running on Google Colab [53]. Keras was used as the front-end API, as it is one of the most important libraries for studying deep convolutional networks. It supports building models easily and quickly and runs on both CPUs and GPUs.

4.2 Datasets Used

To validate the proposed system against different types of attacks and different network infrastructures and characteristics, testing and evaluation were carried out on two network datasets, UNSW-NB15 [14] and BoT-IoT [15,54]. First, UNSW-NB15 [14] is a dataset published in 2015 by UNSW Canberra Cyber for evaluating intrusion detection systems. It is divided into a training set of 175,341 records and a testing set of 82,332 records. The IXIA PerfectStorm tool was used to generate a mix of normal and modern attack network traffic. UNSW-NB15 includes nine attack families, as listed in Tab. 2.

Second, the BoT-IoT dataset was designed from a real network environment built in the Cyber Range Lab of UNSW Canberra. The environment combines normal and malicious traffic. The source files of the dataset are provided in different formats, including CSV files, PCAP files, and Argus files. The files are separated by attack category and subcategory to support the labeling process. The PCAP files are 69.3 GB in size, with more than 72,000,000 records. The extracted traffic is 16.7 GB in size. MySQL queries were used to extract 5% of the original dataset to make it easier to use. The extracted 5% consists of 4 files, 1.07 GB in size, with 3 million records. The attack types of the BoT-IoT dataset are described in Tab. 3.

Table 2: Attack types of the UNSW-NB15 dataset

Table 3: Attack types of the BoT-IoT dataset

4.3 Feature Selection Using Information Gain

The ten most important features are selected using the Information Gain technique from the UNSW-NB15 and BoT-IoT datasets, as listed in Tabs. 4 and 5. These features are used as the input to the CNN decision engine, which classifies normal and attack activities. They significantly affect the performance of the decision engine by improving its detection accuracy and processing time.

Table 4: Best ten features from the UNSW-NB15 dataset

Table 5: Best ten features from the BoT-IoT dataset

4.4 Results of CNN Compared with Other IDSs

The proposed CNN-IDS model was trained using the two datasets, UNSW-NB15 and BoT-IoT. The training phase ensures that the learned parameters are reliable for use in the testing phase. The CNN intrusion detection system was evaluated on the ten selected features of the datasets listed in Tabs. 4 and 5. For the UNSW-NB15 dataset, the overall Detection Rate (DR) and False Positive Rate (FPR) of the CNN-IDS are presented in Fig. 2. This figure depicts the Receiver Operating Characteristic (ROC) curves, which show the relation between the detection rates and the false positive rates. The outcomes demonstrated that the proposed system could detect different attack types with an average detection rate of 91% on the UNSW-NB15 dataset. The results of the CNN-IDS system are compared with four existing intrusion detection techniques: Triangle Area Nearest Neighbors (TANN) [33], Euclidean Distance Map (EDM) [55], Multivariate Correlation Analysis (MCA) [56], and Outlier Dirichlet Mixture (ODM) [5]. As shown in the figure, the system outperforms these techniques by about 2% in detection rate and by roughly 1%-2% in false positive rate.

Figure 2: ROC curve of CNN-IDS compared with other techniques on the UNSW-NB15 dataset

The proposed CNN-IDS system can also correctly classify and discover various attack types in the BoT-IoT dataset, as presented in Fig. 3. The proposed system detects all the attack types with a detection rate of around 99.9% and a false-positive rate of 0.01% on the BoT-IoT dataset. The CNN-IDS system is also compared with the four techniques used on the UNSW-NB15 dataset. The results show that the proposed system detects attack types better than the other models, by about 3% in detection rate and around 3%-4% in false-positive rate. Comparing the results on both datasets, the proposed CNN-IDS achieves a detection rate on the BoT-IoT dataset that is about 8% higher than on the UNSW-NB15 dataset. This is because BoT-IoT has new attack types with high variation between the normal and attack classes, enabling the CNN-IDS system to learn the normal and attack data better than on the UNSW-NB15 dataset.

Figure 3: ROC curve of CNN-IDS compared with other techniques on the BoT-IoT dataset
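The two metrics and the ROC curve can be sketched in Python as follows, assuming the usual definitions DR = TP / (TP + FN) and FPR = FP / (FP + TN), with attacks as the positive class. The labels and scores are made-up toy values, not results from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN with attack = 1 and normal = 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def detection_rate(tp, fn):
    """Share of attack records that were flagged (also called recall or TPR)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_positive_rate(fp, tn):
    """Share of normal records that were wrongly flagged as attacks."""
    return fp / (fp + tn) if (fp + tn) else 0.0


def roc_points(y_true, scores, thresholds):
    """One (FPR, DR) point per decision threshold applied to the classifier scores."""
    points = []
    for th in thresholds:
        y_pred = [1 if s >= th else 0 for s in scores]
        tp, fp, tn, fn = confusion_counts(y_true, y_pred)
        points.append((false_positive_rate(fp, tn), detection_rate(tp, fn)))
    return points


# Hypothetical ground truth and classifier scores for ten flows.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]

tp, fp, tn, fn = confusion_counts(y_true, [1 if s >= 0.5 else 0 for s in scores])
print(detection_rate(tp, fn), false_positive_rate(fp, tn))
print(roc_points(y_true, scores, thresholds=[0.0, 0.5, 1.0]))
```

Sweeping the threshold from low to high moves the operating point from the top-right corner of the ROC plot (everything flagged) to the bottom-left corner (nothing flagged); a better detector stays closer to the top-left corner in between.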

To sum up, the proposed CNN-IDS system achieves higher detection accuracy than the other four IDS mechanisms because its design combines the Information Gain and CNN models. Information Gain helped select the most important features in both datasets, while the CNN architecture [57] was designed with multiple dense layers that can identify small variations between normal and abnormal events in the datasets. Therefore, the proposed system can serve as an IDS solution that identifies attack activities at the edge of networks and raises alerts for them.

5 Conclusion

This paper has presented a new IDS, called CNN-IDS, based on few-shot learning. The proposed CNN-IDS has been developed to discover new attack events from the edge of a network. The proposed system includes two models: feature selection and a decision engine. The feature selection model uses the Information Gain method to select essential features from network data, while the decision engine uses a one-dimensional Convolutional Neural Network (CNN) algorithm to discover attack events. The proposed system was trained and tested using two datasets, UNSW-NB15 and BoT-IoT. The results showed that the proposed system outperforms several peer intrusion detection systems. This demonstrates that the proposed system can be applied to real IoT networks to safeguard them against new cyber threats. This work will be extended by developing a new federated IDS that can concurrently discover attacks from IoT services and their network traffic.

Funding Statement: This work has been supported by the Australian Research Data Commons (ARDC), project code RG192500.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.