Wang Minghua
(National Computer Network Emergency Response Technical Team/Coordination Center
(CNCERT/CC), Beijing 100029, China)
Abstract: Network traffic anomalies are significant and unusual changes in traffic volume or composition. Local events such as temporary network congestion, Distributed Denial of Service (DDoS) attacks and large-scale scans, or global events such as abnormal network routing, can all cause such anomalies. Network anomaly detection and analysis are therefore very important to Computer Security Incident Response Teams (CSIRTs). However, wide-scale traffic anomaly detection requires extracting anomalous patterns from large volumes of high-dimensional, noise-rich data and then interpreting those patterns, which is very difficult. This paper proposes a general method based on Principal Component Analysis (PCA) for analyzing network anomalies. The method divides the traffic matrix into normal and anomalous subspaces, projects traffic vectors onto the normal subspace, computes the distance from each detected vector to the average normal vector, and detects anomalies based on that distance.
The public Internet plays an increasingly important role in all aspects of social life, and the security risks caused by the Internet's openness and the complexity of application systems increase accordingly. According to the report of the National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT/CC), CNCERT/CC received a total of 26,476 non-scanning network security event reports in 2006, more than twice the number in 2005 and exceeding the total from 2003 to 2005. Through sample monitoring on the deployed 863-917 Network Security Monitoring Platform, CNCERT/CC detected in 2006 that 45,000 hosts (by IP address) were infected with Trojans, twice the figure for the same period in 2005, and that over 10 million hosts (by IP address) were infected with botnet software and controlled by 16,000 hosts overseas.
Hackers controlled millions of infected computers through Trojans and botnets, released malicious code, sent spam, and launched Distributed Denial of Service (DDoS) attacks, posing great threats to the entire Internet, including the backbone network. DDoS attacks launched by tens of thousands of computers can exhaust the bandwidth of a Metropolitan Area Network (MAN), or even a backbone network, resulting in a partial Internet breakdown. Since the services of important information systems, such as government, finance, securities, energy and customs, are delivered over the Internet, the collapse of Internet backbone networks may cause not only significant commercial losses but also serious threats to national security. According to incomplete statistics, the economic loss caused by the Code Red worm outbreak of July 19, 2001 was estimated at over $2 billion; the Nimda worm, which broke out on September 18, 2001, caused over $2.6 billion in losses; and the SQL Slammer worm, which broke out in January 2003, caused over $1.2 billion in losses.
This paper presents a method for detecting wide-scale traffic anomalies that meets the macro network security requirements of the Internet. The method can analyze traffic anomalies at the backbone network level, enabling quick and effective monitoring of large-scale security events and thus buying time for network defense.
In wide-scale traffic anomaly detection at the backbone network level, the real-time processing of heavy traffic and the detection of unknown attacks pose great challenges to traditional intrusion detection technology. Domestic and international academic institutions and enterprises have explored and developed various methods for traffic anomaly detection [1].
A typical traffic monitoring method is threshold baseline-based detection. This method establishes a range of normal reference baselines through the analysis of historical data; once the range is exceeded, the traffic is judged abnormal. The method is simple and has low computational complexity, making it suitable for real-time detection. However, to serve as a practical detection technique, it must be tuned to the characteristics of the network traffic being monitored.
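To make the baseline idea concrete, here is a minimal sketch (not code from the paper); the synthetic history, the 3σ band and the function names are illustrative assumptions:

```python
import numpy as np

def baseline_bounds(history, k=3.0):
    """Derive a normal range from historical samples: mean +/- k standard deviations."""
    mu, sigma = history.mean(), history.std()
    return mu - k * sigma, mu + k * sigma

def is_abnormal(value, low, high):
    """Flag a new observation that falls outside the learned baseline range."""
    return value < low or value > high

# Usage: 72 historical byte counts, then one new 5-minute observation.
history = np.random.normal(1e6, 5e4, size=72)   # synthetic history (assumption)
low, high = baseline_bounds(history)
print(is_abnormal(1.4e6, low, high))
```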
Another common method is statistics-based detection using the Generalized Likelihood Ratio (GLR) [2]. It considers two adjacent time windows plus the window formed by joining them, and fits each window with an auto-regressive model. The method calculates the joint likelihood ratio of the residual series of each window and compares it with a preset threshold T; when T is exceeded, the window boundary is regarded as an abnormal point. This method is effective for detecting sudden traffic changes, but since its threshold is not selected automatically, it becomes partially ineffective when the duration of an anomaly exceeds the window length. Research on statistical models has broad prospects for traffic anomaly detection, as different statistical models lead to different detection methods.
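The following is a simplified sketch of the windowed GLR idea, using an AR(1) model fitted by least squares and Gaussian residual likelihoods; the window length and the exact likelihood formulation are assumptions, not the precise method of [2]:

```python
import numpy as np

def ar1_loglik(x):
    """Fit AR(1) by least squares and return the Gaussian log-likelihood of its residuals."""
    y, z = x[1:], x[:-1]
    a = np.dot(z, y) / np.dot(z, z)           # AR(1) coefficient
    r = y - a * z                             # residual series
    s2 = r.var() + 1e-12                      # residual variance (guard against zero)
    return -0.5 * len(r) * (np.log(2 * np.pi * s2) + 1)

def glr_score(w1, w2):
    """Log generalized likelihood ratio: separate AR fits vs. one fit on the joined window."""
    return ar1_loglik(w1) + ar1_loglik(w2) - ar1_loglik(np.concatenate([w1, w2]))

# Usage: a large score at the window boundary suggests an abrupt change there.
x = np.concatenate([np.random.normal(0, 1, 64), np.random.normal(4, 1, 64)])
print(glr_score(x[:64], x[64:]))              # compare against a preset threshold T
```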
Many researchers have explored transform domain-based traffic anomaly detection methods in recent years [3]. These methods generally transform traffic signals from the time domain to the frequency or wavelet domain, and then detect anomalies according to the spatial features after the transformation. P. Barford et al. [4] applied wavelet analysis to traffic anomaly detection and defined four types of anomalous results based on it. However, the method requires complex calculations, so it is not suitable for real-time detection on high-speed backbone networks.
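As an illustration of wavelet-domain monitoring (not Barford et al.'s actual detector), the sketch below uses the PyWavelets package to measure energy in the detail bands, where short bursts tend to show up; the wavelet choice and spike injection are assumptions:

```python
import numpy as np
import pywt  # PyWavelets

def detail_energy(signal, wavelet="db4", level=3):
    """Decompose a traffic series and return the energy of each detail band;
    a burst of energy in the fine-scale bands hints at an abrupt anomaly."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return [float(np.sum(c ** 2)) for c in coeffs[1:]]  # skip the approximation band

# Usage: a synthetic series with a short spike injected at t = 200.
x = np.sin(np.linspace(0, 20, 512)) + np.random.normal(0, 0.1, 512)
x[200:204] += 5.0
print(detail_energy(x))
```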
Lakhina et al. [5-6] applied Principal Component Analysis (PCA) to separate the high-dimensional structural space of the origin-destination data flows and project it onto three principal components, reconstructing the features of network traffic with three new composite variables. There are other monitoring methods [7], for example, the Markov model-based detection of network state transition probabilities.
This method defines each type of event as a system state and describes the expected normal network behavior through a state transition model. An alarm is raised when the incoming traffic differs from the expected pattern.
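A toy sketch of the state transition idea, assuming traffic has already been quantized into discrete states; the state encoding and the alarm threshold are illustrative assumptions:

```python
import numpy as np

def fit_transitions(states, n_states):
    """Estimate a state transition probability matrix from an observed state sequence."""
    counts = np.ones((n_states, n_states))          # add-one smoothing
    for s, t in zip(states[:-1], states[1:]):
        counts[s, t] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def transition_alarm(prev, cur, P, threshold=0.01):
    """Raise an alarm when the observed transition is very unlikely under the model."""
    return P[prev, cur] < threshold

# Usage: states 0/1/2 could encode quantized traffic levels (an assumption here).
seq = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0] * 20
P = fit_transitions(seq, 3)
print(transition_alarm(0, 2, P))                    # transition never observed -> alarm
```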
Another example is Learning Rules for Anomaly Detection (LERAD) [8], which is based on network security features. This method learns the normal correlation rules between traffic attributes and builds a normal rule set; during detection it performs rule matching and raises alarms for traffic that violates the rules. The method can locate the address where an anomaly occurs and quantify the magnitude of the anomaly. However, it requires a large amount of clean, normal-mode data for training, which is difficult to obtain in real networks.
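The following is only a toy illustration of attribute rule matching in the spirit of such rule-learning detectors, not the actual LERAD algorithm; the flow attributes are hypothetical:

```python
def learn_rules(records):
    """Toy rule learner: for each attribute, remember the set of values seen in clean data."""
    rules = {}
    for rec in records:
        for attr, val in rec.items():
            rules.setdefault(attr, set()).add(val)
    return rules

def violations(rec, rules):
    """Return the attributes whose values were never seen during training."""
    return [a for a, v in rec.items() if v not in rules.get(a, set())]

# Usage with hypothetical flow attributes.
clean = [{"proto": "TCP", "dport": 80}, {"proto": "UDP", "dport": 53}]
rules = learn_rules(clean)
print(violations({"proto": "TCP", "dport": 31337}, rules))  # ['dport']
```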
As wide-scale network traffic anomaly detection has become a hot network security technology, some vendors have launched carrier-class traffic anomaly detection products such as Arbor Peakflow, GenieNRM, GenieNTG 2100 and NetScout nGenius.
Some research projects on wide-scale network traffic anomaly monitoring have also achieved good results, such as the System for Internet-Level Knowledge (SiLK) and Automated Incident Reporting (AirCERT) built by the Computer Emergency Readiness Team (CERT), and the government-funded NMAC traffic monitoring system of Australia.
CNCERT/CC has established the 863-917 Network Security Monitoring Platform to meet the requirement of wide-scale traffic anomaly detection.
The platform has a distributed architecture and can monitor backbone network traffic at multiple points. It monitors the macro running status of the Internet in China by analyzing information such as protocol, address, port, packet length, traffic volume and time sequence. Our research collects traffic information through the 863-917 Network Security Monitoring Platform to establish a monitoring matrix.
Each row vector of the matrix comprises fourteen components:
·Number of source addresses
·Number of destination addresses
·Number of Transmission Control Protocol(TCP)bytes
·Number of TCP messages
·Number of User Datagram Protocol(UDP)bytes
·Number of UDP messages
·Number of miscellaneous traffic bytes
·Number of miscellaneous traffic messages
·Number of web traffic bytes
·Number of web traffic messages
·Proportion of the total number of bytes contributed by the Top 10 source IP addresses
·Proportion of the total number of messages contributed by the Top 10 source IP addresses
·Proportion of the total number of bytes contributed by the Top 10 destination IP addresses
·Proportion of the total number of messages contributed by the Top 10 destination IP addresses
The system generates one row vector every five minutes and uses six hours of monitoring as a time window, thus forming a 72×14 matrix per window. There is a certain correlation between the fourteen monitored components, which makes it possible to represent the information of the original variables with fewer variables. This project reduces the dimensionality of the monitoring data and extracts features through PCA. The following sections introduce the principle of the algorithm.
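As an illustrative sketch (not code from the paper), one window's monitoring matrix could be assembled as follows; the record layout is an assumption standing in for the platform's data feed:

```python
import numpy as np

N_FEATURES = 14            # the fourteen components listed above
SAMPLES_PER_WINDOW = 72    # one row every 5 minutes over a 6-hour window

def build_window_matrix(records):
    """Stack per-5-minute feature vectors into the 72 x 14 monitoring matrix."""
    X = np.asarray(records, dtype=float)
    assert X.shape == (SAMPLES_PER_WINDOW, N_FEATURES)
    return X

# Usage with synthetic records standing in for platform measurements.
records = np.random.rand(72, 14)
X = build_window_matrix(records)
print(X.shape)  # (72, 14)
```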
PCA is a coordinate transformation method that maps a given set of data points onto new axes, called principal components. Algebraically, principal components are linear combinations of the p random variables X1, X2, ..., Xp. Geometrically, these linear combinations amount to selecting a new coordinate system, obtained by rotating the original system whose axes are X1, X2, ..., Xp. The new coordinate axes indicate the directions of maximum data variance and provide a simpler, more concise description of the covariance structure.
Principal components depend only on the covariance matrix of X1, X2, ..., Xp. They explain the covariance structure of these variables through linear combinations of them, and are commonly used for the interpretation and compression of high-dimensional data.
In general, all p components are needed to reproduce the total system variability; however, most of this variability can often be accounted for by a small number k of principal components. In this case, the k components contain almost as much information as the original p variables, so they can replace them; the original data set of n measurements on p variables is thereby compressed to n measurements on k principal components for analysis.
The PCA method often exposes previously unsuspected relationships and can thus offer unusual data interpretations. When working with zero-mean data, the first principal axis points in the direction of maximum variance, and each subsequent principal axis captures the maximum remaining variance in a direction orthogonal to all earlier axes; the principal axes are therefore ordered by variance.
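A minimal numpy sketch of PCA as described here, via eigendecomposition of the sample covariance matrix (an illustration, not the platform's implementation):

```python
import numpy as np

def pca(X):
    """PCA via eigendecomposition of the sample covariance matrix.
    Returns eigenvalues (variances) and eigenvectors (axes), largest first."""
    Xc = X - X.mean(axis=0)                 # work with zero-mean data
    C = np.cov(Xc, rowvar=False)            # p x p covariance matrix
    vals, vecs = np.linalg.eigh(C)          # eigh returns ascending order
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Usage on a synthetic 72 x 14 monitoring matrix.
X = np.random.rand(72, 14)
variances, axes = pca(X)
print(variances[:3])                        # variance captured by the leading axes
```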
Let the random vector X' = [X1, X2, ..., Xp] have covariance matrix Σ, with eigenvalues λ1 ≥ λ2 ≥ ... ≥ λp ≥ 0. Consider the linear combinations:

Yi = a'iX = ai1X1 + ai2X2 + ... + aipXp,  i = 1, 2, ..., p

Then:

Var(Yi) = a'iΣai,  Cov(Yi, Yk) = a'iΣak,  i, k = 1, 2, ..., p
The principal components are those uncorrelated linear combinations Y1, Y2, ..., Yp whose variances are as large as possible. The first principal component is the linear combination with maximum variance, i.e. it maximizes Var(Y1) = a'1Σa1. Restricting attention to coefficient vectors of unit length, we define:
·First principal component = the linear combination a'1X that maximizes Var(a'1X) subject to a'1a1 = 1.
·Second principal component = the linear combination a'2X that maximizes Var(a'2X) subject to a'2a2 = 1 and Cov(a'1X, a'2X) = 0.
·i-th principal component = the linear combination a'iX that maximizes Var(a'iX) subject to a'iai = 1 and Cov(a'iX, a'kX) = 0 for k < i.
As shown above, the principal components are mutually uncorrelated, and their variances equal the eigenvalues of the covariance matrix. The proportion of the total variance accounted for by the k-th principal component is:

λk / (λ1 + λ2 + ... + λp),  k = 1, 2, ..., p
If the first few principal components account for a large proportion of the total variance and p is large, then these leading components can represent the original data matrix in place of the original p variables with little loss of information. In this project, PCA applied to the matrix of fourteen components shows that the maximum variation of the features can generally be captured by 2-3 principal components. This steep drop in the principal component variance curve provides the basis for dividing the space into normal and abnormal subspaces.
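The steep drop can be checked numerically. The sketch below computes the per-component variance proportions λk/Σλi on synthetic, deliberately low-rank data; the data construction is an assumption standing in for real monitoring measurements:

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component,
    i.e. lambda_k divided by the sum of all eigenvalues."""
    C = np.cov(X - X.mean(axis=0), rowvar=False)
    vals = np.sort(np.linalg.eigvalsh(C))[::-1]
    return vals / vals.sum()

# Usage: with strongly correlated features, the curve drops steeply after a few components.
base = np.random.rand(72, 3)
X = base @ np.random.rand(3, 14) + 0.01 * np.random.rand(72, 14)  # rank ~3 plus noise
print(explained_variance_ratio(X)[:4])
```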
The traffic anomaly detection process in this project is divided into three stages: modeling, detection and evaluation. The following sections discuss the algorithm for each stage in detail.
This project builds the model over a sliding time window, using the 72 samples before the current time as the modeling space. These samples form the data matrix X; in the experiment, each row vector of the matrix is composed of 14 elements.
The principal components are divided into normal and abnormal components, representing normal and abnormal traffic in the network respectively. The difference between the two types lies in their variation trend: normal principal components vary moderately over time with obvious periodicity, while anomalous principal components vary sharply, with strong bursts. Based on the sample data, the algorithm for determining the normal principal components is as follows (a sketch appears after this list):
·Compute the score series of the first principal component by projecting the sample data onto it.
·Evaluate the mean (μ1) and standard deviation (σ1) of the 72 score values.
·Find the score that deviates most from the mean.
·Determine whether this maximum deviation exceeds 3σ1. If it does, keep the first principal component as the only normal component and regard all other principal components as abnormal; take the principal component conversion matrix U = [L1].
·If the maximum deviation does not exceed the threshold, proceed to test the next principal component in the same way; when the i-th component is the first to exceed its threshold, take U = [L1 ... Li-1]. The first principal component shows strong periodicity; in subsequent components the periodicity decreases gradually while the burstiness increases, which also reflects the difference between normal and abnormal network traffic.
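A minimal sketch of this 3σ selection rule as we read it; synthetic data stands in for platform measurements, and keeping at least the first axis is our interpretation of the U = [L1] case:

```python
import numpy as np

def normal_subspace(X, k_sigma=3.0):
    """Select leading principal axes whose scores stay within k_sigma of their mean;
    stop at the first axis that shows a burst, per the 3-sigma rule above."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    vecs = vecs[:, np.argsort(vals)[::-1]]          # axes ordered by variance
    normal = []
    for i in range(vecs.shape[1]):
        scores = Xc @ vecs[:, i]                     # scores of the 72 samples
        if np.abs(scores - scores.mean()).max() > k_sigma * scores.std():
            break                                    # bursty axis: anomalous subspace starts
        normal.append(i)
    return vecs[:, normal or [0]]                    # keep at least the first axis

X = np.random.rand(72, 14)
U = normal_subspace(X)
print(U.shape)                                       # (14, number of normal components)
```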
After obtaining the principal component conversion matrix U, project each centered sample vector Sk = (xk1, xk2, ..., xkp) onto the normal subspace and back into the p-dimensional space for reconstruction. The reconstructed vector is:

Tk = UU'Sk
Calculate the Euclidean distance between each sample vector before and after reconstruction. This distance is referred to here as the deviation:

Dk = ||Sk - Tk||
Compute this deviation for each of the 72 samples, and then evaluate its mean (μd) and standard deviation (σd). The conversion matrix U, the deviation mean μd and the deviation standard deviation σd together constitute the network traffic model, and they are the prerequisites for traffic anomaly detection.
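Continuing the sketch, the modeling stage then reduces to computing U, μd and σd; the one-column U in the usage lines is only a stand-in for the subspace selection above:

```python
import numpy as np

def fit_model(X, U):
    """Modeling stage: reconstruction deviations of the 72 samples plus their statistics."""
    mean = X.mean(axis=0)
    Xc = X - mean                              # center with the modeling-window mean
    T = Xc @ U @ U.T                           # reconstruction from the normal subspace
    d = np.linalg.norm(Xc - T, axis=1)         # deviation D_k = ||S_k - T_k||
    return {"U": U, "mean": mean, "mu_d": d.mean(), "sigma_d": d.std()}

# Usage on synthetic data; the leading eigenvector serves as a one-column U here.
X = np.random.rand(72, 14)
C = np.cov(X - X.mean(axis=0), rowvar=False)
U = np.linalg.eigh(C)[1][:, -1:]               # leading axis (eigh is ascending)
model = fit_model(X, U)
print(model["mu_d"], model["sigma_d"])
```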
After establishing the network traffic model, center each new monitoring vector N = (n1, n2, ..., np) in the same way as in the modeling stage, subtracting the mean vector μ of the 72 modeling samples:

N̄ = N - μ
▲Figure 1. Time sequence of the number of bytes.
▲Figure 2. Time sequence of the numbers of source and destination IP addresses.
▲Figure 3. Time sequence of proportions of traffic on the Top 10 IP addresses ranked by the number of bytes.
▲Figure 4. Time sequence of proportions of messages on the Top 10 IP addresses ranked by the number of messages.
Then project the centered vector onto the p-dimensional space for reconstruction and calculate the deviation:

T = UU'N̄,  d = ||N̄ - T||
If the monitored traffic is normal, the vectors before and after reconstruction are very similar and the deviation d is small; if the traffic has changed greatly compared with the modeling stage, the deviation is large. This project quantizes the deviation with the following statistic:

q(d) = (d - μd) / σd
The task of the evaluation stage is to determine whether the network traffic is in a normal state according to the quantized value q(d) of the current monitoring vector. By empirical rule, if |q(d)| < 5, the network is basically normal; if 5 ≤ |q(d)| < 10, the network is abnormal; and if |q(d)| ≥ 10, the network is severely abnormal.
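Putting the detection and evaluation stages together, an end-to-end sketch might look like this; the synthetic data and the two-axis normal subspace are assumptions standing in for the platform's model:

```python
import numpy as np

def evaluate(n, U, mean, mu_d, sigma_d):
    """Center a new vector, reconstruct it from the normal subspace, and return q(d)."""
    nc = n - mean
    d = np.linalg.norm(nc - U @ (U.T @ nc))
    return (d - mu_d) / sigma_d

def verdict(q):
    """Map |q(d)| onto the evaluation bands described above."""
    q = abs(q)
    return "normal" if q < 5 else "abnormal" if q < 10 else "severely abnormal"

# Usage: model quantities from the modeling stage (synthetic stand-ins here).
X = np.random.rand(72, 14)
mean = X.mean(axis=0)
U = np.linalg.eigh(np.cov(X - mean, rowvar=False))[1][:, -2:]   # two normal axes
Xc = X - mean
d = np.linalg.norm(Xc - Xc @ U @ U.T, axis=1)
print(verdict(evaluate(np.random.rand(14), U, mean, d.mean(), d.std())))
```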
The traffic of the Beijing Telecom backbone network was continuously monitored using the 863-917 Network Security Monitoring Platform, and 6 hours of monitoring data were extracted; Figures 1-4 show the resulting time sequence curves. As the figures show, it is hard to identify an anomaly from any one of the curves separately, whereas the proposed algorithm clearly indicates the time at which the anomaly occurred. Figure 5 shows the calculation result of the algorithm with the anomaly occurrence time marked. Further analysis of the anomaly period, using the backtracking function of the 863-917 platform, found that at that time a large botnet launched large-scale Denial of Service (DoS) attacks against three IP addresses in an external network.
This paper has presented a PCA-based method for dividing subspaces and for analyzing and detecting abnormal network events. The method can accurately and quickly indicate the time at which an anomaly occurs, helping network security emergency response teams detect wide-scale network traffic anomalies and thus buying time to resolve them. The experiments show that the analysis matrix formed by the 14 features provides high recognition accuracy and high analysis efficiency.
▲Figure 5. Time sequence of the traffic anomaly measurement result.