Jinyin Chen
Abstract—Negative selection algorithm(NSA)is one of the classic artif i cial imm une algorithm w idely used in anomaly detection.However,there are still unsolved shortcom ings of NSA that lim it its further applications.For exam ple,the nonselfdetector generation eff i ciency is low;a large number of nonselfdetector is needed for precise detection;low detection rate w ith various app lication data sets.Aim ing at those problem s,a novel radius adaptive based on center-optim ized hybrid detector generation algorithm(RACO-HDG)is put forward.To our best know ledge,radius adaptive based on center optim ization is fi rst time analyzed and proposed as an eff i cient mechanism to im prove both detector generation and detection rate w ithout signif i cant com putation com p lexity.RACO-HDG works eff i ciently in three phases.At fi rst,a small number of self-detectors are generated,different from typical NSAs w ith a large number of self-sam p le are generated.Nonself-detectors w ill be generated from those initial small number of self-detectors to make hybrid detection of self-detectors and nonself-detectors possible.Second ly,w ithout any prior know ledge of the data sets or manual setting,the nonself-detector radius threshold is self-adaptive by optim izing the nonself-detector center and the generation mechanism.In this way,the number of abnormal detectors is decreased sharp ly,while the coverage area of the nonself-detector is increased otherw ise,leading to higher detection performances of RACOHDG.Finally,hybrid detection algorithm is proposed w ith both self-detectors and nonself-detectors work together to increase detection rate as expected.Abundant simulations and app lication results show that the proposed RACO-HDG has higher detection rate,lower false alarm rate and higher detection eff i ciency compared w ith other excellent algorithms.
T HE biological immune system[1],[2]is an important system for organisms to achieve immune response and immune function,which can identify and exclude antigenic foreign bodies.Artif i cial immune system(AIS)is a simulation of the biological immune system,w ith learning,memory and powerful information processing capabilities[3],[4].AIS mainly includes negative selection algorithm(NSA)[5],clonal selection algorithm(CSA)[6]and immune network model(INM)three classic algorithms[7].As we all know,AIS has a w ide range of applications,including anomalies and fault detection[8]−[10],classif i cation,pattern recognition and other optim ization issues[11]−[13].NSA is one of the most popular models in the various models of AIS research[14].
When fi rst NSA is designed,the radius of the nonselfdetector is a fi xed value at the initial time in NSA.The detection rate of the nonself-detector is relatively low due to the presence of the loopholes.Jiet al.[15]proposed a variablevariable negative selection algorithm(V-detector NSA).Vdetector NSA calculates the m inimum distance between the detector center and the self-sample as the generated the nonself-detector’s radius to realize variable radius for each nonself-detector.Thus the detection rate is improved sharply because lots of loopholes are covered again by variable radius nonself-detectors.A lthough the V-detector NSA reduces the area of the loopholes,it still only uses on type of detector:nonself-detector for detection.As a result,that the rest areas uncovered by nonself-detector are considered as normal for certain,which may lead to risk of lower detection rate.Gonget al.[16]proposed a further training negative selection algorithm(FT-NSA).For the fi rst time,FT-NSA adopts both nonself-detector and self-detector for detection.It uses the V-detector algorithm to generate nonself-detector,and then recognize the nonself-detector as a data set to generate selfdetector.Finally,both self-detector and the nonself-detector are used to detect the test sample.If the sample is neither included by the self-detector nor included by the nonselfdetector(i.e.,the loopholes),it is judged by the circum ference of the detector.Never the less,there are many other researches focus on improving NSAs[17]−[19].
In conclusion,most current NSAs have been encountered w ith several problems.
1)Hybrid detection is eff i cient proved by FT-NSA proposed by Gonget al.[16],however we still lack eff i cient algorithms to generate nonself-detector and self-detector.Too many self-samples are needed for nonself-detector generation.Most NSAs usually generate nonself-detectors based on selfsamples.However,w ith the self-sets increase,the nonselfdetectors generation process costs more time in the distance calculation phase.Although FT-NSA adopts hybrid detectors,self-detectors are generated based on the nonself-detector,whose coverage may be different from the self-set.
2)V-detector NSA provides us a possible way to realize self-adaptive radius for each nonself-detector.Is there a more eff i cient way of radius self-adaptive mechanism for nonselfdetector and self-detector to reduce rest loopholes?Most of current NSAs need to achieve optimal radius for detectors after a bunch of simulations.It is necessary to design a novel self-adaptive radius algorithm for both nonself-detector and self-detector.
3)With w ider applications,most NSAs need large size of detector sets,w ith longer detection time.Because most nonself-detectors have high overlap rate,and the nonselfdetector size is large,which makes the detection period of the detection phase costs longer running time.How to shorten the detection phase w ith hybrid detectors is still a challenge.
Aiming at above list problems of most NSASs,RACO-HDG is put forward to realize radius self-adaptive based on centeroptimized hybrid detector generation.
1)In RACO-HDG,phase generation is designed for selfdetector by using a small number of self-detector instead of a large number of anomaly detectors.
2)In order to achieve optimal radius for each detector,selfadapt radius threshold is put forward based on self-samples distribution.
3)For reducing size of nonself-detector,nonself-detectors are generated from far to near.
4)The experimental results show that RACO-HDG reduces the number of self-detector and anomaly detectors to 1/10,and increases the true negative rate and reducing the false alarm rate.
In traditional fi xed radius NSAs,the size of nonselfdetectors is quite large.In order to reduce the size of nonselfdetectors and to improve the detection rate,V-detector NSA[15]is proposed.N-detector NSA takes the distance from the sample to the nearest self sample as the radius of the nonself-detector.In this way nonself-detector fi ts the self set by reducing the loophole(i.e.,the abnormal area that uncovered by nonself-detectors),and increasing the detection accuracy.When the radius is large,the nonself-detector has relatively larger coverage area to reduce the abnormal detectors size accordingly.
The V-detector NSA estimates the actual coveragePby sampling the fi xed samples in the detector generation algorithm(1),so that target coverage can be achieved w ith a given probability.The generation detector is completed when the coverage is close enough to the expected value,i.e.,Z>Zα,Zcan calculate by(2).
whereZαis conf i dence interval.The size of sample isn>max((5/P,5/(1−P))),andPis the given probability.
FT-NSA adds a phase of further training to generate selfdetectors,replacing self-samples w ith Self-detectors,and reshapes a novel self-region.In this way,the self-region can be more w idely covered by fewer samples.
FT-NSA also improved the detection phase of the algorithm to improve the detection rate.In the detection phase,it is necessary to calculate the distance between the detection sample,the detector set and the collection of self-detectors,including those that are neither covered by the detector nor covered by the self-sample.If that happens,the probability w ill be detected.And the detection rate increased.It has been proved that the number of self-detectors is greatly reduced compared w ith the number of self-samples,so the eff i ciency of the detection phase is greatly improved.
RACO-HDG mainly includes self-detector generation,center-optim ized nonself-detector generation and hybrid detection of three main phases.Self-detector generation is mainly to increase the eff i ciency of the generation of abnormal detector and to make the detection of the hybrid detector feasible.Center-optimized nonself-detector generation is mainly used to reduce the number of nonself-detectors and increase the eff i ciency of the detection phase.The hybrid detection is to detect the detection samples by using the self-detector and the nonself-detector to increase the detection rate and reduce the false alarm rate.
At present,most of the NSAs generate abnormal detectors based on self-sample.However,w ith the number of selfsample increases,the nonself-detector generation process consumes lots more time in the distance calculations.Since there are too many self-samples,it is impossible to adopt all selfsample to decide whether the sample is normal or not.So only the nonself-detector is used for detection,which makes the samples in abnormal area covered by nonself-detector mistakenly judged as the normal samples.The detection rate of the simulations would be decreased and the false alarm rate increases accordingly.
Therefore,self-detector generation is crucial for NSA.A novel phase self-detector generation algorithm is put forward for the fi rst time to our best know ledge.An improved meanshift algorithm is presented to optim ize the center of the self-detector so that the center of the self-detector is in the higher density region.Thus the self-detector covers more selfsample and reduces the number of self-detectors.And then the characteristics of boundary w ill be adopted to allow the adjacent area of one direction which has no self-sample to calculate the radius of the self-detector.In this way,a small number of self-detectors w ill be generated instead of a large number of self-sample w ithout changing the self-area.
1)Main Idea:We supposeSrepresents self-set containingnself-samples,andsirepresents theith self-sample.S0represents a self-set containingkself-samples that is not covered by the self-detector.RACO-HDG fi rst random ly selects a selfsamples0j,j=1,2,3,...,kinS0asSc.Then calculate the mean-shift vectorM hof the sample according to the formula(3).And also calculate the offset centerCs=s0+M h.Because we can not directly considerCsas the center of the self detector,otherw ise we can only generate only a few self-detectors that can never cover all the self-sample.So we calculate the new offset centerCs0according to the(6)and(7).Calculate the distance between the offset center and the detector centerdis(Cs0,Sc),if the distance is less than the parameterεto end the iteration.End the iteration if the distance is less than threshold,otherw ise makeSc=Cs0and optim ize the detector center again.
whereSmis a self-sample set w ith an Euclidean distance less thanh,i.e.,satisfy(4).
Fig.1 is an illusion of how radius of self-detectors determ ined in RACO-HDG.Suppose yellow is for the abnormal area,white for the normal area,the blue points are for the self-sample,cyan is the self-detector that has been generated and magenta is the area covered bySm.
Then the surrounding area ofScis divided intoaself-layer,and each layer is divided intobdirections,as shown in Fig.2.Calculate the number of self-samplesPN umin each region and calculate the farthest Euclidean distancePD isis from the self-sample in each region.If there are two consecutive layers of self-layerjandj+1 in directioni,the number of self-samplesiis zero,and the region is the boundary area.
We consider the area to be a border area.If the boundary area does not exist,the radius of the self-detector isSr=(a−1)×Rm.If the boundary region exists andj=1,the selfdetector radius isSr=Rm.If there is a nearest boundary area in the self-layerjof directioni,the self-detector radius isSr=PD is(i,j−1).Fig.3 shows the result of the selfdetector generation.Green show the nearest two consecutive zero regions.Red is the generated self-detector.
Generate self-detector,Scis the detector center,Sris the radius.And remove the self-samples0jthat are covered by the self-detector,i.e.,dis(s0i,Cs).Then proceed to the next iteration untilS0is empty.The experimental results are shown in Fig.4.From the fi gure we can see that the phase selfdetector generation does achieve the purpose of using a small number of self-detectors instead of a large number of selfsample w ithout changing the self-sample point coverage area.
Fig.1.The optim ization process of self-detector center.
Fig.2.10 self-layer and 20 direction.
Fig.3.Self-detector generation.
Fig.4.Self-detector generated results.
2)Flowcharts and Step:Fig.5 is the phase self detector generation fl owchart.The main steps are as follows:
Fig.5.Phase self-detector generation fl ow chart.
Step 1:Self-detector center optimization
1)Random selection of self pointss0i∈S0;
2)Calculate the MeanShift vectorM hand offset centerCs;
3)Adjust the offset center toCs0;
4)Ifdis(Cs0,Sc)is greater thanε,letSc=Cs0and go back to step(2).
Step 2:Self-detector generation
1)Divide the surrounding area ofScintoaself-layers,and each layer is divided intobdirections;
2)CalculatePN umandPD is;
3)From the fi rst layer to determ ine whether the existence of boundaries.If it is present,proceed to the next step;
4)Calculate the radius of the self-detector and generate a self-detector;
5)Remove the samples0jthat covered by the self-detector;
6)IfS0is not an empty set,skip back to Step 1.
1)Main Idea:Since we do not know the distribution area of the self-detector and the degree of density,there should be different nonself-detector radius for different data sets.If the nonself-detector radius is too small,the nonself-detector w ill run inside the self-set,and if the detector radius is too large,the detection rate of the nonself-detector w ill be reduced.So every time before the program is used,people w ill run the program many times to choose the best self-radius.This w ill consume a lot of time,making the program universally poor.
In this paper,we calculate the radius threshold of nonselfdetector by calculating the density of some self-samples.
2)Flowcharts and Steps:Fig.6 is the fl ow chart of adaptive detector radius threshold.The main algorithm steps are:
Fig.6.The fl ow chart of adaptive detector radius threshold algorithm.
Step 1:According to the number of self-samples,select a part of the self as a detector sample number.
Step 2:Set the nonself-detector minimum radiusRmin,andRminmust be greater than the actual radius of the smallest possible radius.If the parameter is too large,the program run time w ill increase in small increments,but little effect on the results.
Step 3:Statistics number of self-sample in each selfsample’sRmin,the density of self-sample,namely the density.
Step 4:To exclude the special circumstances on the experiment,calculating the average density of the m iddle third of the self-sample points and sort them.
Step 5:If the number of self-sample is larger thanM,the radius is reduced to the original 0.8,until the average number to meet the requirements.
Fig.7.Nonself-detector generated.
Both V-detector NSA and FT-NSA,a lot of nonself-detector are generated w ith a higher overlap rate.However,in the detection phase,the number of nonself-detectors determines the detection eff i ciency of the detector,so we need to use as few nonself-detectors as possible to detect as many abnormal samples as possible.If you want to reduce the number of nonself-detector,you need to increase the coverage of each nonself-detector,reduce the detector overlap rate.However,if you want to reduce the false alarm rate of the detector,you need to generate a smaller radius nonself-detector at the edge of the self-detector.
Therefore,nonself-detectors in RACO-HDG are generated from far to near.First we generate larger radius of the nonselfdetector as far as possible from the uncovered area of the selfdetector,and then random ly generate a nonself-detector w ith a smaller radius threshold.RACO-HDG improves the coverage of the nonself-detector while reducing the number of nonselfdetectors.
1)Main Idea:Nonself-detector is initially generated at the data set boundary.And then random ly generate a sample pointTrc1.If the point is covered by a self-detector or is covered by an existing nonself-detector,then delete it.Otherw ise calculate the minimum distanceD r1 from the point to the edge of the self-detector,ifD r1 is less than the radius thresholdD li,delete it.Then look for the nearest two nonself-detectorsD d1,D d2 from the pointTrc1.And then generate two new sample pointsTrc2,Trc3 at the intersection of the nonself-detector andTrc1,if the new sample is covered by the self detector,delete it.Otherw ise calculate the m inimum distanceD r2,D r3 from the point to the edge of the self-detector.As shown in Fig.7(a).The detector w ith the largest radius is stored in the queue as a candidate detector.If the number of candidate detectors is greater than the parameterCandinum,an excess candidate-detector is added to the nonself-detector,until the cycle stop condition is reached.The experimental results are shown in Fig.7(b).Cyan is the self-detector,and blue is the nonself-detector generated by this step.
And then make the nonself-detector radius threshold is less than the original threshold,random ly generate a sampleTrc4.If covered by one nonself-detector or self-detector,delete it.Otherw ise use the minimum distance of the point to the selfdetector edgeD d4 as a radius to generate a nonself-detector,until the cycle stop condition is reached.The experimental results are shown in Fig.7(c).Red is the exception detector generated by this step.
2)Flowcharts and Steps:Fig.8 show the fl owchart of nonself-detector generate.The main steps as follow:
Step 1:Generate an exception detector at the feature endpoint.
Step 2:Random ly generate a sampleTrc1 if the point is contained by a self detector or an existing nonself-detector,or its radius is less thanD li,go to Step 2.
Step 3:Generate the detector in centerTrc1.
Step 4:And then generate two new sample pointsTrc2,Trc3 at the intersection of the nonself-detector andTrc1.
Step 5:The detector w ith the largest radius is stored in the candidate detector.
Step 6:If the number of candidate detectors is greater than the parameterCandinum,an excess candidate-detector is added to the nonself-detector.
Step 7:If the stop condition is reached,if not go to Step 2.
Step 8:Decrease the nonself-detector radius threshold and random ly generate the nonself-detector until the stop condition is reached.
The experimental operating system is Windows 7,the integrated development environment is MATLAB 2013a.The hardware conditions using Intel Core I5 2.6GHz CPU,4GB memory.
Def i nition 1:TPrepresents the probability of nonselfdetector correctly detecting an abnormal sample.If we havemnegative samples,andnis the number of abnormal sample that correctly detected by nonself-detector,we have
Fig.8.The fl owchart of negative detector generated algorithm.
In order to analyze the behavior of the algorithm described in the previous section,experiments were carried out using 2-dimensional synthetic data.Over the unit square[0,1]2,various shapes are used as the“real”self-regions in these experiments.In this literature,we w ill use six basic shape of self-region,as shown in Fig.9.Data set one is a rectangle,on behalf of simple regular data set.Data set two is a cross,on behalf of data set whose distribution is scattered and selfset distributed in four directions.Data set three is a rectangle,represent the data set that concentrated in the middle.Data set four is a ring,represent the data set that abnormal set is in the center of normal set.Data set fi ve is a circle m inus a cross,represent the data set that abnormal set and normal set cross together.Data set six is a pentagram,represent the irregular distribution of the date set,there are a lot of projections and recesses.
In order to make the experimental results more accurate,take the average of 50 experiments as experimental results.
Supposeza=1.8,testnum=1000,selfradius=0.02,p=0.99,change the number of self-sample.Fig.10 shows the distribution of different numbers of self-samples in the data set.Fig.11 shows the experiment results for different shapes.
Fig.9.Different types of shape.
Fig.10.Inf l uence of coverage.
From the fi gure we can see that the detection rate w ill decrease as the number of self-samples increases,and w ill not be affected by the algorithm and shape.This is because w ith the increase in the number of self-samples,the number of self-samples on the edge of the sample w ill also increase,making the anomalous area close to the self-sample boundary misjudged as normal.In general,the algorithm RACO-HDG has the best detection rate,FT-NSA and V-detector NSA detection rate is very sim ilar,FT-NSA slightly better than the V-detector.However,the RACO-HDG’s false alarm rate is slightly larger than FT-NSA and V-detector,because RACOHDG generates a small radius anomaly detector,making it more fi t w ith the self.So we use theERas the fi nal criteria,RACO-HDG has the best experimental results.It is worth mentioning that,A greatly reduced the number of abnormal detectors,increasing the detection eff i ciency.
Supposeza=1.8,testnum=1000,selfnum=3000,selfradius=0.02,then change the coverage of self-samplep.Fig.12 shows the experiment results.
Fig.11.Inf l uence of self-sample number.
From the fi gure we can fi nd that the experimental detection rate w ill increase w ith the increase ofp,false alarm rate changes in a small range.The error rate of V-detector and FTNSA algorithm w ill decrease obviously w ith the increase ofp,and the error rate of RACO-HDG algorithm is gradually decreased in a relatively gentle trend.The reason for the change of V-detector and FT-NSA is that the number of nonself-detectors increases exponentially w ith the increase ofp,and the coverage of the nonself-detector increases.So the detection rate increases and the error rate decreases.It is worth noting that although the number of V-detectors and FT-NSA are much higher than that of TACO-HDG,the overlap rate of V-detector and FT-NSA are also high.So the detection rate of V-detector and FT-NSA are still smaller than TACO-HDG.Experiments show that RACO-HDG algorithm can achieve good performance in any case ofp,the bigger thep,the better the performance.
Fig.12.Inf l uence of coverage.
Because of the false alarm rate may rice as detection rate,the experimental results cannot be ref l ected by detection rate or false alarm rate.So we try to use m isjudgment rate to show the experimental results.Fig.13 shows the experimental results of different algorithms.Box plot consists of fi ve fi gures:the maximum,the upper quartile,the average value,the lowerquartile,m inimum.
Fig.13.The experimental results of different algorithm.
We can see from the experiment result,on the account of FT-NSA on the basic of V-detector generate the self detector to deal with the loophole on the edge of the self and nonselfset.This makes the detector rate rise,but parts of selfsamples became loophole as the generate self-detector is not necessarily completely covered the original sample.So the m isjudgment rate w ill rise at same time.To deal w ith this problem,RACO-HDG methods generate self-detector according to the edge of the self-set generated self-detector.This makes the detection rate of self-detector has obviously increased.It is worth mentioning that because of self-detector according to the edge detector to generate itself,part of the edge w ill not contain by self-detector,makes experiment false rate fl uctuation is more obvious.A lthough there are such problems,but the m isjudgment rate is the lowest,signif i cantly improve the performance of the detector.
From the experiment,we can fi nd that different design for the experiment of false alarm rate effect is more obvious.When the self-set distribution is concentrated such as shape3,shape6.Detector is low false alarm rate.When the self-set distribution area is larger,such as shape1,shape2,the false alarm rate is obviously increasing.When blended w ith fi ne of self centered not self-detector,the m isjudgment of RACOHDG is proper,but the false alarm rate is not very satisfactory.
A lot of experiments w ith multidimensional data sets are used to verify the applicability of far to near algorithms.One of the three types of Iris data is detected as normal data,while the other two are regarded as abnormal data.The original data is evenly divided into training sets and test sets.The self-detectors w ith phases cannot be generated because of small number of samples.And fi xed-radius self-detectors and nonself-detector radius thresholds are applied to eliminate the interference of phase algorithms.
The comparison of famous benchmark Fisher’s Iris Data is shown in Tables I−VI.And parameter settings of each methodare the same,which is results are the average values of 100 repeated tests.The system is trained by normal data from Iris data set w ith completely or partially.I-detector and selfdetector have the highest detection rate and false alarm rate.However,their testing algorithm shows that they may belong to positive selection algorithm(PSA)rather than NSA.When we only use fi fty percent of normal data,the false alarm rate w ill become higher than baseline.A lthough the NSA have the high detection rate and low false alarm rate,the number of detector is too much.And it is easy to fi gure out that RACOHDG have higher detection rate w ith lower false alarm rate.Moreover,the number of detectors is small.In summary,the applicability of far to near algorithms has good stability and applicability.
TABLE I THE EXPERIMENTAL RESULTS OF SETOSA 100%FOR TRAIN ING DATE
TABLE II THE EXPERIMENTAL RESULTS OF SETOSA 50%FOR TRAIN ING DATE
TABLE III THE EXPERIMENTAL RESULTS OF VERSICOLOR 100%FOR TRAIN ING DATE
TABLE IV THE EXPERIMENTAL RESULTS OF VERSICOLOR 50%FOR TRAIN ING DATE
TABLE V THE EXPERIMENTAL RESULTS OF V IFIN ICA 100%FOR TRAIN ING DATE
TABLE VI THE EXPERIMENTAL RESULTS OF VIFINICA 50%FOR TRAIN ING DATE
This paper has introduced a novel radius adaptive based on center-optim ized hybrid detector generation algorithm.The algorithm fi rst optim izes the self-detector center and generates self-detectors.,reduces the number of self-detectors,increasing the eff i ciency of the nonself-detector generation.And then optimize the center of the nonself-detector to generate a nonselfdetector,reducing the number of self-detectors and increasing the eff i ciency of the detection phase.The experimental results show that the algorithm obviously increases the detection rate.However,due to the generation of a small radius nonselfdetector,when the number of self-sample is small,there may be over-f i tting phenomenon.But the RACO-HDG has the smallest misjudgment and the m inimum number of detectors throughout the test.So the algorithm is effective.
IEEE/CAA Journal of Automatica Sinica2020年6期