An Information-Based Elite-Guided Evolutionary Algorithm for Multi-Objective Feature Selection

Ziqian Wang, Shangce Gao, Zhenyu Lei, and Masaaki Omura
IEEE/CAA Journal of Automatica Sinica, 2024, Issue 1

Dear Editor,

This letter is concerned with evolution strategies for addressing multi-objective feature selection problems in classification. Previous methods suffer from limitations such as being trapped in local optima and lacking stability. To overcome them, we propose a novel elite-guided mechanism based on information theory. Firstly, an elite solution is generated through a dimension reduction strategy and incorporated into the initialization population. Then, a symmetrical uncertainty-based mutation operator is developed to implement local search after the crossover operator. Finally, a special crowding distance is utilized to analyze duplicates in the environmental selection. The effectiveness and superiority of the proposed method are verified on 20 datasets, including high-dimensional ones.

Introduction: With the continuous advancements in data collection technologies, the dimensionality of datasets has significantly increased. High-dimensional data often contains irrelevant and redundant features, leading to performance degradation in classification tasks due to the "curse of dimensionality". Consequently, effective data mining techniques are essential to address this challenge.

Feature selection (FS) is a widely adopted data mining technique that tackles this issue by selecting a subset of relevant features. FS methods can be broadly categorized into two main types: filter-based and wrapper-based methods [1]. Filter-based methods evaluate the importance of features using specific criteria and select the top-n features accordingly. Wrapper-based methods employ a learning algorithm to iteratively evaluate candidate feature subsets and directly output an optimal feature subset. Generally, wrapper-based methods provide superior classification performance to filter-based methods [1], [2]. However, existing approaches still face challenges, including computational complexity and susceptibility to local optima, due to the exponential growth of the search space as the number of features increases.

Evolutionary algorithms (EAs) have recently gained significant attention and been applied in various domains [3]. EAs are particularly suitable for wrapper-based FS methods as they can obtain multiple solutions within a single population. Current state-of-the-art EAs treat FS as a multi-objective problem with two conflicting objectives: 1) minimizing the number of selected features and 2) minimizing the classification error rate [1]. For instance, Xue et al. [4] introduced nondominated sorting into a multi-objective particle swarm optimization-based FS algorithm. More recently, Cheng et al. [5] proposed a steering matrix to guide the evolution of the population in multi-objective evolutionary algorithms (MOEAs). Despite the advancements in MOEA-based FS algorithms, challenges such as complex feature interactions and irregular Pareto fronts still persist [6].

Motivated by the aforementioned discussions, this letter presents a novel multi-objective evolutionary algorithm (termed IEMOEA) for FS. The algorithm is based on an elite-guided solution generation strategy and an information-based mutation operator. The elite-guided solution generation strategy facilitates the learning of a feature subspace and accelerates population convergence. The main contributions of this work are as follows: 1) Introducing a novel elite-guided solution generation strategy to enhance convergence speed and solution quality. 2) Incorporating symmetrical uncertainty (SU) into the mutation operator to further improve solution quality by removing redundant features or adding relevant features. 3) Utilizing a specialized crowding distance measure to eliminate duplicate solutions and ensure population diversity. Experiments are conducted on 20 datasets, encompassing both low- and high-dimensional scenarios, to verify the effectiveness and superiority of the proposed IEMOEA method.

Methodology: Our proposed method is outlined in Algorithm 1. It can be introduced from three aspects in detail:

1) Generating elite solutions (Line 2 in Algorithm 1): Fig. 1 shows the process of generating the elite solution. Consider a problem with D = 5 features; the problem can be formulated as maximizing the classification accuracy of the solution E, s.t. E = (F1, ..., Fi, ..., F5), where Fi indicates the i-th feature. Firstly, our proposed method calculates SU values between the features and the label. The SU value between the i-th feature Fi and the label vector C is given by

SU(Fi, C) = 2 I(Fi; C) / (H(Fi) + H(C))    (1)

where I(Fi; C) = Σ_{fi} Σ_{c} p(fi, c) log [p(fi, c) / (p(fi) p(c))] is the mutual information, H(Fi) = −Σ_{fi} p(fi) log p(fi) and H(C) = −Σ_{c} p(c) log p(c) are the entropies, p(fi, c) is the joint probability distribution function of Fi and C, and p(fi) and p(c) are the marginal probability distribution functions of Fi and C, respectively. Then, the method employs tournament selection to initialize individuals. As shown in Fig. 1, two features are randomly selected, and the feature with the larger SU value is chosen. This process is repeated D = 5 times. The corresponding positions in the individual are encoded as "1", while the other positions are set to "0". Then the proposed method applies a dimension reduction operator to the individual. The number of features to be reduced is set as ⌈R × D⌉, where R represents the reduction rate. Tournament selection is also used to reduce features, where two features are randomly selected, and the feature with the smaller SU value is reduced. This process is repeated ⌈R × D⌉ times. Inspired by [7], a linear decrease of R is applied to balance exploration and exploitation, given as

R = Rmax − (Rmax − Rmin) × t / t̂    (2)

where Rmax and Rmin are the upper and lower bounds of the reduction rate, t is the current generation, and t̂ is the termination criterion.

Algorithm 1 IEMOEA(N, D, TD, t̂)
Input: Population size N, decision space dimensionality D, training data TD, termination criterion t̂;
Output: Final population P;
1: U ← Calculate SU by (1);
2: E ← EliteGeneration(TD, U);
3: P ← Initialize(N − 1);
4: P ← P ∪ E;
5: while t < t̂ do
6:   P′ ← Randomly select N parents;
7:   O ← Variation(P, P′);
8:   P ← DuplicationAnalysis(P ∪ O);
9:   P ← NondominatedSorting(P);
10:  P ← the first N solutions in P;
11:  t = t + 1;
12: end while
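The elite-generation step (lines 1 and 2 of Algorithm 1) can be sketched in Python with NumPy. This is an illustrative reimplementation under our own assumptions, not the authors' PlatEMO code; the helper names (symmetrical_uncertainty, elite_generation) are ours, and SU is estimated from empirical frequencies of discrete feature and label vectors:

```python
import numpy as np

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), estimated from the empirical
    joint distribution of two discrete vectors (cf. Eq. (1))."""
    n = len(x)
    joint = {}
    for xi, yi in zip(x, y):
        joint[(xi, yi)] = joint.get((xi, yi), 0) + 1
    p_xy = {k: v / n for k, v in joint.items()}
    p_x, p_y = {}, {}
    for (xi, yi), p in p_xy.items():
        p_x[xi] = p_x.get(xi, 0.0) + p
        p_y[yi] = p_y.get(yi, 0.0) + p
    h_x = -sum(p * np.log2(p) for p in p_x.values())
    h_y = -sum(p * np.log2(p) for p in p_y.values())
    mi = sum(p * np.log2(p / (p_x[xi] * p_y[yi]))
             for (xi, yi), p in p_xy.items())
    return 0.0 if h_x + h_y == 0 else 2.0 * mi / (h_x + h_y)

def elite_generation(su, R, rng):
    """Elite individual (Fig. 1): D size-2 tournaments select informative
    features, then ceil(R * D) tournaments remove the weaker ones."""
    D = len(su)
    elite = np.zeros(D, dtype=int)
    for _ in range(D):                       # selection tournaments
        i, j = rng.choice(D, 2, replace=False)
        elite[i if su[i] >= su[j] else j] = 1
    for _ in range(int(np.ceil(R * D))):     # reduction tournaments
        sel = np.flatnonzero(elite)
        if len(sel) < 2:
            break
        i, j = rng.choice(sel, 2, replace=False)
        elite[i if su[i] <= su[j] else j] = 0
    return elite
```

For example, a feature identical to the label yields SU = 1, while an independent one yields SU = 0.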

Fig. 1. Illustration of the elite solution formulation.

2) The SU-based variation operator (Line 7 in Algorithm 1): The proposed IEMOEA adopts the existing single-point crossover and an SU-based mutation for binary variation. Specifically, the SU-based mutation operator is designed to flip one of the selected or non-selected variables in the individual with a probability of 0.5. If rand > 0.5, two non-selected features are randomly chosen and the corresponding position of the feature with the larger SU value is flipped to "1". Conversely, if rand < 0.5, two selected features are randomly chosen and the corresponding position of the feature with the smaller SU value is flipped to "0". This mutation operator facilitates a local search mechanism and ensures the quality of the generated offspring.
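The mutation rule above can be sketched as follows; this is an illustrative reimplementation (the function name su_mutation is our own), assuming binary 0/1 encoding and a NumPy random generator:

```python
import numpy as np

def su_mutation(ind, su, rng):
    """With probability 0.5 add a relevant feature, otherwise drop a
    redundant one; candidates are picked by a size-2 tournament on SU."""
    child = ind.copy()
    if rng.random() > 0.5:
        pool = np.flatnonzero(child == 0)          # non-selected features
        if len(pool) >= 2:
            i, j = rng.choice(pool, 2, replace=False)
            child[i if su[i] >= su[j] else j] = 1  # add the stronger one
    else:
        pool = np.flatnonzero(child == 1)          # selected features
        if len(pool) >= 2:
            i, j = rng.choice(pool, 2, replace=False)
            child[i if su[i] <= su[j] else j] = 0  # drop the weaker one
    return child
```

Each call changes at most one bit, which is what makes the operator a local search step rather than a global perturbation.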

3) The special crowding distance-based duplication analysis (Line 8 in Algorithm 1): An issue of FS is the presence of duplicate solutions in both the decision and objective spaces, which can lead to population diversity degradation. Therefore, as part of the environmental selection process in IEMOEA, we initially eliminate all duplicate solutions, retaining only a single unique solution for each set of duplicates. Subsequently, we employ the special crowding distance technique introduced in [8] to handle solutions that share the same objective values but differ in the decision space. The solution with the maximum special crowding distance in the decision space is preserved. Following the duplication analysis, we utilize non-dominated sorting [9] to rank the population and retain the top-N individuals.
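A simplified sketch of this duplication analysis is given below. It is not the authors' implementation: as a stand-in for the special crowding distance of [8], it keeps, within each group of objective-space duplicates, the member with the largest total Hamming distance to the rest of the population:

```python
import numpy as np

def duplication_analysis(pop, objs):
    """Drop exact decision-space duplicates, then keep one representative
    per duplicated objective vector (the most decision-space-spread one)."""
    # 1) unique decision vectors, preserving first-occurrence order
    _, idx = np.unique(pop, axis=0, return_index=True)
    idx = np.sort(idx)
    pop, objs = pop[idx], objs[idx]
    # 2) one representative per group of identical objective vectors
    _, inverse = np.unique(objs, axis=0, return_inverse=True)
    keep = []
    for g in np.unique(inverse):
        members = np.flatnonzero(inverse == g)
        if len(members) == 1:
            keep.append(members[0])
        else:
            # total Hamming distance of each member to the whole population
            d = [np.sum(np.abs(pop - pop[m]).sum(axis=1)) for m in members]
            keep.append(members[int(np.argmax(d))])
    keep = np.sort(np.array(keep))
    return pop[keep], objs[keep]
```

On a population of four solutions where three share the same objective vector (two of them also identical in decision space), only two solutions survive.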

Experiments: The implementation of all the MOEAs was performed using the open-source platform PlatEMO [10]. To ensure consistency, we adopted the specific parameter settings outlined in their original papers [5], [9], [11], [12].

During the experiments, the datasets were separated into two subsets: a training set comprising approximately 70% of the samples and a testing set comprising the remaining 30%. It is worth mentioning that the SU values were calculated only from the training data, since the testing data is typically unknown in real-world applications. All the algorithms were executed within a wrapper-based approach, and K-nearest neighbors (KNN) with K = 5 was chosen as the wrapped classifier because of its efficiency. Furthermore, to address the issue of FS bias, we employed a 10-fold cross-validation strategy during the training process.
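The wrapper evaluation of a candidate feature subset under this protocol can be sketched with scikit-learn; the library, the stand-in dataset (breast cancer, replacing the paper's UCI datasets), and the function name evaluate are our assumptions:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate(mask, X_train, y_train):
    """Two objectives of a binary feature mask: ratio of selected features
    and 10-fold cross-validated KNN (K = 5) error on the training data."""
    if mask.sum() == 0:
        return 1.0, 1.0                      # empty subset: worst values
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X_train[:, mask.astype(bool)], y_train,
                          cv=10).mean()
    return mask.sum() / len(mask), 1.0 - acc

X, y = load_breast_cancer(return_X_y=True)   # stand-in dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=1)  # ~70/30 split
ratio, err = evaluate(np.ones(X.shape[1], dtype=int), X_tr, y_tr)
```

Because cross-validation runs only on the training split, the testing set never leaks into fitness evaluation, which is the FS-bias safeguard described above.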

Table 1 presents the 20 datasets utilized in our experiments, encompassing diverse fields such as life sciences, physics, video processing, image analysis, and bioinformatics. These datasets were obtained from the UCI machine learning repository [13], and their specific descriptions can be found there.

Table 1. Benchmark Datasets

In order to assess the performance of the proposed algorithm, we employ the hypervolume (HV) indicator [3], a well-established measure for MOEAs. The two objectives optimized in this letter, namely the ratio of selected features and the classification error rate, are already scaled to the range (0, 1) in each objective direction. As a result, we set the reference point for HV to (1, 1) to evaluate the performance of FS algorithms. A higher HV value indicates better algorithm performance.
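For two objectives, HV with reference point (1, 1) can be computed exactly by summing dominated rectangles; this small sketch (our own helper, not PlatEMO's HV routine) assumes the front passed in is already non-dominated:

```python
import numpy as np

def hypervolume_2d(front, ref=(1.0, 1.0)):
    """Exact 2-D hypervolume: sort the non-dominated front by the first
    objective and accumulate the rectangles dominated w.r.t. ref."""
    pts = np.array(sorted(front))            # ascending in objective 1,
    hv, prev_f2 = 0.0, ref[1]                # hence descending in objective 2
    for f1, f2 in pts:
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv
```

For instance, the single point (0.5, 0.5) dominates a 0.5 × 0.5 square, giving HV = 0.25.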

Table 2 presents the mean and standard deviation (std) values of the HV for all the MOEAs. The highest mean HV value is highlighted in bold. As shown in Table 2, IEMOEA achieves the best performance in 10 out of the 20 datasets. Additionally, we conducted the Wilcoxon signed-rank test to determine whether the performance difference between IEMOEA and the other algorithms is statistically significant, using a significance level of 0.05. The p-values obtained from the test are reported at the bottom of Table 2. Notably, all these p-values are smaller than 0.05, indicating a significant difference between the performance of IEMOEA and its competitors.
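A paired Wilcoxon signed-rank comparison of per-dataset HV values can be reproduced with SciPy; the 20 HV values below are hypothetical placeholders, not the results from Table 2:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-dataset HV values for IEMOEA and one competitor
# (20 datasets, paired samples).
hv_ours  = np.array([0.81, 0.77, 0.90, 0.68, 0.74, 0.85, 0.79, 0.88,
                     0.71, 0.83, 0.76, 0.92, 0.69, 0.80, 0.75, 0.87,
                     0.73, 0.84, 0.78, 0.86])
hv_other = np.array([0.78, 0.74, 0.88, 0.66, 0.73, 0.82, 0.77, 0.85,
                     0.70, 0.80, 0.74, 0.90, 0.67, 0.78, 0.72, 0.85,
                     0.71, 0.82, 0.75, 0.84])
stat, p = wilcoxon(hv_ours, hv_other)        # paired, two-sided by default
significant = p < 0.05                       # compare against alpha = 0.05
```

The test is paired per dataset, which matches how the p-values at the bottom of Table 2 are obtained.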

To provide a more intuitive assessment of the performance of the proposed IEMOEA, we present the Pareto fronts obtained by all the competitors on the testing set in Fig. 2. It is important to note that the dominated solutions in the final population have been removed. The results of four datasets are depicted in Fig. 2, including two datasets with a relatively small number of features (< 500), namely Spect and Madelon, as well as two high-dimensional datasets (with > 500 features), namely Multiplefeatures and Colon. Fig. 2 clearly shows that on all the displayed datasets, IEMOEA successfully obtains solutions with lower classification error rates. Specifically, on the low-dimensional datasets, solutions generated by IEMOEA dominate those acquired by its competitors. On the large-scale datasets, IEMOEA finds solutions with lower classification error rates than DAEA, SparseEA, and MOEA/PSL, while its solutions dominate those acquired by SMMOEA. This visualization of the Pareto fronts confirms that IEMOEA outperforms its peers.

Conclusion: This letter addresses the problem of multi-objective FS and presents a novel information-based elite-guided MOEA (named IEMOEA). The proposed algorithm introduces SU to generate an elite solution, which facilitates rapid convergence of the population. Furthermore, the algorithm leverages feature information in the mutation process to conduct local search and enhance the quality of solutions. Additionally, duplication analysis is implemented to refine the population and maintain population diversity. Experimental results demonstrate the superior performance of IEMOEA compared to other state-of-the-art algorithms. The elite-guided search approach introduced in IEMOEA can also be applied in future research to enhance the search capabilities of other evolutionary FS methods.

Table 2. Mean and Std HV Values Obtained by All Compared Algorithms

Fig. 2. The rank-one Pareto fronts obtained by all compared algorithms on the testing set.

Acknowledgments: This work was supported in part by the Japan Society for the Promotion of Science (JSPS) KAKENHI (JP22H03643) and the Japan Science and Technology Agency (JST) (the establishment of university fellowships towards the creation of science technology innovation) (JPMJFS2115).