A multi-criteria fusion feature selection algorithm for fault diagnosis of helicopter planetary gear train

2020-07-02 03:04
CHINESE JOURNAL OF AERONAUTICS, 2020, Issue 5

Canfei SUN, Youren WANG, Guodong SUN

a College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

b Testing Center, Aviation Key Laboratory of Science and Technology on Fault Diagnosis and Health Management, Shanghai 201601, China

KEYWORDS Fault detection; Feature selection; F-measure; Helicopter planetary gear train; Multi-objective evolutionary algorithm

Abstract The planetary gear train is a prominent component of the helicopter transmission system, and its health is of great significance for the flight safety of the helicopter. During health condition monitoring, selecting a fault-sensitive feature subset is important for fault diagnosis of the helicopter planetary gear train. Considering the practical situation, this paper proposes a multi-criteria fusion feature selection algorithm (MCFFSA) to identify an optimal feature subset from the high-dimensional original feature space. In MCFFSA, a fault feature set covering multiple domains, including the time domain, frequency domain and wavelet domain, is first extracted from the raw vibration dataset. Four targeted criteria are then fused by the multi-objective evolutionary algorithm based on decomposition (MOEA/D) to find Pareto-efficient subsets, wherein two criteria for measuring diagnostic performance are assessed by a sparse Bayesian extreme learning machine (SBELM). Further, the F-measure is adopted to identify the optimal feature subset, which is employed for subsequent fault diagnosis. The effectiveness of MCFFSA is validated on six fault recognition datasets from a real helicopter transmission platform. The experimental results illustrate the superiority of combining MOEA/D and SBELM in MCFFSA, and comparative analysis demonstrates that the optimal feature subset provided by MCFFSA achieves better diagnostic performance than those provided by other algorithms.

1. Introduction

Owing to its large reduction ratio, compact structure, large carrying capacity and stable operation, the planetary gear train is widely used as the final stage of the main gearbox in the helicopter transmission system. Planetary gear train components are prone to various failures after long-term operation under high-speed, heavy-duty and harsh operating conditions. The planetary gear train is a core component of the helicopter transmission system and has no backup, so a fault in it directly affects the flight safety of the helicopter. Therefore, fault monitoring and diagnosis of the helicopter planetary gear train is of great significance for the development of helicopter condition-based maintenance (CBM).

For general planetary gearbox fault diagnosis, vibration-based diagnostic analysis is still the most common and effective method. At present, researchers have carried out work in the fields of mathematical modeling, signal processing and pattern recognition, and have achieved fruitful results.1 However, compared with the general planetary gearbox, the helicopter planetary gear train has a more complicated structure, a harsher working environment and more severe operating conditions, which give its vibration signal unique characteristics: complicated frequency components, serious noise pollution and coupled modulation.2 These characteristics increase the difficulty of selecting features and diagnosing faults of the helicopter planetary gear train. Traditional signal processing techniques, such as time-domain, frequency-domain and wavelet-domain statistical analysis, can extract useful fault feature information for fault diagnosis.3-6 Some researchers have even proposed new features specifically for planetary gear train fault diagnosis.7-9 However, given the distinctive signal characteristics of the helicopter planetary gear train, a few characteristic parameters from a single domain cannot fully express the fault information, much less yield satisfactory diagnostic results. Therefore, it is more effective to extract as many fault features as possible from multiple domains for fault diagnosis. However, multi-domain features are high-dimensional and often noisy, irrelevant and redundant. If these feature sets are used directly, the computational cost is huge and the diagnosis result suffers. Therefore, feature selection is necessary.

Feature selection is considered an essential step in preprocessing high-dimensional datasets, as it can extract a small number of representative features and obtain high accuracy in subsequent classification. In general, feature selection can be seen as a complex search optimization problem that has been proved to be NP-hard,10 and hence exhaustive search is the only guaranteed way to obtain the optimal features. However, when the cardinality of the feature set is large, exhaustive search becomes unacceptable due to the massive calculation involved. In this case, heuristic search algorithms, such as greedy search,11 genetic algorithm,12 and particle swarm optimization,13 can solve the problem approximately and efficiently. However, these algorithms usually evaluate a feature subset using a single criterion, such as its cardinality, its classification performance, or another measure. Even if multiple criteria are adopted simultaneously, they are integrated into a single objective through a weighted approach. In fact, feature selection is a multi-objective optimization problem, that is, it seeks a compromise solution while simultaneously optimizing multiple evaluation criteria of feature subsets. In recent years, multi-objective evolutionary algorithms (MOEAs) have emerged and been applied in many real-world areas.14-16 They are particularly suitable and attractive for feature selection because they allow a selected feature subset to be evaluated simultaneously with multiple criteria. As a result, MOEAs are quite popular for feature selection. To date, several multi-objective feature selection algorithms have been proposed.17-20 Bing et al. (2013) proposed an evolutionary multi-objective particle swarm optimization feature selection algorithm, which uses two objective functions, maximizing the classification accuracy and minimizing the number of features, to optimize feature selection simultaneously.21 Karakaya et al. (2015) presented a wrapper for quasi equally informative subset selection (WQEISS) algorithm.22 It employed four objective functions and combined the non-dominated sorting genetic algorithm II (NSGA-II) with the extreme learning machine (ELM) to select multiple high-quality feature subsets. Some researchers have successfully applied these algorithms to fault feature selection for mechanical systems, including planetary gearboxes. However, these methods cannot be directly employed for helicopter planetary gear train fault feature selection. Firstly, most of them consider only two objectives, which limits the candidate range of the obtained solutions and, in most cases, identifies only one Pareto-efficient subset for each cardinality level. Secondly, the objective functions constructed by these methods are generic and seldom compatible with the actual engineering application, so a satisfactory feature subset is not obtained. Both limitations restrict the performance of the subsequent fault diagnosis process.
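The compromise among criteria discussed above rests on Pareto dominance. As a minimal sketch (not the implementation of any of the cited algorithms), the following Python code filters a list of candidate feature subsets, each scored by a tuple of objectives to be minimized, down to its non-dominated set. The function names and the example scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Tuple[float, ...]]) -> List[int]:
    """Return the indices of the non-dominated objective vectors."""
    front = []
    for i, s in enumerate(scores):
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i):
            front.append(i)
    return front


# Hypothetical scores: (number of features, classification error rate).
candidates = [(5, 0.10), (5, 0.08), (8, 0.05), (8, 0.09), (12, 0.05)]
front = pareto_front(candidates)
```

With only these two objectives, at most one distinct objective vector per cardinality level can remain non-dominated, which matches the limitation of two-objective methods noted in the text.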

To address the above-mentioned issues, a multi-criteria fusion feature selection algorithm (MCFFSA) is proposed in this study. MCFFSA is a wrapper-based multi-objective feature selection method specifically aimed at fault diagnosis of helicopter planetary gear trains. We construct four targeted criteria as objective functions. The first criterion is the cardinality of the feature subset, and the second is an entropy-based measure that combines relevance and redundancy. The third criterion is the missing alarm rate, and the fourth is the false alarm rate. The last two are used to evaluate the fault diagnosis ability comprehensively and are assessed by a sparse Bayesian extreme learning machine (SBELM). A multi-criteria fusion strategy based on the multi-objective evolutionary algorithm based on decomposition (MOEA/D) is also proposed to seek high-performance feature subsets from the high-dimensional, multi-domain candidate feature set. Finally, from the feature subsets selected in this way, the F-measure is adopted to identify the optimal feature subset, which is employed for subsequent fault diagnosis of the helicopter planetary gear train.
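To make the third and fourth criteria concrete, the sketch below computes a missing alarm rate and a false alarm rate from true and predicted labels. The definitions used here are common ones and are assumptions, since this section does not give the paper's formulas; the label convention (0 for healthy, other integers for fault classes) and the function name are also hypothetical.

```python
from typing import Sequence, Tuple


def alarm_rates(y_true: Sequence[int], y_pred: Sequence[int], healthy: int = 0) -> Tuple[float, float]:
    """Return (missing alarm rate, false alarm rate).

    Assumed definitions (not taken from the paper):
      missing alarm rate = faulty samples predicted healthy / all faulty samples
      false alarm rate   = healthy samples predicted faulty / all healthy samples
    """
    faulty = [(t, p) for t, p in zip(y_true, y_pred) if t != healthy]
    normal = [(t, p) for t, p in zip(y_true, y_pred) if t == healthy]
    missed = sum(1 for _, p in faulty if p == healthy)
    false_alarms = sum(1 for _, p in normal if p != healthy)
    ma = missed / len(faulty) if faulty else 0.0
    fa = false_alarms / len(normal) if normal else 0.0
    return ma, fa


# Hypothetical labels: 0 = healthy, 1 and 2 = two fault classes.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 1, 0, 2, 2]
ma, fa = alarm_rates(y_true, y_pred)
```

In a wrapper setting, `y_pred` would come from a classifier trained on the candidate feature subset; any classifier can stand in for SBELM in such a sketch.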

The experimental results on the helicopter main gearbox test bench validate the superior performance of our proposed method. The main contributions of this paper are as follows: (1) A novel feature selection method for fault diagnosis of the helicopter planetary gear train is proposed; (2) Four targeted criteria and the F-measure are integrated for the first time to evaluate the candidate feature subsets successively in the feature selection process; (3) A combination of MOEA/D and SBELM is developed for the first time for multi-objective feature selection.

This paper is organized as follows. Section 2 introduces the basic principles of MOEA/D and SBELM. Section 3 describes our proposed multi-criteria fusion feature selection algorithm. Experimental results and comparisons are presented in Section 4. Section 5 summarizes the paper.

2. Basic principles

2.1. MOEA/D algorithm

MOEA/D was first proposed by Zhang and Li in 2007 and has been recognized as one of the most popular multi-objective evolutionary algorithms to date.23
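MOEA/D decomposes a multi-objective problem into a set of scalar subproblems, each defined by a weight vector. One widely used decomposition is the weighted Tchebycheff function; whether MCFFSA uses this particular form is an assumption, as this section does not say. The sketch below, with hypothetical objective values, shows how different weight vectors prefer different trade-off solutions.

```python
from typing import Sequence


def tchebycheff(f: Sequence[float], weights: Sequence[float], ideal: Sequence[float]) -> float:
    """Weighted Tchebycheff value max_i w_i * |f_i - z_i|; smaller is better."""
    return max(w * abs(fi - zi) for fi, w, zi in zip(f, weights, ideal))


# Hypothetical objective vectors of two solutions (both objectives minimized).
ideal = (0.0, 0.0)
solution_a = (0.2, 0.6)
solution_b = (0.5, 0.3)

# Each weight vector defines one scalar subproblem.
for weights in [(0.8, 0.2), (0.2, 0.8)]:
    ga = tchebycheff(solution_a, weights, ideal)
    gb = tchebycheff(solution_b, weights, ideal)
    best = "a" if ga < gb else "b"
    print(weights, round(ga, 3), round(gb, 3), best)
```

A subproblem that weights the first objective heavily favors the solution that is better on that objective, and vice versa, so a spread of weight vectors yields a spread of trade-off solutions.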

Fig. 3 Helicopter transmission experimental platform for fault simulation of planetary gear train.

Table 1 Description of six fault recognition datasets.

In Fig. 5, the approximate optimal subsets found by MCFFSA and by the other algorithms are depicted in blue and black, respectively, and the optimal subsets found by MCFFSA and by the other algorithms are depicted in red and cyan, respectively. As described in Section 3.2, the smaller the values of a feature subset under the four criteria, the better; the ideal feature subset would therefore appear as a horizontal line running along the bottom of all axes in the figure. On the whole, the lines of MCFFSA are clearly closer to the bottom of all axes than those of the other algorithms. Therefore, the feature subsets selected by MCFFSA satisfy the four criteria better. Under the first criterion, MCFFSA selects a number of feature subsets whose cardinalities lie in the range of 7-15 and are uniformly distributed. However, the feature subsets obtained by MCFFSA1 and MCFFSA2 are mostly concentrated in high dimensions, in the range of 8-18. Therefore, compared with MCFFSA1 and MCFFSA2, the diversity of the feature solutions found by MCFFSA is better. In addition, for the second evaluation criterion, the entropy measure, the values of the feature subsets selected by MCFFSA are small and concentrated within a narrow range as a whole, whereas the values of the other algorithms are larger and dispersed over a wider range. This shows that the feature subsets of MCFFSA are more relevant, less redundant and more convergent than those of the other four algorithms. For the last two and most crucial evaluation criteria, the superiority of MCFFSA is even more obvious. Almost all missing alarm (MA) and false alarm (FA) values of MCFFSA are smaller than those of the other algorithms and remain within a minimal range. Overall, the feature subsets selected by MCFFSA show the best diagnostic performance of all five algorithms.

Secondly, we evaluate the performance of the optimal features generated by the five algorithms by comparing their F2. All algorithms are repeated ten times to obtain the average value, wherein the training set and the test set are randomly extracted from the whole dataset in a 9:1 ratio. The comparative results for HData1 to HData6 are shown as statistical box plots in Fig. 6. In the figure, the F2 of MCFFSA is clearly higher than those of the other algorithms, which demonstrates that the optimal feature subset of MCFFSA performs better in terms of fault diagnosis. In addition, the F2 of MCFFSA varies within a range that is significantly narrower than those of the other algorithms, which means that our algorithm is more robust.
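The evaluation protocol above can be sketched as follows, assuming that F2 denotes the F-measure with beta = 2 (this section does not define it) and treating fault detection as a binary decision for simplicity. The helper names are hypothetical, and the counts are made-up numbers chosen only to show how beta = 2 weights missed faults more heavily than false alarms.

```python
import random
from typing import List, Tuple


def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from true positives, false positives and false negatives."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def split_9_to_1(n_samples: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Randomly split sample indices into 90% training and 10% test."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    n_test = max(1, n_samples // 10)
    return idx[n_test:], idx[:n_test]


# With beta = 2, a missed fault (fn) costs more than a false alarm (fp).
score_missed = f_beta(tp=80, fp=0, fn=20)   # 20 missed faults
score_false = f_beta(tp=80, fp=20, fn=0)    # 20 false alarms

rng = random.Random(0)
train_idx, test_idx = split_9_to_1(200, rng)
```

Repeating the split with ten different seeds and averaging the resulting scores would mirror the ten-run protocol described in the text.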

Therefore, the results of Experiment I illustrate that MCFFSA, combining MOEA/D and SBELM, can select a number of feature subsets that satisfy the evaluation criteria well, and that the optimal feature subset selected from them based on F2 has excellent fault diagnosis performance.

4.3. Experiment II

To further verify the performance of MCFFSA, five algorithms are compared with it. The first is a single-criterion filter-based feature selection algorithm based on joint mutual information (JMI). The second is a two-criteria filter-based feature selection algorithm based on maximum correlation and minimum redundancy (MRMR). These first two algorithms rank the features according to their weights and, following a forward search strategy, use SBELM to select the feature subset with the highest F2 as the optimal feature subset. The third is a wrapper-based feature selection algorithm denoted as WMOSS,20 wherein the number of features and the classification accuracy are used as two criteria. The fourth and fifth are the wrapper-based WQEISS and the filter-based FQEISS,21 both of which adopt four criteria, namely, the number of features, the classification accuracy, and two entropy-based measures of relevance and redundancy. The processing method used in Experiment II is the same as that in Experiment I. A more comprehensive summary of the results on HData1 to HData6 is provided in Fig. 7, which reports the approximate optimal feature subsets found by each algorithm.
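The forward search used with the two ranking-based filters (JMI and MRMR) can be sketched as below: the ranked features are added one at a time, each nested subset is scored, and the best-scoring subset is kept. The scoring function here is a hypothetical stand-in for training SBELM on the subset and computing F2, and the feature names are invented for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def forward_search(ranked: Sequence[str], evaluate: Callable[[List[str]], float]) -> Tuple[List[str], float]:
    """Grow the subset one ranked feature at a time and keep the best-scoring prefix."""
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = list(ranked[:k])
        score = evaluate(subset)
        if score > best_score:  # ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical stand-in for "train a classifier on the subset and return F2".
USEFUL = {"rms": 0.4, "kurtosis": 0.3, "band_energy": 0.2}


def toy_score(subset: List[str]) -> float:
    gain = sum(USEFUL.get(name, 0.0) for name in subset)
    penalty = 0.05 * sum(1 for name in subset if name not in USEFUL)
    return gain - penalty


ranking = ["rms", "kurtosis", "noise_1", "band_energy", "noise_2"]
subset, score = forward_search(ranking, toy_score)
```

Because only nested prefixes of one ranking are evaluated, this procedure returns a single subset, which is consistent with JMI and MRMR identifying only one feature subset in the comparison.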

Fig. 4 Fault categories.

Fig. 5 Comparison of the performance of approximate optimal feature subsets under four evaluation criteria.

Fig. 6 Comparison of F2 of the optimal feature subset selected by five algorithms.

As expected, JMI and MRMR identify only one feature subset, while WMOSS finds one Pareto-efficient subset for each cardinality level. The number of subsets found at each cardinality level by the FQEISS and WQEISS algorithms is larger than one for most of the datasets. Among these five algorithms, WQEISS provides the best performance, in that most of its feature subsets yield a higher F2 than those of the other algorithms. This is because the formulation of a four-objective optimization problem enlarges the search space, in which some feature subsets with higher discrimination power may exist. Similar to the WQEISS algorithm, MCFFSA is capable of identifying multiple subsets for each cardinality level. Moreover, the subsets found by MCFFSA are uniformly distributed across the cardinality levels and yield a higher F2 than those of the other five algorithms. This is because MCFFSA adopts criteria that are more targeted at fault diagnosis of the helicopter planetary gear train and integrates more suitable multi-objective optimization and classification algorithms.

Fig. 7 Comparison of F2 of approximate optimal feature subsets selected by five algorithms.

Table 2 summarizes the average results of the six algorithms on all fault recognition datasets. As shown in boldface, MCFFSA outperforms the other algorithms, providing the feature subset with the highest F2 and the lowest cardinality. Furthermore, MCFFSA requires much less computational time than WQEISS, which is also a wrapper-based four-objective feature selection algorithm.

To further visualize the features and illustrate their classification performance, the best feature subset of size ten is extracted from HData1 by MCFFSA and by each of the comparative algorithms; principal component analysis (PCA) and clustering analysis are then applied to map the features into a three-dimensional space, as shown in Fig. 8.

Table 2 The average results on six fault recognition datasets.

As can be seen, serious overlap is observed for JMI and MRMR: apart from cluster 1, the clusters are mixed and difficult to identify. Similarly, serious overlap occurs between clusters 2 and 3, between clusters 2 and 4, and between clusters 4 and 5 for FQEISS, and between clusters 2 and 3 and between clusters 4 and 5 for WMOSS and WQEISS. By comparison, MCFFSA outperforms these five algorithms, with only slight overlap between clusters 2 and 3 and between clusters 4 and 5. The results demonstrate that the features extracted by MCFFSA exhibit excellent classification performance, which explains the superiority of MCFFSA in extracting strongly correlated and discriminative high-level features.

4.4. Experiment III

During the operation of a helicopter, the working conditions always fluctuate. This experiment is conducted to investigate the applicability of MCFFSA under such conditions. The five algorithms from Experiment II are again compared with MCFFSA.

In this experiment, the optimal feature subset found by each algorithm on HData6 is used to test its diagnostic performance under different operating loads. The comparison results are shown in Fig. 9.

As the load decreases, the amplitude of the fault-sensitive information decreases significantly more than that of the interference signal caused by manufacturing and assembly errors, so the results of all the employed algorithms deteriorate as the operating load decreases. However, MCFFSA performs better than the other algorithms, showing only a minimal drop, whereas the other algorithms display an obvious decline. This indicates that MCFFSA identifies shared and robust features with some invariance to fluctuating operating loads. Therefore, the performance of MCFFSA may remain acceptable over a range of operating load levels owing to its stability under varying working conditions.

Fig. 8 Features visualization for first three principal components (PC1, PC2 and PC3).

Fig. 9 Comparison of F2 of optimal feature subset selected by five algorithms under different loads.

5. Conclusions

(1) A novel multi-criteria fusion feature selection algorithm called MCFFSA is developed in this paper for fault diagnosis of the helicopter planetary gear train. In the proposed algorithm, four targeted criteria are employed to evaluate the selected feature subsets, MOEA/D is then combined with SBELM to extract Pareto-efficient subsets, and finally the F-measure is adopted to find the optimal feature subset.

(2) The comparative analysis on six fault recognition datasets from a real helicopter transmission experimental platform demonstrates that, for fault diagnosis of the helicopter planetary gear train, the combination of MOEA/D and SBELM in our proposed algorithm selects features with better diagnostic performance than the combinations of MOEA/D and ELM, MOEA/D and SVM, NSGA-II and SBELM, and SPEA-II and SBELM.

(3) The results indicate that our proposed algorithm obtains more and better feature subsets than two wrapper-based multi-objective feature selection algorithms, WMOSS and WQEISS. It also outperforms several filter-based algorithms, namely JMI, MRMR and FQEISS. The superiority of our proposed algorithm is validated and explained through comparative analysis on fault datasets with different loads.

Acknowledgements

This study was co-supported by the Equipment Pre-research Foundation Project of China (No. JZX7Y20190243016301), the Helicopter Transmission Technology Key Laboratory Foundation of China (No. KY-52-2018-0024), and the Fundamental Research Funds for the Central Universities & Funding of Jiangsu Innovation Program for Graduate Education under Grant (No. KYLX16_0336).