YU Lu (俞璐), XIE Jun (谢钧), ZHANG Yanyan (张艳艳)
1. Institute of Communications Engineering, PLA University of Science and Technology, Nanjing 210007, China
2. College of Command Information System, PLA University of Science and Technology, Nanjing 210007, China
As an important issue in remote sensing, hyperspectral image classification has been widely used in reconnaissance, assessment of environmental damage, land use monitoring, urban planning, and growth regulation. Hyperspectral image classification aims to assign each pixel an object class label from a predetermined label set. The task is known as image labeling or semantic image segmentation in the fields of computer vision and pattern recognition.
Feature extraction is a central topic in hyperspectral image classification research. Two kinds of feature extraction methods are commonly used. One is dimension reduction of the spectral data, including various feature transforms [1-2] and feature selections [3-5]. The other is extraction of image features such as color, texture, and shape. The former focuses on spectral features and the latter on image features.
Unlike RGB images, hyperspectral images typically consist of a large number of bands, which are usually highly correlated. To classify a pixel correctly, both the correlation among the labels of adjacent pixels and the correlation among spectral bands should be exploited. That is, neither image features nor spectral features alone are sufficient to classify a pixel correctly, so the two kinds of features are often combined in hyperspectral image classification.
Since the application of neural networks in remote sensing image classification has attracted great attention [6-8], in this paper a neural network is used to combine multiple features, including image features and spectral features, for hyperspectral image classification.
Data fusion may be the most effective way to combine multiple features. Data fusion techniques used in pattern recognition include feature-level fusion [9-10] and decision-level fusion [11-12]. The former combines the inputs of the classifiers, while the latter combines their outputs.
In decision-level fusion, the fusion strategy is the key problem. Simple strategies, such as majority voting and weighted averaging, cannot achieve satisfactory performance. Furthermore, determining the weights is itself difficult.
Feature-level fusion means that the features are combined before they are fed into the classifier. The most frequently used combination methods are concatenation and parallel connection of features; the latter can only be used when all the features have the same dimension. Multiple features generally have different scales, e.g., gray values range from 0 to 255 for an 8-bit image, while some histogram features take values in [0, 1]. If features with significantly different scales are concatenated directly, the contribution of the features with very small values is unfairly suppressed. However, if the features are normalized before concatenation, some differences between them, which may be helpful for classification, disappear. Furthermore, if features with significantly different dimensions are concatenated directly, the features of small dimension are at a disadvantage. So direct concatenation is not a good choice for classification.
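As a simple numerical illustration of this scale problem, the following sketch (with two hypothetical three-dimensional features) shows the trade-off between direct concatenation and min-max normalization before concatenation.

```python
import numpy as np

# Hypothetical per-pixel features on very different scales.
gray_patch = np.array([12.0, 200.0, 87.0])   # 8-bit gray values in [0, 255]
hist_bins = np.array([0.10, 0.65, 0.25])     # histogram feature in [0, 1]

# Direct concatenation: the large-valued feature dominates any
# distance- or gradient-based criterion used by the classifier.
direct = np.concatenate([gray_patch, hist_bins])

# Min-max normalization removes the scale gap, but it also removes the
# absolute magnitudes, which may themselves carry useful information.
def minmax(v):
    return (v - v.min()) / (v.max() - v.min())

normalized = np.concatenate([minmax(gray_patch), minmax(hist_bins)])
```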
In this paper, a feature fusion method based on an artificial neural network is proposed. Different from a network ensemble [13], a single neural network is used to perform the task. The network architecture is specifically designed for the multiple-feature problem. Experimental results show that the proposed network has obvious advantages over an ordinary feed-forward neural network in both performance and complexity.
There are two typical neural-network-based methods to combine multiple features. One is to train a single neural network; the other is to ensemble several networks.
In the first type of method, all the features are input into one neural network. This is equivalent to concatenation of the features, because all of them are connected to the same hidden neurons. As discussed above, direct concatenation may cause unfairness among features with different physical meanings, scales, and dimensions. Furthermore, the concatenated feature has a large dimension, which results in a more complex model whose training requires a large number of samples.
In the second type of method, several neural networks are trained, one for each feature, and their results are combined with another neural network [13]. The networks are trained independently, which may lead to a local optimum rather than the global optimum. Furthermore, a neural network ensemble is a decision fusion method; compared with feature fusion, decision fusion usually cannot fuse the information of multiple features effectively.
In this paper, a feature fusion method with a single feed-forward neural network, instead of a network ensemble, is proposed. The network architecture, shown in Fig. 1, differs from that of an ordinary feed-forward network: although all the features are input into the same network, they are connected to different hidden neurons. Compared with an ordinary feed-forward neural network, the modified architecture has two main advantages.
First, connecting different features to different hidden neurons, instead of to the same neurons, means that each feature is handled separately. The role of the hidden neurons is to provide a feature transform: different groups of hidden neurons map different features to hidden, or middle, features. If the network is well trained, these middle features lie in the same feature space, so they have the same scale, the same physical meaning, and comparable dimensions, and the problems of direct concatenation of features are avoided.
Second, since each feature connects to only part of the hidden neurons, instead of all of them, the number of weights to be trained decreases significantly. The model is therefore less complex, and its training takes less time and needs fewer training samples.
Furthermore, different from a neural network ensemble, in which the networks are trained independently, the proposed network is trained jointly. Joint training makes it possible to find the global optimum and to fuse the essential information of the features effectively.
Fig. 1 Architecture of the neural network to combine n features
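A minimal sketch of this block-connected architecture is given below, written with the Keras functional API rather than the MATLAB toolbox used in the experiments. The group size of 10 hidden neurons per feature and the Adam optimizer (substituting for Levenberg-Marquardt training) are assumptions for illustration only; the feature dimensions follow the experimental settings reported later.

```python
from tensorflow.keras import layers, Model

# Feature dimensions from the experiments: spectral 10, Gabor 60, LBP 59,
# SIFT 50; the output layer has 17 neurons, one per class.
feature_dims = {"spectral": 10, "gabor": 60, "lbp": 59, "sift": 50}
hidden_per_feature = 10   # assumed group size; the paper varies the hidden-layer size

inputs, hidden_blocks = [], []
for name, dim in feature_dims.items():
    x = layers.Input(shape=(dim,), name=name)
    # Each feature is connected only to its own group of hidden neurons;
    # tanh plays the role of the "tansig" transfer function.
    h = layers.Dense(hidden_per_feature, activation="tanh",
                     name=f"hidden_{name}")(x)
    inputs.append(x)
    hidden_blocks.append(h)

# The groups' outputs ("middle features") share one linear output layer,
# so all groups are trained jointly by a single loss.
merged = layers.Concatenate()(hidden_blocks)
output = layers.Dense(17, activation="linear", name="output")(merged)

model = Model(inputs=inputs, outputs=output)
model.compile(optimizer="adam", loss="mse")   # Adam substitutes for "trainlm" here
```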
Image features often used in image classification include texture, shape, and color features. In hyperspectral image classification, texture features play an important role, since texture can reveal geological information about the earth's surface. In this paper, two texture features, Gabor features and local binary pattern (LBP) features, are used. Scale-invariant feature transform (SIFT) features are also used.
Gabor features are computed with a bank of Gabor filters, which capture the orientation- and scale-specific properties of the hyperspectral image data, and consist of the responses of these filters. The most commonly used Gabor filter has the following form [10]:

G_sd(x) = (‖k‖²/δ²) exp(−‖k‖²‖x‖²/(2δ²)) [exp(ik·x) − exp(−δ²/2)],

where x = (x, y) is the spatial position vector and k = (π/(2f^s)) e^{i(πd/8)} is the frequency vector, with s and d the scale and direction parameters. In the experiments, f = 2 and δ = 2π. After the image I(x, y) is filtered by a Gabor filter of scale s and direction d, the texture image F_sd(x, y) = G_sd(x, y) * I(x, y) is obtained. If one image is filtered by K Gabor filters, K texture images of the same size are obtained, and the K values at each pixel are used as the Gabor features of that pixel.
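The following NumPy sketch builds such a filter bank directly from the kernel above and convolves it with an image. The 31 × 31 kernel size is an assumption, and the scale and direction ranges follow the experimental settings used later (5 scales and 12 directions, i.e., 60 filters).

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(s, d, f=2.0, delta=2 * np.pi, size=31):
    # Frequency vector k = (pi / (2 f^s)) * exp(i * pi * d / 8), as above.
    k = (np.pi / (2 * f ** s)) * np.exp(1j * np.pi * d / 8)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    k2 = np.abs(k) ** 2
    r2 = x ** 2 + y ** 2
    envelope = (k2 / delta ** 2) * np.exp(-k2 * r2 / (2 * delta ** 2))
    carrier = np.exp(1j * (k.real * x + k.imag * y)) - np.exp(-delta ** 2 / 2)
    return envelope * carrier

def gabor_features(image, scales=range(5), directions=range(12)):
    # One filtered image per (s, d) pair; the response magnitude at each
    # pixel is used as that pixel's feature, giving 5 x 12 = 60 values here.
    responses = [np.abs(fftconvolve(image, gabor_kernel(s, d), mode="same"))
                 for s in scales for d in directions]
    return np.stack(responses, axis=-1)
```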
The LBP feature [14] is one of the most widely used texture features in image analysis and image classification. For each pixel, the pixel value is compared with each of its 8 neighbors (top-left, left, bottom-left, top, etc.). Where the center pixel's value is greater than the neighbor's value, a "1" is written; otherwise a "0" is written. This gives an 8-bit binary number. The 256 possible 8-bit codes are merged into 59 patterns, and each pixel in the image belongs to one of these 59 patterns. To extract the LBP feature of a pixel, the 59-bin histogram of the LBP patterns of all pixels within a region around that pixel is calculated.
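A short sketch of this feature using scikit-image is given below; its "nri_uniform" mode implements the 59-pattern mapping described above, and the window size is left as a parameter (the experiments below use a 21 × 21 block).

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_codes(image):
    # "nri_uniform" maps the 256 possible 8-neighbour codes to the 59
    # patterns mentioned above (P = 8 neighbours at radius R = 1).
    return local_binary_pattern(image, P=8, R=1, method="nri_uniform")

def lbp_feature(codes, row, col, window=21):
    # 59-bin histogram of the codes inside the window centred on the pixel.
    half = window // 2
    patch = codes[max(row - half, 0):row + half + 1,
                  max(col - half, 0):col + half + 1]
    hist, _ = np.histogram(patch, bins=59, range=(0, 59))
    return hist / max(hist.sum(), 1)   # normalized 59-bin histogram
```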
The SIFT feature [15] is often used in image matching and object recognition. SIFT features are invariant to image scaling and rotation, and partially invariant to changes in illumination and camera viewpoint. Feature generation includes four major stages: scale-space extrema detection, keypoint localization, orientation assignment, and keypoint descriptor computation. Each keypoint is represented as a 128-dimensional vector describing the scale, orientation, and location of the image gradients near the keypoint. The SIFT features of an image are thus N 128-dimensional vectors, where N is the number of keypoints in the image.
To calculate the SIFT feature of a pixel, all the keypoints of the whole image are first clustered into M groups, so that each keypoint is assigned to one group. For each pixel, the M-bin histogram of the keypoints in the region around that pixel is then calculated.
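A sketch of this bag-of-keypoints feature using OpenCV's SIFT detector and scikit-learn's k-means is shown below; the cluster count and window size follow the experimental settings used later (M = 50 and a 21 × 21 block), and the per-pixel loop is kept simple rather than efficient.

```python
import numpy as np
import cv2
from sklearn.cluster import KMeans

def sift_bow_features(image_8u, n_clusters=50, window=21):
    # Detect SIFT keypoints on the whole (8-bit) image and cluster their
    # 128-D descriptors into n_clusters visual words.
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(image_8u, None)
    words = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(descriptors)
    coords = np.array([kp.pt for kp in keypoints])   # (x, y) of each keypoint

    half = window // 2
    h, w = image_8u.shape
    features = np.zeros((h, w, n_clusters))
    for r in range(h):
        for c in range(w):
            # Histogram of the visual words whose keypoints fall inside the
            # window centred on pixel (r, c).
            inside = (np.abs(coords[:, 0] - c) <= half) & \
                     (np.abs(coords[:, 1] - r) <= half)
            features[r, c] = np.bincount(words[inside], minlength=n_clusters)
    return features
```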
To test the performance of the proposed method, experiments are carried out on the widely used AVIRIS 92AV3C Indian Pines hyperspectral data set. In the experiments, the spectral feature and three image features, namely Gabor, SIFT, and LBP, are used.
The data set consists of 220 bands. Each band is a 145 × 145 pixel image, and every pixel is labeled with one of 17 classes: background, alfalfa, corn-no-till, corn-min, corn, grass/pasture, grass/trees, grass/pasture-mowed, hay-windrowed, oats, wheat, woods, soybeans-no-till, soybeans-min, soybean-clean, bldg-grass-tree-drives, and stone-steel towers.
Spectral feature: the dimension of the spectral values is reduced from 220 to 10 by principal component analysis (PCA).
Gabor feature: a total of 60 Gabor filters (s = 0, 1, 2, 3, 4; d = 0, 1, 2, …, 11) are used to filter the first principal component image, so the dimension of the Gabor feature is 60.
LBP feature: for each pixel, the 59-bin histogram of the LBP patterns within the 21 × 21 block centered on that pixel in the first principal component image is calculated; the dimension of the LBP feature is 59.
SIFT feature: all the keypoints in the first principal component image are clustered into 50 groups; for each pixel, the 50-bin histogram of the keypoints in the 21 × 21 block centered on that pixel is calculated; the dimension of the SIFT feature is 50.
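A minimal sketch of the spectral feature and of the first principal component image from which the three image features are extracted is given below, assuming scikit-learn's PCA and a (rows, cols, bands) data cube layout.

```python
import numpy as np
from sklearn.decomposition import PCA

def spectral_feature(cube, n_components=10):
    # cube: hyperspectral data of shape (rows, cols, bands), e.g. 145 x 145 x 220.
    rows, cols, bands = cube.shape
    flat = cube.reshape(-1, bands)
    reduced = PCA(n_components=n_components).fit_transform(flat)
    return reduced.reshape(rows, cols, n_components)

# The image features above are all extracted from the first principal
# component image, e.g. pc1 = spectral_feature(cube, n_components=1)[:, :, 0].
```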
To evaluate the proposed method, its results are compared with those of ordinary feed-forward neural networks whose inputs are each single feature and the concatenation of all features, respectively. In all the networks, the transfer function of the hidden-layer neurons is "tansig" and that of the output-layer neurons is "purelin"; the training function is "trainlm" and the performance function is "mse". All layers in all networks have biases. The number of neurons in the output layer is set to 17, the number of classes in the AVIRIS 92AV3C Indian Pines data set. The correct output for a sample of class i is a 17-dimensional vector consisting of 16 zeros and a 1 in the ith component.
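For example, this one-hot target encoding can be written as the following sketch (class labels assumed to be coded 0, 1, …, 16).

```python
import numpy as np

labels = np.array([0, 3, 16])      # hypothetical sample labels
targets = np.eye(17)[labels]       # each row: 16 zeros and a single 1
```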
The results are shown in Table 1. In the experiments, 10% of the pixels are randomly selected to train the neural networks, while the rest are used to test performance. The results are the mean values of 20 random runs.
Table 1 Results of several methods for combining multiple features with neural networks
Since the number of neurons in the hidden layer significantly affects the complexity and performance of the model, several experiments with different numbers of hidden neurons are performed. In each experiment, the highest recognition rate is shown in bold. The "network size" is measured by the number of weights to be learned.
As shown in the table, the proposed method achieves the best performance for every number of hidden neurons. Compared with concatenation of multiple features, the proposed method obtains better performance with much less training time. As discussed above, since each feature connects to only part of the hidden neurons, the number of weights to be trained decreases significantly and training takes less time. Furthermore, the method avoids the problems of direct concatenation of multiple features, so better performance is obtained.
In hyperspectral image classification, image features and spectral features are often combined to improve the recognition rate. In this paper, a neural-network-based method was proposed to combine multiple features. Instead of an ensemble of several networks, a single network was used to perform the task, and the architecture of the feed-forward network was modified to handle multiple features. The experimental results show that the method has obvious advantages over single features and over concatenation of multiple features in both recognition rate and training time.
[1] Bagan H, Takeuchi W, Aosier B, et al. Extended Subspace Method for Remote Sensing Image Classification [C]. Proceedings of IEEE International Geoscience and Remote Sensing Symposium, Boston, USA, 2008: 927-930.
[2] Tian Y Q, Guo P, Lyu M R. Comparative Studies on Feature Extraction Methods for Multispectral Remote Sensing Image Classification [C]. Proceedings of IEEE International Conference on Systems, Man and Cybernetics, Hawaii, USA, 2005: 1275-1279.
[3] Cariou C, Chehdi K, Moan S L. BandClust: An Unsupervised Band Reduction Method for Hyperspectral Remote Sensing [J]. IEEE Geoscience and Remote Sensing Letters, 2011, 8(3): 565-569.
[4] Guo B F, Damper R I, Gunn S R, et al. A Fast Separability-Based Feature-Selection Method for High-Dimensional Remotely Sensed Image Classification [J]. Pattern Recognition, 2008, 41(5): 1653-1662.
[5] Du Q, Yang H. Similarity-Based Unsupervised Band Selection for Hyperspectral Image Analysis [J]. IEEE Geoscience and Remote Sensing Letters, 2008, 5(4): 564-568.
[6] Zou W B, Yan W Y, Shaker A. Neural Network Based Remote Sensing Image Classification in Urban Area [C]. The 2012 International Joint Conference on Neural Networks (IJCNN), Brisbane, Australia, 2012: 1-6.
[7] Yang B, Liu Z J, Xing Y, et al. Remote Sensing Image Classification Based on Improved BP Neural Network [C]. 2011 International Symposium on Image and Data Fusion (ISIDF), Tengchong, Yunnan, 2011: 1-4.
[8] Liu Q, Wu G M, Chen J M, et al. Interpretation Artificial Neural Network in Remote Sensing Image Classification [C]. 2012 2nd International Conference on Remote Sensing, Environment and Transportation Engineering (RSETE), Nanjing, China, 2012: 1-5.
[9] Yin Q, Guo P. Multispectral Remote Sensing Image Classification with Multiple Features [C]. Proceedings of the International Conference on Machine Learning and Cybernetics, Hong Kong, China, 2007: 19-22.
[10] Zhang L F, Zhang L P, Tao D C, et al. On Combining Multiple Features for Hyperspectral Remote Sensing Image Classification [J]. IEEE Transactions on Geoscience and Remote Sensing, 2012, 50(3): 879-893.
[11] Yang H, Du Q, Ma B. Decision Fusion on Supervised and Unsupervised Classifiers for Hyperspectral Imagery [J]. IEEE Geoscience and Remote Sensing Letters, 2010, 7(4): 875-879.
[12] Yang H, Du Q, Ma B. Weighted Decision Fusion for Supervised and Unsupervised Hyperspectral Image Classification [C]. IEEE International Geoscience and Remote Sensing Symposium, Hawaii, USA, 2010: 3656-3659.
[13] Zhou Z H, Huangfu J, Zhang H J, et al. View Invariant Face Recognition Based on Neural Network Ensemble [J]. Journal of Computer Research and Development, 2001, 38(10): 1204-1210. (in Chinese)
[14] Ojala T, Pietikainen M, Maenpaa T T. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, 24(7): 971-987.
[15] Lowe D G. Distinctive Image Features from Scale-Invariant Keypoints [J]. International Journal of Computer Vision, 2004, 60(2): 91-110.