Forest type identification by random forest classification combined with SPOT and multitemporal SAR data

2018-09-07 03:07:02 Ying Yu, Mingze Li, Yu Fu
Journal of Forestry Research, 2018, Issue 5


Abstract We developed a forest type classification method for the Daxing′an Mountains of northeast China using multisource remote sensing data. A SPOT-5 image and two temporal images of RADARSAT-2 full-polarization SAR were used to identify forest types in the Pangu Forest Farm of the Daxing′an Mountains. Forest types were identified using random forest (RF) classification with the following data combinations: SPOT-5 alone, SPOT-5 and SAR images from August or November, and SPOT-5 and two temporal SAR images. Using a combination of multitemporal SAR and SPOT-5 images, we identified several forest types, including Betula platyphylla, Larix gmelinii, Pinus sylvestris and Picea koraiensis forests. Classification accuracy exceeded 88%, an improvement of 12% over the results obtained using SPOT data alone. RF classification using a combination of multisource remote sensing data improved classification accuracy compared with that achieved using single-source remote sensing data.

Keywords Random forest classification · Multitemporal · Multisource remote sensing data · Polarization decomposition

Introduction

Accurate classification of forest type is fundamental to the study of forest resources, forest dynamics, forest biomass, and carbon storage estimation. Remote sensing is becoming increasingly important for forest type classification in virtually all aspects of forest research.

Optical images such as TM and SPOT are commonly used for forest type classification. These images allow forest types to be distinguished based on the spectral features and texture information they record. However, the same object can exhibit different spectra, and different objects can share the same spectrum. This ambiguity arises from factors such as weather, which affects optical remote sensing images, and it complicates the classification of forest types from remotely sensed data (Wang and Zhao 2005; Sun 2006). The problem also occurs in the Daxing′an Mountains, which contain many forest types, so it is difficult to distinguish forest types in these areas using spectral characteristics alone. Microwave remote sensing is a useful supplement to optical remote sensing because it can image by day or night and in all weather, penetrates clouds and rain, and provides additional information.

Fully polarimetric synthetic aperture radar (POLSAR) data provide information directly related to the physical properties and backscattering mechanisms of natural media, in forms including observational data, the scattering matrix, the covariance matrix, and the correlation matrix. Parameters extracted from these matrices through different polarization decomposition methods have been applied to classification (Aghabalaei et al. 2016; Li et al. 2016; Lee et al. 1998; Cloude and Pottier 1997; Freeman and Durden 1998). Touzi et al. (2004) used C-band POLSAR to classify forest types and concluded that HH, HV, and VV polarization information improved differentiation among leafless forest types. Rahman and Sumantyo (2010) noted that the vegetation information needed to differentiate forest from non-forest can be readily identified in synthetic aperture radar (SAR) images. However, parameters extracted from the scattering, covariance, and correlation matrices do not provide sufficient information for accurate POLSAR image classification in certain scenarios, such as complex forest areas where different scattering media exhibit similar POLSAR responses. Previous studies indicated that polarimetric SAR texture information helps improve classification results (Borghys et al. 2006; Masjedi et al. 2016). POLSAR data are also extensively used for terrain classification with SAR features from various target decompositions together with certain textural features (Uhlmann and Kiranyaz 2014). Dual-season POLSAR data achieved the highest accuracies in wetland cover classification, suggesting that seasonality is critical to obtaining high accuracy irrespective of the type of SAR image used (Furtado et al. 2016). Two RADARSAT-2 images acquired in the leaf-on and leaf-off seasons were also found to be effective for forest classification (Maghsoudi et al. 2013).

Recent advances in remote sensing and Earth observation satellite sensor technologies have made data from multiple platforms and multiple sensors, with multispectral characteristics and better spatial and temporal resolution, widely available. Fusing data from different remote sensing technologies allows forest types to be classified with improved accuracy. Kasapoglu et al. (2012) used fused ALOS PALSAR and TM data to classify forest types and reported a 4% increase in precision compared with using a TM image alone. A canopy elevation model combined with ALOS PALSAR, RADARSAT-2, and SPOT images was used to classify vegetation in the Alps with 97.7% precision (Laurin et al. 2013). Combining multispectral LiDAR and radar data yielded information on vegetation reflectance, height, and backscattering mechanisms, allowing improved mapping and characterization accuracy (Niculescu et al. 2016). The highest land use/land cover classification accuracy was obtained from multitemporal, multisensor, and multipolarization SAR satellite images (Huett et al. 2016).

Optical or microwave remote sensing data alone cannot achieve the accuracy required for high-precision forest type recognition. Combining optical and microwave remote sensing data, however, provides complementary information that can greatly improve accuracy and is practical to implement. In this study, multiphase C-band fully polarimetric RADARSAT-2 data and SPOT5 optical images from August and November 2013 were combined to analyze the polarimetric scattering features, spectral information, and phase (acquisition-date) characteristics of different forest types. The random forest classification method was then used to classify the forest types of the Pangu experimental forest area.

Materials and methods

Study area

Our study area was Pangu Forest Farm (Tahe Forestry Bureau, Tahe County, Daxing′an Mountains, Heilongjiang Province). Tahe County is located in the northwest of the Daxing′an Mountains, in the northernmost part of China, at 123°20′02′′–124°21′40′′E and 52°16′38′′–52°47′4′′N (Fig. 1). The farm covers 1120.7 km² at elevations of 800–1400 m. The climate is cool continental, with average and maximum annual temperatures of −2.4 and 47.2 °C, respectively. Annual precipitation ranges from 300 to 450 mm and is concentrated mainly in July and August. Forest covers 88% of the total area. Dominant tree species include Larix gmelinii, Pinus sylvestris, Betula platyphylla, Populus davidiana and Picea koraiensis.

Remote sensing data sources

Polarimetric RADARSAT-2 images from two dates and high-spatial-resolution SPOT5 images were used to identify forest types. RADARSAT-2 is a high-resolution commercial radar satellite carrying a C-band sensor; it was launched on 14 December 2007 by the Canadian Space Agency and MacDonald, Dettwiler and Associates Ltd. (MDA). The C band spans wavelengths of 3.75–7.5 cm, and the orbital repeat cycle of RADARSAT-2 is 24 days. Fully polarimetric POLSAR data (HH, VV, HV and VH) were acquired on two dates with the same orbital parameters: in August 2013, during the period of lush plant growth, and in November 2013, after leaf fall. The resolution was 12 × 8 m. SPOT5, developed by the French National Centre for Space Studies (CNES), is an Earth observation satellite in sun-synchronous orbit that was launched in May 2002. The maximum resolutions of its panchromatic and multispectral bands are 2.5 and 10 m, respectively. The multispectral bands include B1 (0.49–0.61 μm), B2 (0.61–0.68 μm) and B3 (0.78–0.89 μm). Forest inventory data from earlier years, including the sub-compartment distribution, were acquired for the study area and used to verify the results of the forest type classification made using remotely sensed data.

Fig. 1 Location of the study area

Data preprocessing

Data preprocessing involved image filtering, terrain correction, geometric correction, and registration of the multiphase SAR data and the optical image data. First, the SPOT5 panchromatic and multispectral data were fused to obtain a fusion image at 2.5 m spatial resolution. This was followed by atmospheric correction and image mosaicking, multilook processing and filtering of the SAR data using PolSARpro software, and geometric correction and registration of the two-date SAR images to the orthorectified SPOT5 images. Finally, the polarimetric SAR images were resampled to 2.5 m using nearest-neighbor resampling so that the optical images and SAR data could be combined.
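The final resampling step can be sketched as follows. This is a minimal, stdlib-only illustration, not the PolSARpro workflow: it assumes the SAR and SPOT5 rasters are already co-registered and share the same upper-left corner, and the function names, the assignment of 12 m to columns and 8 m to rows, and the toy values are all hypothetical. Rasters are represented as lists of rows.

```python
def resample_nearest(src, src_res_x, src_res_y, dst_res, dst_cols, dst_rows):
    """Resample `src` onto a dst_rows x dst_cols grid of square dst_res pixels.

    Each target pixel takes the value of the source pixel that contains its
    centre, so original values are copied rather than interpolated.
    """
    n_rows, n_cols = len(src), len(src[0])
    out = []
    for r in range(dst_rows):
        y = (r + 0.5) * dst_res  # target pixel centre, metres from the top edge
        sr = min(int(y // src_res_y), n_rows - 1)
        row = []
        for c in range(dst_cols):
            x = (c + 0.5) * dst_res  # target pixel centre, metres from the left edge
            sc = min(int(x // src_res_x), n_cols - 1)
            row.append(src[sr][sc])
        out.append(row)
    return out


def stack_layers(*layers):
    """Combine equally sized layers into a grid of per-pixel feature tuples."""
    n_rows, n_cols = len(layers[0]), len(layers[0][0])
    return [[tuple(layer[r][c] for layer in layers) for c in range(n_cols)]
            for r in range(n_rows)]


# Hypothetical 2 x 2 SAR layer (12 m columns, 8 m rows) resampled to 2.5 m.
sar = [[1, 2],
       [3, 4]]
sar_25m = resample_nearest(sar, src_res_x=12, src_res_y=8, dst_res=2.5,
                           dst_cols=9, dst_rows=6)
for row in sar_25m:
    print(row)
```

Nearest-neighbour resampling is a natural choice here because it never mixes neighbouring backscatter or decomposition values, so each resampled pixel still carries a value that was actually measured.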

Method of classification

Classification system

A classification system was developed based on the current land use classification standard (Bu 2007), the forest resource survey rules for cities and counties in Heilongjiang Province, and the remote sensing images and forest resource inventory data. The major forest types in the study area are mixed coniferous forests and mixed coniferous and broadleaved forests, namely B. platyphylla, P. sylvestris, L. gmelinii, and P. koraiensis forests. Mixed forests were not classified as a separate type because the finest spatial resolution of the SPOT5 images and resampled RADARSAT-2 images is 2.5 m, at which a single pixel generally contains a single feature. Based on these factors, the forest type classification system comprised non-forest and B. platyphylla, P. sylvestris, L. gmelinii, and P. koraiensis forests.

Table 1 Parameters extracted from polarization decomposition of the RADARSAT-2 image

Classification by the random forest method

The random forest method implements Breiman's random forest algorithm for classification (Breiman 2001): it grows an ensemble of decision trees, each trained on a bootstrap sample of the data, and each tree produces its own prediction. The final prediction is taken by simple majority vote. When N samples are drawn with replacement from N training samples, the probability that a given sample is never selected is $(1-1/N)^N$. As N grows, this probability converges to $1/e \approx 0.368$, so about 37% of the samples do not appear in the training set of a given tree. These samples are termed out-of-bag (OOB) and serve as a validation set to evaluate model performance. Each decision tree thus has its own OOB set, which provides a running unbiased estimate of the classification error as trees are added to the forest and is also used to estimate variable importance. Variables with importance values exceeding 0.01 were selected for classification. Random forest classification offers high prediction accuracy, tolerates outliers and noise well, and is relatively resistant to overfitting. As a nonparametric modeling tool, it is well suited to problems where prior knowledge is lacking and the data are not subject to constraints or rules. It effectively captures interactions and nonlinear relationships in the data and can handle large or high-dimensional datasets.
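The bootstrap arithmetic and the majority vote described above can be checked numerically. This stdlib-only sketch is illustrative: the function names are hypothetical, and it does not grow actual decision trees; it only shows the out-of-bag probability, a simulated OOB fraction, and the vote that combines per-tree labels.

```python
import math
import random
from collections import Counter


def oob_probability(n):
    """Probability that one sample is never drawn in n draws with replacement."""
    return (1 - 1 / n) ** n


def simulated_oob_fraction(n, seed=0):
    """Draw one bootstrap sample of size n and return the out-of-bag fraction."""
    rng = random.Random(seed)
    in_bag = {rng.randrange(n) for _ in range(n)}  # indices drawn at least once
    return 1 - len(in_bag) / n


def majority_vote(tree_predictions):
    """Combine per-tree labels; ties go to the label that appears first."""
    return Counter(tree_predictions).most_common(1)[0][0]


for n in (10, 100, 10_000):
    print(f"N={n:>6}: (1 - 1/N)^N = {oob_probability(n):.4f}")
print(f"limit 1/e = {1 / math.e:.4f}")
print(f"simulated OOB fraction (N=10000) = {simulated_oob_fraction(10_000):.4f}")
print(majority_vote(["L. gmelinii", "B. platyphylla", "L. gmelinii"]))
```

The simulated fraction fluctuates around 0.368 from one bootstrap draw to the next, which is why each tree's OOB set differs slightly in size.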

Feature extraction from RADARSAT-2 data for classification

Many classification methods for fully polarimetric SAR data are based on target decomposition theory. The scattering characteristics obtained by decomposing polarimetric SAR data reflect the features of different objects. Target decomposition methods are typically divided into coherent and incoherent decompositions; incoherent decomposition was chosen here because natural targets are complex. The features extracted from RADARSAT-2 data for classification fall into three categories. The first comprises elements of the covariance matrix and the coherency matrix, and their eigenvalues, obtained directly from the original data. The second comprises parameters from different decomposition methods (Cloude and Pottier 1997; Krogager 2006; Freeman and Durden 1998; Huynen 1978; Holm and Barnes 1988; Yamaguchi et al. 2006; Evans et al. 1988). For example, the scattering entropy (H), scattering angle (α), and anisotropy (A, also called anti-entropy) are derived from the coherency matrix using the Cloude decomposition. The third comprises the radar vegetation index (RVI; Ling et al. 2009) and total power (Span). In total, 47 parameters were extracted from each RADARSAT-2 image (Table 1).
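A few of these parameters can be sketched from their commonly used definitions. The snippet below is a stdlib-only illustration, not the software used in the study: it assumes the three eigenvalues of a pixel's coherency matrix T3 and the α angles of the matching eigenvectors have already been computed (for example by PolSARpro). The RVI form 8·HV/(HH + VV + 2·HV) is the widely used backscatter-based definition and is an assumption here, since the exact formula from Ling et al. (2009) is not given. Function names and sample values are hypothetical.

```python
import math


def cloude_pottier(eigenvalues, alphas_deg):
    """Entropy H, anisotropy A and mean alpha from a coherency matrix T3.

    eigenvalues: the three non-negative eigenvalues of T3 (any order).
    alphas_deg: the alpha angle of each matching eigenvector, in degrees.
    """
    pairs = sorted(zip(eigenvalues, alphas_deg), reverse=True)  # lambda1 >= lambda2 >= lambda3
    lam = [lv for lv, _ in pairs]
    total = sum(lam)  # equals Span = trace(T3)
    probs = [lv / total for lv in lam]  # pseudo-probabilities p_i
    entropy = -sum(p * math.log(p, 3) for p in probs if p > 0)  # log base 3, so H lies in [0, 1]
    denom = lam[1] + lam[2]
    anisotropy = (lam[1] - lam[2]) / denom if denom > 0 else 0.0
    mean_alpha = sum(p * a for p, (_, a) in zip(probs, pairs))
    return entropy, anisotropy, mean_alpha


def span(s_hh, s_hv, s_vv):
    """Total power of a reciprocal target: |S_HH|^2 + |S_VV|^2 + 2|S_HV|^2."""
    return abs(s_hh) ** 2 + abs(s_vv) ** 2 + 2 * abs(s_hv) ** 2


def rvi(sigma_hh, sigma_hv, sigma_vv):
    """Radar vegetation index 8*HV / (HH + VV + 2*HV); backscatter in linear power units."""
    return 8 * sigma_hv / (sigma_hh + sigma_vv + 2 * sigma_hv)


# Hypothetical eigen-decomposition of one pixel's T3 matrix.
H, A, alpha = cloude_pottier([0.6, 0.3, 0.1], [20.0, 50.0, 70.0])
print(f"H={H:.3f}, A={A:.3f}, alpha={alpha:.1f} deg")
print(f"Span={span(1 + 1j, 0.5, 2.0):.2f}, RVI={rvi(3.0, 1.0, 3.0):.2f}")
```

Because Span equals the trace of T3, the eigenvalue sum inside `cloude_pottier` doubles as the total power of the pixel.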

Using all polarimetric parameters to identify forest types would increase computational complexity. The parameters are also highly correlated, and adding more of them to the classification introduces noise that can prevent forest types from being distinguished. Therefore, the parameters in Table 1 were first screened to eliminate redundant ones.

The random forest model selects variables by calculating their importance: adding noise to an important variable reduces the model's predictive ability and increases its error. The model is first validated with the original OOB data to obtain a baseline accuracy. Noise is then added to one variable in the OOB dataset, and the random forest model is revalidated to obtain a new accuracy. The difference between the accuracies obtained from the original and the noise-added OOB data represents the importance of that variable: the more important the variable, the more the OOB accuracy decreases. The RF model was used to calculate the importance of the 47 parameters extracted from the August and November RADARSAT-2 images, and highly important parameters were chosen to identify forest types. A DEM of the study area was used as auxiliary data in forest type classification to reduce the influence of topography. The DEM was derived from ASTER GDEM V2 data released by NASA in October 2011 at 30 m resolution.
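The noise-adding importance procedure above can be sketched as follows, together with the 0.01 selection threshold stated earlier. This is a minimal stdlib sketch under simplifying assumptions: a single OOB set and a generic `predict` callable stand in for the per-tree OOB sets that Breiman's random forest averages over, random permutation of a feature column is assumed as the form of "noise", and all names and data are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label equals the reference label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, oob_rows, oob_labels, seed=0):
    """Decrease in OOB accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, oob_rows, oob_labels)
    importances = []
    for j in range(len(oob_rows[0])):
        column = [r[j] for r in oob_rows]
        rng.shuffle(column)  # the "noise": breaks the link between feature j and the labels
        noisy = [r[:j] + (v,) + r[j + 1:] for r, v in zip(oob_rows, column)]
        importances.append(baseline - accuracy(predict, noisy, oob_labels))
    return importances


def select_features(names, importances, threshold=0.01):
    """Keep variables whose importance exceeds the threshold."""
    return [n for n, imp in zip(names, importances) if imp > threshold]


def toy_model(row):
    """Stand-in for a trained forest: uses feature 0 only."""
    return "forest" if row[0] > 0.5 else "non-forest"


# Hypothetical OOB data: the label depends only on feature 0.
rows = [(i / 200, (i * 7) % 13) for i in range(200)]
labels = [toy_model(r) for r in rows]
imps = permutation_importance(toy_model, rows, labels)
print(imps, select_features(["C22", "DEM"], imps))
```

Permuting a column, rather than adding Gaussian noise, keeps the variable's value distribution unchanged and only destroys its relationship with the class labels, so the accuracy drop isolates how much the model relies on that variable.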

Three schemes were used for classification by the RF method: (1) SPOT-5 alone, (2) SPOT-5 and SAR images from August or November, and (3) SPOT-5 and two temporal SAR images.

Separability of samples

The separability of the training-sample regions of interest (ROIs) for different objects can be assessed using the Jeffries–Matusita (J–M) distance (Richards and Jia 1986). The J–M distance ranges from 0 to 2, and values closer to 2 indicate better sample separability (Ma et al. 2010). Based on the forest resource inventory data and SPOT images, training samples for the different forest types were selected in areas with distinct image features and distributed evenly across the images: 200 training samples for B. platyphylla, P. sylvestris, and L. gmelinii forests and 50 training samples for P. koraiensis forest and non-forest.
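A sketch of the J–M distance computation follows. It uses the common Gaussian form JM = 2(1 − e^(−B)), where B is the Bhattacharyya distance, which gives the 0–2 range stated above. To stay stdlib-only, it assumes independent features (diagonal covariance), so B becomes a sum over features; a full-covariance version would need a matrix inverse and determinants. Each feature must have nonzero variance in both classes. The function name and sample values are hypothetical.

```python
import math
import statistics


def jm_distance(class_a, class_b):
    """Jeffries-Matusita distance (0..2) between two sets of training samples.

    Each class is a list of equal-length feature tuples. Features are assumed
    Gaussian and independent (diagonal covariance), so the Bhattacharyya
    distance B is a sum over features, and JM = 2 * (1 - exp(-B)).
    """
    b = 0.0
    for k in range(len(class_a[0])):
        xa = [s[k] for s in class_a]
        xb = [s[k] for s in class_b]
        ma, mb = statistics.fmean(xa), statistics.fmean(xb)
        va, vb = statistics.variance(xa), statistics.variance(xb)
        vm = (va + vb) / 2  # pooled variance for this feature
        b += (ma - mb) ** 2 / (8 * vm) + 0.5 * math.log(vm / math.sqrt(va * vb))
    return 2 * (1 - math.exp(-b))


# Hypothetical two-feature training samples for two forest types.
birch = [(0.30, 0.12), (0.34, 0.10), (0.31, 0.15), (0.36, 0.11)]
larch = [(0.22, 0.20), (0.25, 0.23), (0.21, 0.19), (0.24, 0.25)]
print(f"J-M = {jm_distance(birch, larch):.3f}")
```

Because JM saturates at 2 as B grows, it discriminates well between poorly and moderately separated classes while treating all well-separated pairs similarly.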

Results

Importance of variables for classification

The important parameters selected for classification were C22, VanZyl_vol, Yamaguchi_vol, T11huy, Freeman_vol, λ1, Span, SPOT-5, and DEM for the combination of the August POLSAR image and SPOT; H, Yamaguchi_vol, T11huy, T11, α, VanZyl_odd, SPOT-5, DEM, Span, RVI, and T13huy_real for the combination of the November POLSAR image and SPOT; and RVI(11), Span(11), H(11), T11Holm(8), T11huy(8), SPOT-5, and DEM for the combination of the multitemporal POLSAR images and SPOT, where the superscripts (8) and (11) denote parameters extracted from the August and November RADARSAT-2 images, respectively. Figure 2 shows the importance of these variables.
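The selected variables for the three SAR-based combinations can be recorded as a small configuration, which keeps the feature order fixed when building classifier inputs. This is a hypothetical sketch: the identifiers "_Aug" and "_Nov" stand for the (8) and (11) superscripts, "lambda1" and "alpha" spell out the Greek symbols, and "SPOT-5" is kept as one entry as listed in the text, although in practice it may denote several spectral bands.

```python
# Hypothetical identifiers for the variables listed in the text.
SELECTED_FEATURES = {
    "SPOT-5 + August SAR": [
        "C22", "VanZyl_vol", "Yamaguchi_vol", "T11huy", "Freeman_vol",
        "lambda1", "Span", "SPOT-5", "DEM",
    ],
    "SPOT-5 + November SAR": [
        "H", "Yamaguchi_vol", "T11huy", "T11", "alpha", "VanZyl_odd",
        "SPOT-5", "DEM", "Span", "RVI", "T13huy_real",
    ],
    "SPOT-5 + August and November SAR": [
        "RVI_Nov", "Span_Nov", "H_Nov", "T11Holm_Aug", "T11huy_Aug",
        "SPOT-5", "DEM",
    ],
}


def feature_vector(pixel, scheme):
    """Return a pixel's values for the scheme's selected variables, in a fixed order.

    pixel: mapping from variable name to value for one pixel.
    """
    names = SELECTED_FEATURES[scheme]
    missing = [name for name in names if name not in pixel]
    if missing:
        raise KeyError(f"pixel lacks features: {missing}")
    return [pixel[name] for name in names]


for scheme, names in SELECTED_FEATURES.items():
    print(f"{scheme}: {len(names)} variables")
```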

Separability calculation

The J–M distance was calculated for the three schemes (Fig. 3). Comparing the combination of August RADARSAT images and SPOT images, the combination of November RADARSAT images and SPOT images, and SPOT images alone indicates the following. (1) The B. platyphylla forest was insufficiently differentiated with the August images, because the trees were in full leaf and their scattering characteristics were relatively similar to those of coniferous forest. (2) Although the scattering characteristics of coniferous and broadleaved forests are random and similar, training-sample separability improved with the November images because the broadleaved forest had lost most of its leaves. (3) Compared with using SPOT images alone, the scattering and spectral characteristics of P. sylvestris, L. gmelinii, and P. koraiensis forests were easily distinguished without interference from broadleaved forest; hence, a few polarimetric SAR image parameters were added for classification. The highest training-sample separability was obtained when the spectral and scattering characteristics from SPOT and from both the August and November RADARSAT images were combined.

Fig. 2 Importance of the variables (Note: a larger decrease in accuracy indicates a more important variable)

Fig. 3 Separability of training samples measured by the J–M distance

Classification results and analysis

Fig. 4 Forest type classification of the Pangu Forest Farm using (a) SPOT data (scheme 1); (b) RADARSAT-2 images from August and SPOT data (scheme 2); (c) RADARSAT-2 images from November and SPOT data (scheme 2); and (d) SPOT and multiphase RADARSAT-2 images from August and November (scheme 3)

The forest type classifications of the Pangu Forest Farm obtained with the three schemes are shown in Fig. 4. Scheme 1 achieved a classification precision of 77%. The B. platyphylla forest was accurately distinguished from coniferous forests but was confused with non-forest, because part of the open B. platyphylla forest land was classified as non-forest. Among the coniferous forests, L. gmelinii, P. koraiensis, and P. sylvestris forests were confused to some extent. Classification using optical images alone is therefore not sufficiently accurate, so RADARSAT-2 images were added to supplement the optical images for forest type identification.

Combining the SPOT and August RADARSAT-2 images gave a classification precision of 80%, a clear improvement after adding the decomposition parameters from the RADARSAT-2 images. Although the penetrating capability of microwaves effectively reduced the misclassification between open B. platyphylla forest land and non-forest, the August data improved classification only to a limited extent because vegetation scattering in August is random and complex.

Combining the SPOT and November RADARSAT-2 images gave a classification precision of 85%, higher than that obtained with the SPOT and August RADARSAT-2 images. By November, B. platyphylla trees have shed all their leaves, which helps distinguish among L. gmelinii, P. koraiensis, and P. sylvestris forests; however, the difference between B. platyphylla forest and non-forest decreased because of increased microwave scattering from trunks and the ground surface and multiple scattering between them.

In scheme 3, combining SPOT with the August and November RADARSAT-2 images gave a classification precision of 88%. Combining the multiphase polarimetric parameters of the August and November RADARSAT-2 images with the optical images improved total accuracy, because the features from the two dates compensated for each other.
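The precision values above summarize agreement with the reference inventory data. Assuming they correspond to overall accuracy from a standard confusion matrix (the text does not detail the assessment), the calculation can be sketched as follows; per-class producer's accuracy shows where confusion such as B. platyphylla versus non-forest arises. The counts are illustrative only and are not the study's data; function names are hypothetical.

```python
def overall_accuracy(matrix):
    """Share of all reference samples that lie on the confusion-matrix diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def producers_accuracy(matrix):
    """Per-class share of reference samples that were classified correctly."""
    return [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]


# Rows = reference class, columns = predicted class (illustrative counts only).
classes = ["B. platyphylla", "non-forest"]
cm = [[8, 2],
      [1, 9]]
print(f"overall accuracy = {overall_accuracy(cm):.0%}")
for name, pa in zip(classes, producers_accuracy(cm)):
    print(f"{name}: producer's accuracy = {pa:.0%}")
```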

Discussion and conclusions

RF classification based on the polarimetric information, spectral information, and phase characteristics of multiphase microwave and optical images provided more accurate forest classification than any of these data sources used individually. Using only spectral information from SPOT5 images gave a precision of only 77%, with coniferous forests confused because of their similar spectral characteristics. Adding full-polarization SAR data from August or November increased precision to 80 and 85%, respectively. The highest total precision, 88%, was obtained when multiphase RADARSAT-2 images were introduced and their features compensated for each other.

The complexity of the forest made it difficult to extract features that separate the different forest types. Because full-polarization SAR data were used for forest type classification, texture information could also be added. In addition, interferometric coherence information from the RADARSAT-2 images was not used in this study; combining it with the present features may improve classification results in future studies.