Drogue detection for autonomous aerial refueling based on convolutional neural networks

Chinese Journal of Aeronautics, 2017, Issue 1


Wang Xufeng a,b, Dong Xinmin a, Kong Xingwei a, Li Jianmin b,*, Zhang Bo b

a School of Aeronautics and Astronautics Engineering, Air Force Engineering University, Xi'an 710038, China

b State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Autonomous aerial refueling; Computer vision; Convolutional neural networks; Deep learning; Drogue detection

Drogue detection is a fundamental issue during the close docking phase of autonomous aerial refueling (AAR). To cope with this issue, a novel and effective method based on deep learning with convolutional neural networks (CNNs) is proposed. In order to ensure its robustness and wide application, a deep learning dataset of images was prepared by utilizing real data of "Probe and Drogue" aerial refueling, which contains diverse drogues in various environmental conditions without artificial features placed on the drogues. By employing deep learning ideas and graphics processing units (GPUs), a model for drogue detection using a Caffe deep learning framework with CNNs was designed to ensure the method's accuracy and real-time performance. Experiments were conducted to demonstrate the effectiveness of the proposed method, and results based on real AAR data compare its performance to other methods, validating the accuracy, speed, and robustness of its drogue detection ability.

1. Introduction

Autonomous aerial refueling (AAR) can greatly expand the effectiveness of unmanned aerial vehicles (UAVs) but is still a challenging task for UAVs.1 Consequently, AAR has gained much attention due to its vital importance in extending operational range.2–8 One AAR method, "Probe and Drogue" refueling, is especially preferred for UAVs due to its economy and flexibility. Therefore, more and more researchers are engaged in the development of "Probe and Drogue" UAV autonomous aerial refueling (PD-UAV-AAR).9,10 PD-UAV-AAR is the exclusive AAR method considered in this research.

Drogue detection, which is vitally important for estimating the relative position and attitude of the refueling drogue, is a fundamental issue during the docking phase of PD-UAV-AAR.11–21 Vision-based AAR technologies have been utilized to cope with this issue due to their high accuracy and independence. Though there are several vision-based methods employed in the docking phase of PD-UAV-AAR, it is still challenging to develop a robust drogue detection method in real time because of the diversity of drogues available and the various environmental conditions in which AAR takes place. Most of the existing vision-based AAR approaches depend on utilizing artificial features such as light emitting diodes (LEDs)11–14 and painted marks,15–17 which require installation and are also susceptible to problems caused by occlusion. In Refs. 11–14, LEDs were placed on a drogue as features to measure the position by visual detection. This does not avoid information loss in image projection and also increases the number of power-supply wires on the drogue, which creates a potential hazard for PD-UAV-AAR. In Refs. 15,16, a red-ring-shape feature using special materials with highly reflective properties was utilized for effective drogue detection, requiring certain modifications to be made to the drogue in advance. In Ref. 17, a 3D Flash LIDAR camera was used to conduct AAR ground test demonstrations by detecting strong signals returning from highly reflective materials of the drogue, without considering conditions of air turbulence. Therefore, it would be advantageous to directly detect drogues from AAR images during the docking phase of PD-UAV-AAR without utilizing such artificial features.18–21 Based on the gray value and shape of the inner drogue, Ref. 18 utilized template matching and threshold segmentation to detect the drogue in real AAR data under fair environmental conditions without using artificial features. However, the templates cannot address all drogue conditions, and the empirical threshold is hard to define due to drogue and environmental variation. Ref. 19 presents a monocular vision-based approach for AAR based on direct image registration using a complex robotic testbed and estimated drogue position under medium turbulence conditions, also without utilizing artificial features. However, Ref. 19 cannot avoid susceptibility to problems of drogue occlusion. Refs. 20,21 proposed a direct drogue detection strategy based on multiscale, low-rank and sparse decomposition through use of drogue images, and estimated drogue position under fair environmental conditions without features. However, Refs. 20,21 reported that this method cannot detect drogues effectively in complex environmental situations such as in cloud, fog and light-interference conditions.

Fortunately, in recent years, deep learning with convolutional neural networks (CNNs) has provided surprisingly satisfactory solutions to many problems such as image recognition and object detection.22–24 Region-based convolutional neural networks (R-CNN)25 can achieve high object detection accuracy. However, the training of R-CNN is expensive in both time and space, and the detection speed is slow. Moreover, the networks of R-CNN and its faster versions26,27 tend to be very large and have a high number of parameters, leading to low detection speed and the need for a large amount of labeled data to re-train the network after pre-training on ImageNet. The research in this paper aims to apply CNNs to this specific domain: drogue detection for AAR based on CNNs can improve speed to meet the operational requirements of AAR. Many researchers have also been using Caffe, a deep learning framework utilizing GPUs, for its advantages of expressive architecture, extensible code and high speed.28 Thus, technologies utilizing a Caffe deep learning framework with CNNs have received much attention from many research facilities.

Motivated by the above discussion, we propose a novel and effective method based on the Caffe deep learning framework with CNNs for real-time drogue detection during the docking phase of PD-UAV-AAR without artificial features, which both avoids being susceptible to problems caused by occlusion of the drogue and realizes effective drogue detection under complex environmental situations. Firstly, in order to ensure robustness to drogue diversity under changing environmental conditions, achieve optimal drogue detection performance and avoid the potential dangers mentioned above, the proposed method directly uses real AAR images with the drogue extracted from "Probe and Drogue" aerial refueling videos, not utilizing LEDs or other manually-made features. The "Probe and Drogue" aerial refueling videos contain diverse shapes, sizes and occlusions of drogues under various conditions such as fair, cloudy, foggy and light-interference. An experiment is then conducted on the Caffe deep learning framework with CNNs by use of GPUs to test our proposed method on real "Probe and Drogue" aerial refueling data. Finally, the results of drogue detection and its accuracy analysis are discussed together with a comparison between the proposed method and competing methods. Competitive drogue detection results are reported by using the deep learning idea. The experimental results on real AAR data show that the proposed method is effective in drogue detection in various situations, adding to the list of successful applications of deep learning methods. Our contributions can be summarized as follows.

(1) A novel framework for real-time drogue detection during the docking phase of PD-UAV-AAR that utilizes CNNs with GPUs without using manually-made artificial features.

(2) A robust drogue detection method that not only avoids being susceptible to problems caused by occlusion of the drogue, but also performs effectively under complex environmental situations such as in cloud, fog and light-interference conditions.

(3) An efficient drogue detection method that can ensure highly accurate, robust and fast performance.

2. System design

We present a method for drogue detection during the docking phase of PD-UAV-AAR based on a Caffe deep learning framework with CNNs that performs well without the need for setting artificial features on the drogue. A deep learning dataset of images for drogue detection, fundamental for the robustness and accurate performance of the system, was prepared in advance and contains diverse drogues under various environmental conditions. By use of deep learning and GPUs, an experiment was conducted on real data from "Probe and Drogue" aerial refueling using the Caffe deep learning framework with CNNs. Results from other competing methods for real-time drogue detection are depicted in various situations, with comparisons made between them and the proposed method. Fig. 1 shows a flowchart of drogue detection for PD-UAV-AAR based on CNNs.

3. Methodology

3.1. Dataset preparation

In order to ensure robustness to drogue diversity in various environmental conditions and achieve optimal performance, the proposed drogue detection method uses real AAR images of drogues extracted from "Probe and Drogue" aerial refueling videos and does not utilize LEDs or other hand-crafted features. The "Probe and Drogue" aerial refueling videos contain various environmental conditions such as fair, cloudy, foggy and light-interference, and diverse shapes, sizes and occlusions of drogues. The dataset preparation comprises four key steps: (1) original video collection of "Probe and Drogue" aerial refueling; (2) image frame extraction, selection and labeling; (3) image resizing (along with corresponding labels), filtering and augmentation; (4) data format conversion. A flowchart of dataset preparation for drogue detection based on CNNs is shown in Fig. 2.

Detailed steps for data preparation are described as follows:

Step 1. Original video collection of "Probe and Drogue" aerial refueling

Due to the absence of an acknowledged dataset for drogue detection tasks, and considering the successful deployments of deep learning methods, the original collection of "Probe and Drogue" aerial refueling videos plays an important and fundamental role, because more valuable data is better for deep learning. Therefore, as many videos of "Probe and Drogue" aerial refueling were collected as possible, containing various situations such as fair, cloudy, foggy and light-interference conditions and diverse shapes, sizes and occlusions of drogues (the collected videos of "Probe and Drogue" aerial refueling can be found at http://pan.baidu.com/s/1c0PPH3y). Note that videos taken from the refueling view, especially those during the close docking phase of "Probe and Drogue" aerial refueling, deserve further expansion of the collection.

Step 2. Image frame extraction, selection and labeling

Image frames were extracted from "Probe and Drogue" aerial refueling videos using fast forward MPEG (ffmpeg), a multimedia video processing tool, on a Linux platform server. In practical deployments, memory space and time consumption should be considered in advance when there is a large amount of data to be dealt with. As a result, in order to reduce redundant information and save time spent on subsequent labeling, the images were extracted at 1 frame per second (FPS) in our experiment, which proved to be efficient and effective. Note that the proposed method can realize drogue detection at greater than 30 FPS by use of GPUs when drogue detection based on CNNs is conducted, which satisfies the real-time requirement of PD-UAV-AAR.
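As an illustration of this step, a minimal frame-extraction sketch is given below; the directory names and output file pattern are placeholders, and it assumes ffmpeg is available on the system path.

```python
# Minimal sketch: extract frames at 1 FPS from each collected AAR video
# with ffmpeg; "videos" and "frames" are placeholder directory names.
import os
import subprocess

VIDEO_DIR = "videos"   # assumed directory of collected refueling videos
FRAME_DIR = "frames"   # output directory for extracted image frames
os.makedirs(FRAME_DIR, exist_ok=True)

for video in sorted(os.listdir(VIDEO_DIR)):
    name, _ = os.path.splitext(video)
    # "-vf fps=1" samples one frame per second of video
    subprocess.run([
        "ffmpeg", "-i", os.path.join(VIDEO_DIR, video),
        "-vf", "fps=1",
        os.path.join(FRAME_DIR, f"{name}_%04d.jpg"),
    ], check=True)
```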

After extracting image frames, only images containing the drogue were selected (the selected image frames can be found at http://pan.baidu.com/s/1c0PPH3y). The selected images were labeled using the MATLAB-based Annotation Tool, which employs the LabelMe MATLAB Toolbox developed at the Massachusetts Institute of Technology (MIT). The drogue region was labeled using a box on the extracted image frames that defines the location of the drogue's four sides (top, bottom, left and right). The four coordinates of the box obtained in each image correspond to the illustration in Fig. 3 (ground truth labels of the image frames can be found at http://pan.baidu.com/s/1c0PPH3y).

Step 3. Image resizing (along with corresponding labels), filtering and augmentation

After labeling the drogue region using a box with four coordinates, for convenient processing and to meet the requirements of deep learning with CNNs, the RGB images were resized to 256 × 256 pixels uniformly, along with their corresponding labels (coordinates).
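A minimal sketch of this resizing step, assuming OpenCV and a (top, bottom, left, right) box convention, might look as follows; the function name is illustrative.

```python
# Sketch: resize an RGB image to 256 x 256 and rescale its box label
# (top, bottom, left, right, in pixels) by the same factors.
import cv2

TARGET = 256

def resize_with_label(image, box):
    """image: H x W x 3 array; box: (top, bottom, left, right) in pixels."""
    h, w = image.shape[:2]
    resized = cv2.resize(image, (TARGET, TARGET))
    top, bottom, left, right = box
    scale_y, scale_x = TARGET / h, TARGET / w
    new_box = (top * scale_y, bottom * scale_y, left * scale_x, right * scale_x)
    return resized, new_box
```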

Considering deep learning deployments, and since the scope we mainly focus on is the close docking phase of PD-UAV-AAR, the resized images were then further filtered with a threshold of 20 pixels (threshold 1) on each labeled region's size. After filtering, the labeled regions were all larger than 20 pixels, with an average size of 65 pixels.

In order to simulate the drogue movements caused by disturbances and simultaneously enlarge the dataset, the filtered images were augmented by image translation and rotation within 50 pixels (threshold 2) and 10 degrees (threshold 3), respectively. It is important to note that, on account of various disturbances, the drogue can meander in a disorderly manner by as much as its own diameter along the horizontal and vertical axes during the docking phase of PD-UAV-AAR, which leads to drogue shape and size variations in each frame.2
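A sketch of such an augmentation step, under the assumption of OpenCV-style affine warps and the (top, bottom, left, right) box convention used above, is shown below. For the small rotations involved, the axis-aligned box is simply shifted with the translation; a full pipeline would also transform the label under the rotation.

```python
# Sketch: random translation within 50 pixels and rotation within 10
# degrees of a filtered image; the box label is shifted with the
# translation (label handling under rotation is omitted for brevity).
import random
import cv2
import numpy as np

MAX_SHIFT, MAX_ANGLE = 50, 10  # thresholds 2 and 3 from the text

def augment(image, box):
    h, w = image.shape[:2]
    dx = random.randint(-MAX_SHIFT, MAX_SHIFT)
    dy = random.randint(-MAX_SHIFT, MAX_SHIFT)
    angle = random.uniform(-MAX_ANGLE, MAX_ANGLE)
    # translation followed by rotation about the image centre
    M_shift = np.float32([[1, 0, dx], [0, 1, dy]])
    shifted = cv2.warpAffine(image, M_shift, (w, h))
    M_rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = cv2.warpAffine(shifted, M_rot, (w, h))
    top, bottom, left, right = box
    new_box = (top + dy, bottom + dy, left + dx, right + dx)
    return rotated, new_box
```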

Note that the three thresholds used above were also determined through a large number of additional experiments. We found that the proposed method obtains satisfactory drogue detection during the docking phase of PD-UAV-AAR when these thresholds are used. Thus, in this paper, we adopt these thresholds, and the experimental results validate their effectiveness.

Step 4. Data format conversion

The augmented dataset, containing images with corresponding labels, is converted into hierarchical data format 5 (HDF5), which can be fed directly to the Caffe deep learning framework with CNNs. The HDF5 dataset has two parts: data (images) with a shape of n × 3 × 256 × 256, and labels (coordinates corresponding to images) with a shape of n × 8, where n is the number of images in the HDF5 dataset.
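A minimal conversion sketch using h5py is given below; the dataset names "data" and "label" follow the usual Caffe HDF5 convention, and the file names are placeholders.

```python
# Sketch: pack augmented images and labels into an HDF5 file readable by
# Caffe's HDF5Data layer.
import h5py
import numpy as np

def write_hdf5(images, labels, path):
    """images: n x 3 x 256 x 256 float32; labels: n x 8 float32."""
    data = np.asarray(images, dtype=np.float32)
    label = np.asarray(labels, dtype=np.float32)
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=data)
        f.create_dataset("label", data=label)

# Caffe's HDF5Data layer reads a text file listing HDF5 paths, e.g.:
# with open("train_h5_list.txt", "w") as f:
#     f.write("train.h5\n")
```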

The dataset information (found at http://pan.baidu.com/s/1c0PPH3y) includes 41 original videos of "Probe and Drogue" aerial refueling, the list of original videos, the 1985 original image frames extracted from the videos, and the ground truth labels of the original image frames. Note that after image resizing (along with corresponding labels), filtering and augmentation, the dataset contains more than 100,000 images.

3.2. Model of drogue detection using the Caffe deep learning framework with CNNs

3.2.1. Algorithm description

The model schema details the model definitions, parameters and procedures for drogue detection on real AAR data. The proposed model for drogue detection is a Caffe deep learning framework with CNNs that comprises various layers such as HDF5 dataset, convolution, pooling, rectified linear unit (ReLU), inner product, dropout and Euclidean loss. As mentioned above, the Caffe deep learning framework has an expressive architecture whose models and optimizations are defined by configurations without hard-coding. The architecture of the proposed model for drogue detection based on CNNs is depicted in Fig. 4.

The general architecture of a conventional CNN usually contains: (1) convolution and pooling appearing alternately; (2) convolution with weight sharing and a nonlinear activation function; (3) full connections; (4) a loss function. Note that the network architecture used in our paper has two specific aspects. First, the final fully-connected layer outputs 8 float numbers corresponding to the coordinates of the 4 corners of the drogue location defined by a box. Second, the loss function is defined by the Euclidean distance between the detection results and the ground truth labels. Thus, the proposed architecture can ensure fast drogue detection. As shown in Fig. 4, the input images are convolved with filters in the convolutional layer; then, the resulting feature maps are downsampled in the pooling layer; finally, the inner product layer (also known as the fully-connected layer) outputs the features used to realize drogue detection. The specifics of the algorithm are described as follows.

Firstly, the input images are convolved with the filters of the convolutional layer, and by use of the activation function we obtain the layer's output feature maps:
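In the conventional CNN notation (an assumed formulation consistent with the description above, where $x_i^{l-1}$ is the $i$th input feature map of layer $l$, $M_j$ is the set of input maps connected to output map $j$, $k_{ij}^{l}$ is the convolution kernel, $b_j^{l}$ is the bias, and $f(\cdot)$ is the activation function), this can be written as

$$x_j^{l}=f\Big(\sum_{i\in M_j}x_i^{l-1}\ast k_{ij}^{l}+b_j^{l}\Big)$$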

Secondly, after obtaining feature maps, and in order to reduce network parameters, computation and overfitting, we use a max pooling down-sampling strategy (choosing the maximum value within the pooling kernel on each feature map as the output) and a dropout strategy (dropping nodes out of the network with a given dropout ratio at each training stage, and training the reduced network in that stage). These both decrease overfitting in neural networks and improve the speed of training, leading to the learning of more robust features that generalize better to new data.

Next, after the max pooling down-sampling and dropout strategies, the CNN model feeds all the feature maps into the inner product layer and obtains the predicted drogue location of the model (the drogue detection results). Feeding the drogue detection results into the Euclidean loss layer, we obtain the drogue detection error based on the detection results and the ground truth labels. The Euclidean loss is defined as
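Assuming the standard Euclidean (L2) loss over the four predicted corner points of each image, consistent with the notation defined below, this takes the form

$$\mathrm{loss}=\frac{1}{2n}\sum_{i=1}^{n}\sum_{j=1}^{4}\Big[(x_{ij}-\hat{x}_{ij})^{2}+(y_{ij}-\hat{y}_{ij})^{2}\Big]$$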

where loss is the Euclidean loss over all testing images, n is the number of testing images, x_ij and x̂_ij are the ground truth label and predicted label of the x coordinate in each testing image, and y_ij and ŷ_ij are the ground truth label and predicted label of the y coordinate in each testing image. During the training phase of the proposed model, drogue detection testing was conducted until the loss converged to a satisfactory level.

Finally, the parameters W (the network connection weights) and b (the bias values) are tuned by the back-propagation algorithm, and the target of training is to obtain the optimal W and b, which can then be used to obtain optimal drogue detection results.

3.2.2. Features of the proposed model's architecture

(1) HDF5 dataset layer

The images (data shown in Fig. 4) and corresponding coordinates (label shown in Fig. 4) are fed into Caffe in HDF5 format from the HDF5 dataset layer, which is the first layer of the architecture in the model for drogue detection based on CNNs. The HDF5 dataset layer parameters, source and batch size, are defined as follows:

(A) Source – the directory of the train/test dataset in HDF5 format.

(B) Batch size – the number of images to be processed at a time. In our experiment, we set the batch size to 100.

(2) Convolution layer

Convolution is one of the most important concepts in deep learning; it is convolution and convolutional neural networks that have boosted deep learning. The convolution layer parameters lr_mult, num_output, pad, kernel size, stride, weight filler and bias filler are defined as follows:

(A) Lr_mult – short for learning rate multiplier; lr_mult 1 and 2 are the learning rate multipliers for the filters and biases, respectively.

(B) Num_output – the number of learned filters.

(C) Kernel size – the size of the filters.

(D) Stride – the step of filter applications.

(E) Weight filler – the filters are initialized by the xavier algorithm,29 which can automatically determine the scale of initialization based on the number of input and output neurons.

(F) Bias filler – the biases are initialized to a constant value.

(3) ReLU layer

ReLU has been widely used and is the most popular activation function for deep learning due to its biological plausibility, sparse activation and efficient gradient propagation, which allow for effective training of deep learning architectures.30 In our experiments, the ReLU layer supports in-place computation to avoid additional memory consumption.

(4) Pooling layer

The function of the pooling layer in deep learning is feature integration or non-linear downsampling. The output of the pooling layer is the feature map after pooling. The pooling layer parameters pool, kernel size and stride are defined as follows:

(A) Pool – the pooling method utilized in deep learning. In our experiment, we use the max pooling method,31 which means choosing the maximum value within the pooling kernel on the feature maps as the output.

(B) Kernel size – the size of the pooling kernel.

(C) Stride – the step of filter applications.

(5) Inner product layer

The inner product layer is also known as the fully-connected layer, computing an inner product with learned weights and biases. The inner product layer parameters lr_mult, num_output, weight filler and bias filler are defined as follows:

(A) Lr_mult – lr_mult 1 and 2 are the learning rate multipliers for the weights and biases, respectively.

(B) Num_output – the number of outputs of the layer.

(C) Weight filler – the parameters are initialized by a Gaussian distribution with a certain standard deviation (std).

(D) Bias filler – the biases are initialized to a constant value.

(6) Dropout layer

Dropout is a regularization method that can not only decrease overfitting but also improve the training speed of deep learning networks.32 The dropout technique is beneficial for learning more robust features, which can improve the performance of deep learning on various tasks. In our experiment, we set the dropout ratio to 0.2.

(7) Euclidean loss layer

The Euclidean loss layer is the last layer of the model for drogue detection via deep learning with CNNs. The Euclidean loss layer specifies how the deep learning network penalizes the deviation between predicted labels and ground truth labels, and is used for regressing to the ground truth labels.
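To make the layer configuration concrete, a minimal pycaffe NetSpec sketch of such an architecture is given below. The filter counts, kernel sizes and number of convolution/pooling stages are illustrative assumptions; only the layer types, the batch size (100), the dropout ratio (0.2) and the 8-dimensional regression output follow the description above.

```python
# Hypothetical sketch of a drogue-detection network of the kind described
# above, written with pycaffe's NetSpec; not the authors' released model.
import caffe
from caffe import layers as L, params as P

def drogue_net(hdf5_list, batch_size=100):
    n = caffe.NetSpec()
    n.data, n.label = L.HDF5Data(batch_size=batch_size, source=hdf5_list, ntop=2)
    n.conv1 = L.Convolution(n.data, num_output=32, kernel_size=5, stride=1,
                            weight_filler=dict(type='xavier'),
                            bias_filler=dict(type='constant'))
    n.relu1 = L.ReLU(n.conv1, in_place=True)
    n.pool1 = L.Pooling(n.relu1, pool=P.Pooling.MAX, kernel_size=2, stride=2)
    n.conv2 = L.Convolution(n.pool1, num_output=64, kernel_size=5, stride=1,
                            weight_filler=dict(type='xavier'),
                            bias_filler=dict(type='constant'))
    n.relu2 = L.ReLU(n.conv2, in_place=True)
    n.pool2 = L.Pooling(n.relu2, pool=P.Pooling.MAX, kernel_size=2, stride=2)
    n.ip1 = L.InnerProduct(n.pool2, num_output=500,
                           weight_filler=dict(type='gaussian', std=0.01),
                           bias_filler=dict(type='constant'))
    n.relu3 = L.ReLU(n.ip1, in_place=True)
    n.drop1 = L.Dropout(n.relu3, dropout_ratio=0.2, in_place=True)
    # 8 outputs: the (x, y) coordinates of the four box corners
    n.ip2 = L.InnerProduct(n.drop1, num_output=8,
                           weight_filler=dict(type='gaussian', std=0.01),
                           bias_filler=dict(type='constant'))
    n.loss = L.EuclideanLoss(n.ip2, n.label)
    return n.to_proto()

# Writing the prototxt (file names are placeholders):
# with open('drogue_train.prototxt', 'w') as f:
#     f.write(str(drogue_net('train_h5_list.txt')))
```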

4. Experiments and discussion

In this section, experiments are conducted to demonstrate the effectiveness of the proposed method. Moreover, the results and accuracy analysis of drogue detection are discussed together with a comparison between the proposed method and competing methods. This will validate that a drogue can be detected with high accuracy, robustness, and speed.

4.1. Experimental environment

An experiment of drogue detection was conducted on real AAR data, that is, real images extracted from "Probe and Drogue" aerial refueling videos of drogues without artificial features such as LEDs or painted marks. The "Probe and Drogue" aerial refueling videos contain various situations such as fair, cloudy, foggy and light-interference conditions, and diverse shapes, sizes and occlusions of drogues. Note that the real videos of "Probe and Drogue" aerial refueling already contain real disturbances, which means that real disturbances are taken into consideration when conducting the experiments.

The training dataset contains drogue images from both the refueling view and the non-refueling view. The testing dataset contains drogue images of only the refueling view, especially those taken during the close docking phase of AAR. Note that the images in the testing dataset are different from those in the training dataset, without loss of generality. In the experiment, we adopt a cross-validation strategy to estimate the performance of the proposed method.

The algorithm is implemented in the Python programming language on a server with an environment of Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64), Intel Xeon E5-2620 v2 six-core 2.1 GHz CPUs, and NVIDIA GeForce GTX TITAN GPUs.

4.2. Experimental results

The experiment was conducted on a Caffe deep learning framework with CNNs by utilizing GPUs to test our proposed method on real data from "Probe and Drogue" aerial refueling. The results of drogue detection under various conditions are shown in Figs. 5–7. In these figures, the green dashed box and red dashed box denote the ground truth label and predicted label, respectively. Fig. 5 shows the results of drogue detection under fair environmental conditions with drogues of diverse shape, size and occlusion. In Fig. 5, we can see that the proposed drogue detection method based on deep learning performs well with high accuracy. Fig. 6 shows the results of drogue detection under cloudy and foggy environmental conditions with drogues of different shape, size and occlusion. In Fig. 6, although the environmental conditions are not ideal, the accuracy and performance of drogue detection are still good, though slightly inferior to those of Fig. 5. Fig. 7 shows the results of drogue detection under light-interference conditions with drogues of different shape, size and occlusion. In Fig. 7, despite the difficult environmental conditions, the accuracy and performance of drogue detection are similar to those of Fig. 6 and slightly inferior to those of Fig. 5. Figs. 5–7 show that the proposed drogue detection method based on CNNs performs well with accuracy and robustness under various environmental conditions.

For the task of drogue detection during the docking phase of autonomous aerial refueling, the image obtained by a vision sensor mounted on the receiver airplane/UAV usually contains the drogue and parts of the aircraft and sky, with drogue diversity under various environmental conditions. The background (the image region outside the drogue) is simple to some extent (compared with situations where detection tasks utilize R-CNN versions). Therefore, drogue detection does not require a large network (such as R-CNN) or a large number of labels, and can function effectively with only a small network (as proposed in our paper) and relatively few labels (our chosen dataset) that can cover most basic situations; therefore, the proposed method works in different cases/environments, and the detection speed is fast enough to satisfy the requirements of autonomous aerial refueling. For images with a resolution of 256 × 256, the processing speed of our algorithm using GPUs is better than 30 FPS, which is suitable for real-time drogue detection applications during the docking phase of PD-UAV-AAR. Competitive results on drogue detection have been reported by using the deep learning idea. The experimental results on real AAR data show that the proposed method is effective in real-time drogue detection under various conditions, adding to the list of successful applications of deep learning methods.

4.3. Accuracy analysis of drogue detection

To quantitatively evaluate whether the precision of drogue detection satisfies the PD-UAV-AAR system requirements for high accuracy, the statistical criteria of precision, recall, F-measure and mean absolute error (MAE) in pixels have been applied and are defined as follows:

(1) Precision
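In the usual pixel-level form implied by the definitions below, this is

$$P_{i}=\frac{TP_{i}}{TP_{i}+FP_{i}},\qquad P=\frac{1}{n}\sum_{i=1}^{n}P_{i}$$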

where P is the average precision over all testing images, n is the number of testing images, P_i is the precision for each testing image, TP_i is the number of true positive pixels in each testing image, and FP_i is the number of false positive pixels in each testing image. The higher the value of precision, the better the performance of drogue detection.

(2) Recall
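With the quantities defined below (and noting that the ground truth positive pixels satisfy PN_i = TP_i + FN_i), the conventional form is

$$R_{i}=\frac{TP_{i}}{TP_{i}+FN_{i}}=\frac{PN_{i}-FN_{i}}{PN_{i}},\qquad R=\frac{1}{n}\sum_{i=1}^{n}R_{i}$$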

where R is the average recall over all testing images, R_i is the recall for each testing image, FN_i is the number of false negative pixels in each testing image, and PN_i is the number of ground truth positive pixels in each testing image. The higher the value of recall, the better the performance of drogue detection.

(3) F-measure
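Assuming ρ plays the role of the weighting factor in the weighted harmonic mean of Ref. 33, the F-measure can be written as

$$F_{i}=\frac{(1+\rho)P_{i}R_{i}}{\rho P_{i}+R_{i}},\qquad F=\frac{1}{n}\sum_{i=1}^{n}F_{i}$$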

where F is the average F-measure over all testing images, and F_i is the F-measure for each testing image. We set ρ = 0.3, the same as in Ref. 33. The higher the value of F-measure, the better the performance of drogue detection.

(4) MAE_x (in pixels)
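By symmetry with the MAE_y definition below, and assuming the four corner x coordinates per image, the MAE in the x axis is

$$\mathrm{MAE}_{x}=\frac{1}{n}\sum_{i=1}^{n}|\Delta x_{i}|,\qquad |\Delta x_{i}|=\frac{1}{4}\sum_{j=1}^{4}\left|x_{ij}-\hat{x}_{ij}\right|$$

The lower the value of MAE_x, the better the performance of drogue detection.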

(5) MAE_y (in pixels)
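Correspondingly, with the notation defined below and assuming the four corner y coordinates per image,

$$\mathrm{MAE}_{y}=\frac{1}{n}\sum_{i=1}^{n}|\Delta y_{i}|,\qquad |\Delta y_{i}|=\frac{1}{4}\sum_{j=1}^{4}\left|y_{ij}-\hat{y}_{ij}\right|$$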

where MAE_y (in pixels) is the MAE in the y axis over all testing images, n is the number of testing images, |Δy_i| is the mean absolute error in the y axis for each testing image, y_ij is the ground truth label of the y coordinate in each testing image, and ŷ_ij is the predicted label of the y coordinate in each testing image. The lower the value of MAE_y, the better the performance of drogue detection.
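As an illustration of how the pixel-level precision, recall and F-measure of a predicted box against a ground truth box might be computed, a small sketch is given below; the box convention and ρ = 0.3 follow the definitions above, and this is an assumed implementation rather than the authors' evaluation code.

```python
# Sketch: pixel-level precision, recall and F-measure for one image,
# treating the predicted and ground-truth boxes as sets of pixels.
# Boxes are (top, bottom, left, right) in pixels.

def box_metrics(pred, gt, rho=0.3):
    pt, pb, pl, pr = pred
    gt_t, gt_b, gt_l, gt_r = gt
    # intersection of the two boxes = true positive pixels
    inter_h = max(0.0, min(pb, gt_b) - max(pt, gt_t))
    inter_w = max(0.0, min(pr, gt_r) - max(pl, gt_l))
    tp = inter_h * inter_w
    pred_area = max(0.0, pb - pt) * max(0.0, pr - pl)        # TP + FP
    gt_area = max(0.0, gt_b - gt_t) * max(0.0, gt_r - gt_l)  # PN = TP + FN
    precision = tp / pred_area if pred_area > 0 else 0.0
    recall = tp / gt_area if gt_area > 0 else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = (1 + rho) * precision * recall / (rho * precision + recall)
    return precision, recall, f_measure

# Example: print(box_metrics((50, 120, 60, 130), (48, 118, 58, 126)))
```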

Evaluations of drogue detection performance based on the Caffe deep learning framework with CNNs are shown in Figs. 8–10 according to the statistical criteria defined above. The Euclidean loss of the proposed model for drogue detection based on CNNs is depicted in Fig. 8 and indicates that the loss converges to a stable and reasonable range as the number of images contained in the training dataset increases from 5000 to 100,000. Thus, the convergence of the loss confirms the correctness of our model for drogue detection using the Caffe deep learning framework with CNNs. The statistical criteria of precision, recall and F-measure for drogue detection performance based on CNNs are illustrated in Fig. 9. It can be seen that the values of P, R and F increase gradually to an acceptable level when the number of images contained in the training dataset surpasses 50,000. The MAE (in pixels) of drogue detection performance based on CNNs is described in Fig. 10 with MAE_x (in pixels) and MAE_y (in pixels). As we can see from the figure, both MAE_x (in pixels) and MAE_y (in pixels) are below 5 pixels when the number of images contained in the training dataset is more than 50,000. The accuracy analysis of drogue detection suggests that the proposed method can satisfy the requirements of stabilization, precision and real-time performance for drogue detection during the docking phase of PD-UAV-AAR.

To further evaluate the performance of our proposed method, we compare it with other works17–21 in Table 1, where performance analyses among different drogue detection methods are reported. As Table 1 shows, artificial features are utilized in Chen et al.'s method,17 but not in Yin et al.'s method,18 Martínez et al.'s method,19 Song et al.'s method,20 Gao et al.'s method21 or our proposed method. The Chen et al. method17 utilizing artificial features employs highly reflective materials on the drogue for ground tests with a simplified evaluation that does not consider turbulence conditions; this method does not provide quantitative results of drogue detection.17 For the methods not utilizing artificial features, Yin et al.'s method18 detects the drogue in real AAR data under fair environmental conditions and obtains more than 80% correct locations; Martínez et al.'s method19 estimates the drogue position with MAE less than 10 pixels on a complex robotic testbed under medium turbulence conditions; Song et al.'s method20 estimates the drogue position with precision larger than 0.8 on real AAR data under fair environmental conditions; Gao et al.'s method21 estimates the drogue position with an F-measure larger than 0.9 on real AAR data under fair environmental conditions. Similar to Refs. 18–21, by utilizing real AAR data instead of artificial features, we propose a novel and effective method based on CNNs for drogue detection during the docking phase of PD-UAV-AAR under various environmental conditions, and achieve the following drogue detection performance: MAE (in pixels) less than 3.5 pixels, precision larger than 0.9, and F-measure larger than 0.9 when the training dataset contains 100,000 training images; this is superior to the competing methods mentioned above. In addition, the proposed method not only avoids being susceptible to problems caused by occlusion of the drogue, but also realizes drogue detection effectively under complex environmental conditions such as cloud, fog and light-interference. Thus, our proposed method has better performance than the competing approaches to drogue detection, proving the validity of the deep learning method based on CNNs for drogue detection during the docking phase of PD-UAV-AAR.

Table 1 Performance comparison between drogue detection methods.

CNNs are time-consuming in the training stage, and the labeling of images is also time-consuming. The time consumption of CNNs usually relates to tasks such as scene labeling and general object detection, which need to sample thousands of regions for (feedforward) computing, leading to an inefficient solution involving multiple feedforward computations. However, drogue detection for AAR based on CNNs is a specific object detection task using regression, and one image contains at most only one drogue, so it needs only one (feedforward) computation instead of thousands. With our proposed drogue detection method for AAR based on CNNs, the drogue detection speed is 33 FPS using CPUs (Intel Xeon E5-2620 v2 six-core 2.1 GHz CPUs) and 236 FPS using GPUs (NVIDIA GeForce GTX TITAN GPUs). Thus, the proposed drogue detection method based on CNNs is suitable for AAR and can satisfy the real-time requirement of PD-UAV-AAR. More comparative results (including a time comparison) with other methods are shown in Table 1.
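To illustrate the single-feedforward deployment described above, a hypothetical pycaffe inference sketch is given below; the deploy prototxt, weight file and output blob name ('ip2') are placeholders tied to the network sketch in Section 3.2.2, not the authors' released model.

```python
# Hypothetical inference sketch: one forward pass per 3 x 256 x 256 image
# returns the eight predicted corner coordinates directly.
import caffe
import numpy as np

caffe.set_mode_gpu()  # or caffe.set_mode_cpu()
net = caffe.Net('drogue_deploy.prototxt', 'drogue.caffemodel', caffe.TEST)

def detect(image_chw):
    """image_chw: float32 array of shape (3, 256, 256)."""
    net.blobs['data'].data[0, ...] = image_chw
    output = net.forward()
    return output['ip2'][0]  # predicted (x, y) coordinates of the four corners

# Example call with a dummy image:
# box = detect(np.zeros((3, 256, 256), dtype=np.float32))
```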

4.4. Further validation on the same drogue image frames

To further validate our proposed method, we compare it with Martínez et al.'s method,19 which can detect a drogue at more than 30 FPS with robust position estimation even under medium turbulence conditions. This comparison is done on the same drogue image frames, with drogue detection results and an F-measure comparison of drogue detection performance provided. The results of drogue detection based on Martínez et al.'s method19 on the same drogue image frames are shown in Figs. 11–13. Note that the green dashed box and red solid box in Figs. 11–13 denote the ground truth label and detected result, respectively. According to Figs. 5–7 and Figs. 11–13, the F-measure comparison of drogue detection performance (on the same drogue image frames) is calculated, as shown in Fig. 14. The drogue detection results (Figs. 5–7 and Figs. 11–13) and the F-measure comparison of drogue detection performance (Fig. 14) both demonstrate that the proposed method can achieve real-time drogue detection under various environmental conditions with high accuracy and is robust to drogue occlusions and complex environmental situations. These are all indications of our proposed method's effectiveness.

5. Conclusions

(1) The proposed method is able to accomplish real-time drogue detection via deep learning with CNNs by utilizing GPUs during the docking phase of PD-UAV-AAR without artificial features. Experimental results on real AAR data show that the proposed method is accurate, robust, fast and able to detect a drogue regardless of occlusion or complex environmental conditions.

(2) A dataset based on real data of "Probe and Drogue" aerial refueling, containing various environmental situations with diverse drogues and no manually-made features, was prepared and used in the proposed method, which was fundamental for the robustness and accuracy of system performance.

(3) A model utilizing the architecture of the Caffe deep learning framework with CNNs for real-time drogue detection is proposed. The experimental results on real AAR data show that the proposed method is effective in drogue detection and can meet the requirements of PD-UAV-AAR, adding to the list of successful applications of deep learning methods.

(4) The proposed method has great potential for drogue detection in real environments using deep learning, as it avoids the placement of LEDs or painted features, and it is competitive and attractive for the practical application of PD-UAV-AAR.

(5) Due to the fundamental importance of the dataset and the ongoing development of deep learning methods, the performance of drogue detection can be further improved by combining more valuable real AAR data with intelligent deep learning improvements, which deserves further research.

Acknowledgments

This work was co-supported by the National Basic Research Program of China (Nos. 2012CB316301, 2013CB329403) and the National Natural Science Foundation of China (Nos. 61473307, 61304120, 61273023, 61332007).

1. Nalepka JP, Hinchman JL. Automated aerial refueling: extending the effectiveness of unmanned air vehicles. Reston: AIAA; 2005. Report No.: AIAA-2005-6005.

2. Hansen JL, Murray JE, Campos NV. The NASA Dryden AAR project: a flight test approach to an aerial refueling system. Reston: AIAA; 2004. Report No.: AIAA-2004-4939.

3. Campa G, Napolitano MR, Perhinschi M, Fravolini ML, Pollini M, Mammarella M. Addressing pose estimation issues for machine vision based UAV autonomous serial refuelling. Aeronaut J 2007;111(1120):389–96.

4. Dong XM, Xu YJ, Chen B. Progress and challenges in automatic aerial refueling. J Air Force Eng Univ Nat Sci Edn 2008;9(6):1–5 [Chinese].

5. Ding M, Wei L, Wang BF. Vision-based estimation of relative pose in autonomous aerial refueling. Chin J Aeronaut 2011;24(6):807–15.

6. Mu CD, Li BR. Vision-based autonomous aerial refueling. J Tsinghua Univ Sci Tech 2012;52(5):670–6 [Chinese].

7. Wang HT, Dong XM, Xue JP, Liu JL. Dynamic modeling of a hose-drogue aerial refueling system and integral sliding mode backstepping control for the hose whipping phenomenon. Chin J Aeronaut 2014;27(4):930–46.

8. Lu YP, Yang CX, Liu YY. A survey of modeling and control technologies for aerial refueling system. Acta Aeronaut Astronaut Sin 2014;35(9):2375–89 [Chinese].

9. Chen CI, Stettner R. Drogue tracking using 3D flash lidar for autonomous aerial refueling. Proceedings of SPIE 8037, laser radar technology and applications XVI; 2011 Jun 8; Orlando, USA. Bellingham: SPIE; 2011. p. 1–11.

10. Martínez C, Richardson T, Campoy P. Towards autonomous air-to-air refuelling for UAVs using visual information. Proceedings of IEEE international conference on robotics and automation; 2013 May 6–10; Karlsruhe, Germany. Piscataway (NJ): IEEE Press; 2013. p. 5736–42.

11. Pollini L, Campa G, Giulietti F, Innocenti M. Virtual simulation set-up for UAVs aerial refueling. Reston: AIAA; 2003. Report No.: AIAA-2003-5682.

12. Mati R, Pollini L, Lunghi A, Innocenti M, Campa G. Vision-based autonomous probe and drogue aerial refueling. Proceedings of IEEE 14th mediterranean conference on control and automation; 2006 Jun 28–30; Ancona, Italy. Piscataway (NJ): IEEE Press; 2006. p. 1–6.

13. Pollini L, Mati R, Innocenti M, Campa G, Napolitano M. A synthetic environment for simulation of vision-based formation flight. Reston: AIAA; 2003. Report No.: AIAA-2003-5376.

14. Xie HW, Wang HL. Binocular vision-based short-range navigation method for autonomous aerial refueling. J Beijing Univ Aeronaut Astronaut 2011;37(2):206–9 [Chinese].

15. Wang XF, Dong XM, Kong XW. Feature recognition and tracking of aircraft tanker and refueling drogue for UAV aerial refueling. Proceedings of 25th IEEE Chinese control and decision conference; 2013 May 25–27; Guiyang, China. Piscataway (NJ): IEEE Press; 2013. p. 2057–62.

16. Wang XF, Kong XW, Zhi JH, Chen Y, Dong XM. Real-time drogue recognition and 3D locating for UAV autonomous aerial refueling based on monocular machine vision. Chin J Aeronaut 2015;28(6):1667–75.

17. Chen CI, Koseluk R, Buchanan C, Duerner A, Jeppesen B, Laux H. Autonomous aerial refueling ground test demonstration: a sensor-in-the-loop, non-tracking method. Sensors 2015;15(5):10948–72.

18. Yin YJ, Xu D, Wang XG, Bai MR. Detection and tracking strategies for autonomous aerial refuelling tasks based on monocular vision. Int J Adv Robot Syst 2014;11(1):399–412.

19. Martínez C, Richardson T, Thomas P, Luke du Bois J, Campoy P. A vision-based strategy for autonomous aerial refueling tasks. Robot Auton Syst 2013;61(8):876–95.

20. Song CH, Gao SB, Cheng YM. Drogue detection algorithm in visual navigation system for autonomous aerial refueling. Infrared Laser Eng 2013;42(4):1089–94 [Chinese].

21. Gao SB, Cheng YM, Song CH. Drogue detection for vision-based autonomous aerial refueling via low rank and sparse decomposition with multiple features. Infrared Phys Technol 2013;60:266–74.

22. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.

23. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Adv Neural Inform Process Syst 2012;25:1106–14.

24. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis 2015;115(3):211–52.

25. Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of IEEE conference on computer vision and pattern recognition; 2014 June 23–28; Columbus, USA. Piscataway (NJ): IEEE Press; 2014. p. 580–7.

26. Girshick R. Fast R-CNN. Proceedings of IEEE international conference on computer vision; 2015 December 11–18; Santiago, Chile. Piscataway (NJ): IEEE Press; 2015. p. 1–9.

27. Ren SQ, He KM, Girshick R, Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. Proceedings of neural information processing systems; 2015 December 5–10; Montreal, Canada; 2015. p. 1–14.

28. Jia YQ, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, et al. Caffe: convolutional architecture for fast feature embedding. Proceedings of 22nd ACM international conference on multimedia; 2014 November 3–7; Orlando, USA. New York: ACM; 2014. p. 1–4.

29. Glorot X, Bengio Y. Understanding the difficulty of training deep feedforward neural networks. J Mach Learn Res 2010;9:249–56.

30. Glorot X, Bordes A, Bengio Y. Deep sparse rectifier neural networks. Proceedings of 14th international conference on artificial intelligence and statistics; 2010. p. 15–23.

31. Boureau YL, Bach F, LeCun Y, Ponce J. Learning mid-level features for recognition. Proceedings of IEEE conference on computer vision and pattern recognition; 2010 June 13–18; San Francisco, USA. Piscataway (NJ): IEEE Press; 2010. p. 2559–66.

32. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.

33. Cheng MM, Mitra NJ, Huang XL, Torr PHS, Hu SM. Global contrast based salient region detection. IEEE Trans Pattern Anal Mach Intell 2015;37(3):569–82.

Received 2 March 2016; revised 24 June 2016; accepted 18 July 2016

Available online 21 December 2016

© 2016 Chinese Society of Aeronautics and Astronautics. Production and hosting by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

*Corresponding author.

E-mail addresses: wangxf6361@126.com (X. Wang), dongxinmin@139.com (X. Dong), jk973@qq.com (X. Kong), lijianmin@mail.tsinghua.edu.cn (J. Li), dcszb@mail.tsinghua.edu.cn (B. Zhang).

Peer review under responsibility of Editorial Committee of CJA.