Zhan Ziqi, Liu Bing, Huo Bin (Dongfeng Nissan Technical Center, Guangzhou 510800)
【Abstract】In this research, a complete engineering implementation process for Traffic Sign Recognition (TSR) was established based on the RGB (Red, Green, Blue) to HSV (Hue, Saturation, Value) color model conversion and a Convolutional Neural Network (CNN). To improve operational speed, the identification of the dynamic Region Of Interest (ROI) was optimized, the method of converting images from the RGB to the HSV model was optimized, and the neural network structure was designed accordingly. The TSR algorithm was verified with the GTSRB database. The results show that the proposed TSR method effectively improves computation speed and recognition rate.
Key words: Traffic sign recognition, HSV model, CNN, Autonomous driving
Traffic Sign Recognition (TSR) is an important perception task for autonomous driving systems. Reliability and computing speed are regarded as the two most important parameters for recognition tasks [1-2]. Both traditional and Neural Network (NN) methods have been widely analyzed for TSR algorithms [3-6]. A traditional method usually uses an expert model including edge detection, shape recognition, content matching, etc., which means every step of the recognition algorithm is explicitly formulated. An NN method is usually regarded as an end-to-end method, which means explaining the behavior of the algorithm takes considerable effort, especially during the detection process [7-9].
Many studies have proposed methods that combine an expert model with a neural network. In this paper, a combination of an optimized expert model and a Convolutional Neural Network (CNN) is used, and a complete engineering implementation process for recognizing Chinese traffic signs is introduced.
In this paper, only traffic signs with red color are considered as recognition targets, because red signs represent prohibitions, including speed limits, no passing, no left turn, etc. In an actual driving mission, such a traffic sign gives a direct instruction for the driving condition.
The essential steps include ROI identification, traffic sign area extraction, and traffic sign recognition. All steps in Figure 1 are needed for the whole recognition process.
Two main parts are included in Figure 1:
a. Dynamic ROI detection. This part processes one frame of the video recorded by the camera and outputs a 32 pixel×32 pixel image that contains only the traffic sign.
b. Traffic sign recognition. A CNN with 13 layers is designed and trained, which gives the recognition result.
Figure 1.Flow of TSR process
In each second, only one frame of the video is taken for the detection and recognition process, considering that the TSR process is relatively slow compared with obstacle recognition and other processes related to autonomous driving.
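A minimal Python sketch of this one-frame-per-second sampling is given below, assuming an OpenCV video source; the file name and the process_frame hook are hypothetical placeholders for the rest of the pipeline.

import cv2

cap = cv2.VideoCapture("drive_record.mp4")        # hypothetical recorded video
fps = int(round(cap.get(cv2.CAP_PROP_FPS))) or 30  # fall back to 30 if unknown

frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % fps == 0:       # keep only one frame per second
        process_frame(frame)       # hypothetical hook into the TSR pipeline
    frame_idx += 1

cap.release()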
Converting the image from the RGB (Red, Green, Blue) to the HSV (Hue, Saturation, Value) model makes it easier to extract red zones from the original image accurately. Traffic signs are then located within these red zones, and the image is converted to binary. In the binary image, connected zones are obtained by erosion and dilation and the traffic sign coordinates are determined, from which the ROI zone is extracted. The results of the dynamic ROI detection process are shown in Figure 2.
Figure 2.Dynamic ROI detection process
Computer vision algorithms for color images are usually straightforward extensions of grayscale algorithms, in which each color component is used separately as the input of the algorithm. The HSV color model has an advantage over the RGB color model for this task. However, the traditional method of converting an image from the RGB to the HSV model calculates every single hue value, which consumes more time during the conversion. The proposed conversion method reduces the calculation process and therefore takes less computation time. It can be expressed as:
where r, g, and b represent the red, green, and blue components of the RGB color model, respectively. The result is shown in Figure 2b.
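The simplified expression itself is not reproduced above; for reference, a minimal NumPy sketch of the conventional hue computation is shown below, skipping the S and V channels, which are not needed for red extraction. This is the standard formulation, not the paper's optimized one, and the function name rgb_to_hue is introduced only for illustration.

import numpy as np

def rgb_to_hue(img_rgb):
    # Standard hue computation, normalized to [0, 1] (0 = 0 degrees, 1 = 360 degrees).
    rgb = img_rgb.astype(np.float32) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    c_max = np.max(rgb, axis=-1)
    c_min = np.min(rgb, axis=-1)
    delta = c_max - c_min

    hue = np.zeros_like(c_max)
    mask = delta > 1e-6
    # piecewise definition of hue depending on which channel is maximal
    r_max = mask & (c_max == r)
    g_max = mask & (c_max == g)
    b_max = mask & (c_max == b)
    hue[r_max] = ((g - b)[r_max] / delta[r_max]) % 6
    hue[g_max] = (b - r)[g_max] / delta[g_max] + 2
    hue[b_max] = (r - g)[b_max] / delta[b_max] + 4
    return hue / 6.0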
3.2.1 Binarization
The image color is redistributed after binary conversion according to the hue degrees of the HSV model, where 0 corresponds to 0° and 1 corresponds to 360°. The result is shown in Figure 2c.
3.2.2 Color extraction
A threshold is needed for red color extraction in the binary image; here the threshold value is t ∈ [0.0277, 0.0320]. The result is shown in Figure 2d. It should be pointed out that the threshold value t is based on experience and gives a fine extraction result in most tests; a better threshold value requires more tests in the real environment. Colors within t are set as white and all others as black in Figure 2d.
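A minimal sketch of this extraction step on the normalized hue image, using the empirical interval given above; the function name extract_red is introduced only for illustration.

import numpy as np

T_LOW, T_HIGH = 0.0277, 0.0320   # empirical threshold interval t

def extract_red(hue_img):
    # White where the hue falls inside the threshold interval, black elsewhere
    binary = np.logical_and(hue_img >= T_LOW, hue_img <= T_HIGH)
    return binary.astype(np.uint8) * 255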
3.2.3 Erosion and dilation
Outside the traffic sign area, a small number of pixels may also satisfy the threshold, therefore an erosion process is added after the extraction process. During the erosion process, a disk area with a radius of 10 pixels is used. The result is shown in Figure 2e.
To increase the weight of the target area, a dilation process is added after the erosion so that the traffic sign shows a clear red circle on the outside edge; all pixels inside the circle are set to the same value as the extracted area. The result is shown in Figure 2f.
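A minimal OpenCV sketch of these two morphological steps follows. The 10-pixel disk for erosion is stated above; reusing the same disk for dilation is an assumption, since the dilation element is not specified.

import cv2

# Elliptical structuring element of 21×21 pixels approximates a disk of radius 10
disk = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (21, 21))

def clean_binary(binary_img):
    eroded = cv2.erode(binary_img, disk)     # remove small noise regions (Figure 2e)
    dilated = cv2.dilate(eroded, disk)       # restore and fill the sign area (Figure 2f)
    return dilated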
3.3.1 Extraction
There are still many eligible areas after the target area extraction, as Figure 2f shows. In this section, these alternative areas need to be ranked. The ranking method is shown in Figure 3.
Figure 3.The ranking method of ROI area selection
In Figure 3, if the area of an alternative is less than 10% or more than 50%, it is regarded as interference, such as a red cola bottle on the road or a red building near the road. An RGB image is then extracted from the ROI area. The result of the extraction is shown in Figure 2f.
3.3.2 Resizing
The result of ROI extraction is the input to the CNN recognition process. Resizing the image to 32 pixel×32 pixel is the last step. The result is shown in Figure 2g.
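A minimal sketch of the ranking rule of Figure 3 combined with the crop and resize steps is shown below. Because the text does not state what quantity the 10%-50% percentage refers to, the reference area is passed in explicitly as an assumption, and the function name is hypothetical.

import cv2

def select_and_resize(binary_img, frame_rgb, reference_area):
    # Connected components of the cleaned binary image; label 0 is the background
    num, labels, stats, _ = cv2.connectedComponentsWithStats(binary_img)
    rois = []
    for i in range(1, num):
        x, y, w, h, area = stats[i]
        ratio = area / float(reference_area)   # assumption: ratio against a given reference area
        if ratio < 0.10 or ratio > 0.50:       # treated as interference, discarded
            continue
        crop = frame_rgb[y:y + h, x:x + w]     # RGB patch of the candidate sign
        rois.append(cv2.resize(crop, (32, 32)))  # CNN input size
    return rois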
A CNN uses a variation of multilayer perceptrons designed to require minimal preprocessing and has been successfully applied to analyzing visual imagery. CNNs use relatively little pre-processing compared with other image classification algorithms, which means the network learns the filters that were hand-engineered in traditional algorithms [10]. In this paper, a neural network structure is used for the recognition process; however, the network itself is not a research focus of this engineering implementation work.
A 13-layer network was designed for the recognition process.
4.1.1 Input layer
Receives a 32 pixel×32 pixel×3 channel RGB image as input.
4.1.2 Middle layers
The middle layers include 8 layers with alternating convolution layers (C) and max pooling layers (S):
a. Convolution: 32 5×5 convolutions with stride [1,1] and padding [2,2].
b. Max Pooling: 3×3 max pooling with stride [2,2] and padding [0,0].
c. Convolution: 32 5×5 convolutions with stride [1,1] and padding [2,2].
d. Max Pooling: 3×3 max pooling with stride [2,2] and padding [0,0].
e. Convolution: 64 5×5 convolutions with stride [1,1] and padding [2,2].
f. Max Pooling: 3×3 max pooling with stride [2,2] and padding [0,0].
g. Convolution: 64 5×5 convolutions with stride [1,1] and padding [2,2].
h. Max Pooling: 3×3 max pooling with stride [2,2] and padding [0,0].
4.1.3 Output layers
The output layers include 4 layers:
a. Fully Connected: fully connected layer with 128 outputs.
b. Fully Connected: fully connected layer with 4 outputs.
c.Softmax.
d.Classification Output.
The structure of the CNN is shown in Figure 4.
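A minimal Keras sketch of the layer sequence listed above is given for reference. It assumes "same" padding for the 5×5 convolutions (equivalent to padding [2,2]) and ReLU activations, which are not stated in the text; the Softmax and Classification Output stages are folded into the activation of the final dense layer.

from tensorflow.keras import layers, models

def build_tsr_cnn(num_classes=4):
    # 32×32×3 input, four conv/pool stages, then two fully connected layers
    model = models.Sequential([
        layers.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, 5, strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=3, strides=2, padding="valid"),
        layers.Conv2D(32, 5, strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=3, strides=2, padding="valid"),
        layers.Conv2D(64, 5, strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=3, strides=2, padding="valid"),
        layers.Conv2D(64, 5, strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=3, strides=2, padding="valid"),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),  # softmax + classification output
    ])
    return model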
Two kinds of datasets are used for the training process. One is the GTSRB (German Traffic Sign Recognition Benchmark), in which each traffic sign contains 2 000 samples, 80% of which are used for training and 20% for testing. These data are used during the test stage in the lab to verify the basic performance and reliability of the network. The other dataset is recorded from real road tests, to verify the performance of the whole algorithm under a real driving environment (including Chinese traffic signs, camera system hardware, weather conditions, etc.).
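A minimal sketch of the per-class 80%/20% split and the training call follows. The training parameters of Table 1 are not reproduced here, so the optimizer, epoch count, and batch size in the usage comments are assumptions, and the arrays images and labels are hypothetical.

import numpy as np
from tensorflow.keras.utils import to_categorical

def split_per_class(images, labels, train_ratio=0.8, seed=0):
    # Shuffle each class separately and keep 80% for training, 20% for testing
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        cut = int(len(idx) * train_ratio)
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)

# Usage with hypothetical arrays images (N,32,32,3) and labels (N,):
# tr, te = split_per_class(images, labels)
# model = build_tsr_cnn(num_classes=len(np.unique(labels)))
# model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(images[tr], to_categorical(labels[tr]), epochs=20, batch_size=64,
#           validation_data=(images[te], to_categorical(labels[te])))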
Figure 4. Structure of the network
4.3.1 Test with GTSRB dataset
The test with the GTSRB dataset is done in static mode; the parameters set in the training process and the test results are shown in Table 1. The comprehensive recognition rate reaches 99.6% in static mode.
Table 1.Results in static mode test with GTSRB dataset
4.3.2 Test with real road dataset
The test with the real road dataset obtains a good recognition rate in good weather conditions, but two things remain to be improved:
a. In a bad light environment, due to the limitation of the camera hardware, the original images, such as those captured while crossing a portal, can be very hard to use for target detection.
b. When a red background overlaps with a traffic sign, the background is selected as part of the ROI together with the traffic sign.
This paper shows a complete calculation process of how to detect traffic signs and input them to the CNN. In the next step of this work, it is worth exploring how to determine the size of the image input into the CNN, because this may help to further improve the computation speed.
Target detection based only on color is limited by the color itself, the camera performance, and the light environment. In future studies, methods combining color, shape, and other features will be analyzed.