Detecting Iris Liveness with Batch Normalized Convolutional Neural Network

2019-02-28 07:08, Min Long and Yan Zeng
Computers, Materials & Continua, 2019, Issue 2

Min Long and Yan Zeng

Abstract: To counter presentation attacks on iris recognition systems, an iris liveness detection scheme based on a batch normalized convolutional neural network (BNCNN) is proposed to improve the reliability of iris authentication. The BNCNN architecture has eighteen layers, comprising convolutional layers, batch-normalized (BN) layers, ReLU layers, pooling layers and fully connected layers, and is constructed to distinguish genuine irises from fake ones. The iris image is first preprocessed by iris segmentation and normalized to 256×256 pixels, and the iris features are then extracted by the BNCNN. With these features, the decision-making layer classifies the iris as genuine or fake. Batch normalization is used in the BNCNN to avoid overfitting and vanishing gradients during training. Extensive experiments are conducted on three classical databases: the CASIA Iris Lamp database, the CASIA Iris Syn database and the ND Contact database. The results show that the proposed method can effectively extract micro-texture features of the iris and achieves higher detection accuracy than some typical iris liveness detection methods.

Keywords: Iris liveness detection, batch normalization, convolutional neural network, biometric feature recognition.

1 Introduction

Biometric recognition has received extensive attention. The human iris, an important biological feature, can be used to determine the identity of a person because of its rich texture, very high uniqueness and stability [Chen, Shen and Chen (2016)]. Instead of remembering difficult passwords or carrying extra cards as in traditional authentication systems, users only need to look toward a camera without touching the equipment.

Although iris recognition has brought convenience and security to daily life, it is still vulnerable to fake iris attacks. The most common way is to present an iris picture of a legitimate enrollee, either by printing a photo or by displaying it on an electronic screen. Contact lenses, printed iris pictures, LCD iris, video iris, synthetic iris, glass, silica gel and other forged eyes have all been used successfully to deceive iris recognition systems.

The existing attacks include the masquerade attack and the print attack (or spoofing). Both are user-level attacks that take place at the sensor level, where an impostor presents fake data in order to authenticate himself. In a masquerade attack, the user deliberately changes his biometric trait to disguise his identity, for example by excessively dilating the iris or by wearing contact lenses. The impostor uses a masquerade attack to conceal his genuine identity or to pose as someone else. The print attack (or spoofing) is also called the fake iris attack: the impostor presents a fake iris by using a printed picture, video iris, synthetic iris, etc. This attack can affect both the enrollment and the verification phase of a biometric recognition system [Gupta and Sehagal (2016)].

Since a person's iris cannot be reset once it is stolen by an impostor, it is of great significance to detect these threats to the iris recognition system and to provide effective measures that improve system security. Although some work has been done on iris liveness detection, the performance is still far from satisfactory. In this paper, we use a batch normalized convolutional neural network to detect iris liveness and to improve the detection accuracy.

The rest of the paper is organized as follows. Section 2 introduces related work. Section 3 describes the proposed scheme. Section 4 provides experimental results and analysis. Finally, conclusions are drawn in Section 5.

2 Related work

A variety of methods have been proposed to prevent spoofing attacks. Fake iris patterns printed on contact lenses, paper, plastic plates or glass generate characteristic high-frequency information, and real images differ in texture from synthetic images [Peng, Zhou, Long et al. (2017)]; therefore, texture analysis and pattern classification methods are a natural choice for iris liveness detection. Daugman used the Fast Fourier Transform (FFT) to discriminate printed iris patterns by detecting the high-frequency spectral magnitude in the frequency domain [Daugman (2003)]. This can effectively detect a spoof in which the attacker cuts out the pupil region of a printed iris image and looks through the hole. He et al. proposed an approach for detecting fake irises based on the analysis of 2D Fourier spectra together with iris image quality assessment [He, Lu and Shi (2008)]. Image quality assessment is first used to exclude defocused and motion-blurred fake irises, and the statistical properties of the Fourier spectra are then used to detect clear fake irises. The use of image quality assessment significantly improves the performance compared with Daugman's method. Zhang et al. proposed a fake iris detection method based on weighted local binary patterns (LBP) and statistical features, with a support vector machine (SVM) used to classify genuine and fake irises [Zhang, Sun and Tan (2010)]. In Gragnaniello et al. [Gragnaniello, Sansone and Verdoliva (2015)], a technique to detect printed iris attacks based on the LBP descriptor was proposed. The LBP is computed on a high-pass version of the image obtained with a 3×3 integer kernel, and the classification is accomplished by SVM. In [Peng, Qin and Long (2018)], GS-LBP and LGBP were also used to investigate the distinct difference between the real image and the artefact. In Raghavendra et al. [Raghavendra and Busch (2015)], a presentation attack detection (PAD) scheme was proposed to identify iris spoofs. Multi-scale binarized statistical image features (M-BSIF) are extracted to represent the micro-texture variations at multiple scales, at both the feature level and the decision level. In Sun et al. [Sun, Zhang, Tan et al. (2014)], an image feature representation method named Hierarchical Visual Codebook (HVC) was developed to classify an iris image into an application-specific category. Static characteristics such as iris frequency or texture are extracted and classified by SVM. Since there is no strict definition of the texture models of genuine and fake iris images, and there are various types of fake iris patterns, it is difficult to handle all spoofing cases with textural features [He, Li, Liu et al. (2016)].

Another kind of iris liveness detection is the dynamic method, which extracts evidence from the spatial characteristics of the iris. In Lee et al. [Lee, You and Kang (2008)], genuine and fake irises were identified by comparing the location and distance of the 3D image model of the human eye with the location of the tester. In Czajka et al. [Czajka (2015); Thavalengal, Nedelcu, Bigioi et al. (2016)], liveness was detected by analyzing pupil changes. Such dynamic methods can obtain high detection accuracy, but they are slow and computationally expensive. Moreover, the dynamic features vary across populations, and the changes in pupil size are more subtle for elderly people, which decreases the detection accuracy.

In recent years, convolutional neural networks have shown significant advantages in feature extraction and feature recognition [He, Li, Liu et al. (2016); Wu, Wang and Zhang (2017); Jiang, Zhao and Wu (2016)]. In Silva et al. [Silva, Luz, Baeta et al. (2015)], deep learning was adopted to detect iris liveness, and it achieved high detection accuracy. In addition, deep representations for iris, face, and fingerprint spoofing detection were proposed in Menotti et al. [Menotti, Chiachia, Pinto et al. (2014)], obtaining the best known results in eight of the nine benchmarks. A multi-patch convolutional neural network for iris liveness detection was proposed in He et al. [He, Li, Liu et al. (2016)]. It can handle different types of fake iris images because it directly learns the mapping function between the raw pixels of the input iris patch and the labels. In Raghavendra et al. [Raghavendra, Raja and Busch (2017)], ContlensNet was used to classify images into those with textured contact lenses, transparent contact lenses, and no contact lenses.

Based on the above work, data-driven solutions based on CNNs appear to be a valuable direction for iris liveness detection. However, in traditional networks, a high learning rate may cause the gradients to explode or vanish, or cause training to get stuck in local minima [Ioffe and Szegedy (2015)]. In this paper, to mitigate overfitting and vanishing gradients during the training of the convolutional neural network, we propose a method to detect iris liveness based on a batch normalized convolutional neural network, which further improves the detection accuracy.

3 Proposed scheme

The proposed scheme consists of iris preprocessing followed by a batch normalized convolutional neural network.

3.1 Iris preprocessing

Iris preprocessing is a fundamental step in iris recognition systems. It comprises several tasks: finding the pupillary and limbic boundaries of the iris, localizing the upper and lower eyelids and, not least, excluding regions with shadows or reflections and regions occluded by eyelashes [Diego, Giovanni, Carlo et al. (2016)]. This can be very challenging, especially for noisy images and non-cooperative acquisitions. Among the many approaches proposed in the literature, one popular iris segmentation method is built on a total-variation based formulation that uses $\ell_1$ norm regularization to robustly suppress noisy texture pixels for accurate iris localization [Zhao and Kumar (2015)]. The process includes image enhancement and noise elimination. The iris structure is extracted by the RTV-L1 method, and the iris is then located by a circle-based method. Finally, post-processing that handles eyelashes and eyelids is performed. In this paper, we follow two iris preprocessing approaches to test the influence of the preprocessing method on iris recognition. One is iris segmentation. The other is to normalize the original image to a size of 256×256 and then enhance and denoise it.

3.2 Batch normalization in convolutional neural network

The network is implemented in the Caffe deep learning framework [Jia, Shelhamer, Donahue et al. (2014)]. It is designed for a two-class classification problem: distinguishing genuine iris images from fake ones. We name this architecture the batch normalized convolutional neural network (BNCNN). It is composed of three convolutional layers with 3×3 convolution kernels, three pooling layers with 2×2 kernels, two fully connected layers with 1024 neurons, five batch normalized (BN) layers, and five ReLU layers. The output of the last fully connected layer, which has 2 neurons, is fed to a two-way softmax function. Fig. 1 shows the block diagram of the proposed scheme for iris liveness detection using the batch normalized convolutional neural network.

Figure 1: Structure of batch-normalized convolution neural network
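The layer counts given above can be checked with a short sketch. The ordering used here (convolution, BN, ReLU, pooling for each convolutional stage, and fully connected, BN, ReLU for each 1024-neuron stage) is an assumption read off the description, and the function name `build_bncnn_layers` is hypothetical. The 2-neuron output layer and the softmax are treated as the decision stage and are not counted.

```python
from collections import Counter


def build_bncnn_layers():
    """Return an assumed BNCNN layer sequence as (type, detail) tuples."""
    layers = []
    # Three convolutional stages: conv -> BN -> ReLU -> max-pool (ordering assumed).
    for _ in range(3):
        layers += [("conv", "3x3"), ("bn", ""), ("relu", ""), ("pool", "2x2")]
    # Two fully connected stages with 1024 neurons, each followed by BN and ReLU.
    for _ in range(2):
        layers += [("fc", "1024"), ("bn", ""), ("relu", "")]
    return layers


layers = build_bncnn_layers()
counts = dict(Counter(kind for kind, _ in layers))
print(len(layers), counts)
```

Under this reading the five layer types add up to the eighteen layers mentioned in the abstract.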

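The input of the convolutional neural network is the segmented and normalized iris image with a size of 256×256. The first convolution layer is made up of 128 filters with 3×3 convolution kernels. It filters the input iris image patches. With $x^{i}$ denoting the $i$-th input map, $y^{j}$ the $j$-th output map, $b^{j}$ its bias, and $*$ the convolution operator, the convolution operation can be expressed as

$$y^{j} = b^{j} + \sum_{i} k^{ij} * x^{i} \tag{1}$$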

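where $i$ and $j$ index the input map and the output map, respectively, and $k^{ij}$ represents the convolution kernel between the $i$-th input map and the $j$-th output map. Each neuron in the convolution layer represents a feature. The input image is convolved with 128 trainable convolution kernels, and 128 feature maps are generated by the first convolution layer. The first convolution layer is connected to a BN layer, and the feature maps are used as the input of the BN layer. Define the input mini-batch $X$ of the BN layer as $X=\{x_1, x_2, \ldots, x_m\}$; each dimension is normalized as

$$\mu_X = \frac{1}{m}\sum_{i=1}^{m} x_i \tag{2}$$

$$\sigma_X^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_i - \mu_X\right)^2 \tag{3}$$

$$\hat{x}_i = \frac{x_i - \mu_X}{\sqrt{\sigma_X^2 + \epsilon}} \tag{4}$$

where $\mu_X$ and $\sigma_X^2$ are the mean and variance of the mini-batch, and $\epsilon$ is a small constant that avoids division by zero.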

The values of any normalized activation $\hat{x}_i$ have an expected value of 0 and a variance of 1, as long as the elements of each mini-batch are sampled from the same distribution and if we neglect $\epsilon$, the small constant added to the variance.

For each activation, a pair of parameters γ and β is introduced. They are learned along with the original model parameters and are used to restore the representation power of the network [Ioffe and Szegedy (2015)]. They scale and shift the normalized value as in (5).

$$y_i = \gamma \hat{x}_i + \beta \equiv \mathrm{BN}_{\gamma,\beta}(x_i) \tag{5}$$
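The normalization and the scale-and-shift step described above can be illustrated for a single activation over one mini-batch. This is a minimal sketch in plain Python rather than the Caffe implementation used in the paper; the function name `batch_norm`, the default `eps` value and the sample numbers are assumptions chosen for illustration.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one activation over a mini-batch, then scale and shift it."""
    m = len(batch)
    mean = sum(batch) / m                              # mini-batch mean
    var = sum((x - mean) ** 2 for x in batch) / m      # biased mini-batch variance
    x_hat = [(x - mean) / math.sqrt(var + eps) for x in batch]
    return [gamma * xh + beta for xh in x_hat]         # learned scale and shift


# With gamma = 1 and beta = 0 the output has mean ~0 and variance ~1.
y = batch_norm([1.0, 2.0, 3.0, 4.0])
y_mean = sum(y) / len(y)
y_var = sum((v - y_mean) ** 2 for v in y) / len(y)
print(y_mean, y_var)
```

Setting `gamma` and `beta` to other values moves the output mean to `beta` and its standard deviation to roughly `gamma`, which is how the learned parameters can restore the representation power of the layer.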

The subscripts in the notation $\mathrm{BN}_{\gamma,\beta}$ for the BN transform indicate that the parameters γ and β are to be learned. However, the BN transform does not process the activation of each training example independently; rather, $\mathrm{BN}_{\gamma,\beta}(x_i)$ depends both on the training example and on the other examples in the mini-batch. The scaled and shifted values $y$ are passed to the ReLU layer. ReLU has been shown to alleviate the vanishing gradient problem in back propagation during training [Glorot, Bordes and Bengio (2011)]. Therefore, ReLU is used for faster and simpler training and to avoid the gradient issue in the CNN. The ReLU is defined as follows.

$$u_i = \max(0, y_i) \tag{6}$$

where $u_i$ and $y_i$ are the output and the input of the unit, respectively. At this point, a 256×256 input image has passed through the first convolution layer, BN layer and ReLU layer of the BNCNN, and 128 feature maps of 254×254 are obtained. Thereafter, the pooling layer applies max pooling with a 2×2 kernel to the ReLU outputs.
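The feature map sizes follow from simple arithmetic, sketched below for an unpadded 3×3 convolution with stride 1 followed by 2×2 pooling with stride 2. Only the first stage (256 to 254 to 127) is stated in the text; the sizes of the later stages are an assumption, since they depend on how the pooling layer rounds odd sizes. The helper names are hypothetical, and `ceil_mode=True` mimics a pooling layer that rounds up, as Caffe's does.

```python
def conv_out(size, kernel=3, stride=1, pad=0):
    """Output width of a convolution along one spatial dimension."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2, ceil_mode=False):
    """Output width of a pooling layer; ceil_mode rounds partial windows up."""
    span = size - kernel
    steps = -(-span // stride) if ceil_mode else span // stride
    return steps + 1


def trace_sizes(size=256, stages=3, ceil_mode=False):
    """Return (after_conv, after_pool) for each conv/pool stage."""
    sizes = []
    for _ in range(stages):
        after_conv = conv_out(size)
        size = pool_out(after_conv, ceil_mode=ceil_mode)
        sizes.append((after_conv, size))
    return sizes


print(trace_sizes())                # first stage: (254, 127)
print(trace_sizes(ceil_mode=True))  # same first stage, later stages differ
```

Both rounding conventions agree on the first stage, so the 254×254 and 127×127 sizes quoted in the text do not depend on this choice.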

The max-pooling layer provides a form of subsampling. The stride is 2, which means that the 2×2 max-pooling filter moves two pixels at a time in both the horizontal and vertical directions. The 254×254 feature map is reduced to 127×127 after pooling, and it then passes through two convolution layers, four BN layers, two pooling layers, three ReLU layers, and two fully connected layers. The output of the final ReLU layer is connected to the third fully connected layer with 2 neurons. After that, the Softmax function maps the outputs of these neurons to a two-class probability distribution. With $z_j$ denoting the output of the $j$-th neuron, the Softmax function is defined as (7).

$$\sigma_j(z) = \frac{e^{z_j}}{\sum_{k=1}^{2} e^{z_k}}, \quad j = 1, 2 \tag{7}$$

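It can be seen from (7) that if $z_j$ is much larger than the other components, the corresponding Softmax output is close to 1; otherwise it is close to 0. The loss function of Softmax is defined as

$$L = -\log \sigma_j(z) \tag{8}$$

where $\sigma_j(z)$ is the Softmax output for the true class $j$.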

The loss measures the difference between the prediction and the actual result: the greater the Softmax value for the true class, the smaller the loss. During training, the network adjusts its weights according to this difference. Finally, the Softmax function outputs the probability distribution over the two class labels, which is used for classification once the loss function has been minimized.
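The behaviour of the Softmax output and its loss can be checked numerically. The sketch below is a plain Python illustration, not the Caffe layer used in the experiments; the function names, the example scores, and the convention that class 0 is genuine and class 1 is fake are assumptions.

```python
import math


def softmax(z):
    """Map raw scores to a probability distribution."""
    shift = max(z)                          # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(z, true_class):
    """Negative log of the Softmax output for the true class."""
    return -math.log(softmax(z)[true_class])


scores = [4.0, -1.0]                        # hypothetical outputs of the 2-neuron layer
probs = softmax(scores)
loss_if_genuine = softmax_loss(scores, 0)   # small: prediction agrees with the label
loss_if_fake = softmax_loss(scores, 1)      # large: prediction contradicts the label
print(probs, loss_if_genuine, loss_if_fake)
```

When the first score dominates, its probability approaches 1 and the loss for that class approaches 0, while the loss for the other class grows.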

4 Experiment and results

4.1 Data sets and network parameters

To evaluate its classification performance, the proposed algorithm is tested on classic iris image databases: CASIA Iris Lamp, CASIA Iris Syn, and ND Contact of the University of Notre Dame. The CASIA Iris Lamp database contains 16212 genuine iris images collected from 411 volunteers with an OKI iris camera. The CASIA Iris Syn database contains 10000 synthetic iris images. The synthetic and genuine iris images are very similar in terms of statistical characteristics. Some image samples from CASIA Iris Lamp and CASIA Iris Syn are shown in Fig. 2. The ND Contact database contains iris images with transparent contact lenses, with textured contact lenses and without contact lenses. The use of transparent contact lenses reduces the change in the reflection properties of the iris region. Textured contact lenses exhibit an external texture and color patterns that are printed on the lens.

In the experiment, irises with transparent contact lenses and irises without contact lenses are regarded as genuine, and irises with textured contact lenses are regarded as fake. Some iris images from the ND Contact database are shown in Fig. 3. We randomly choose a subset of the genuine and fake iris images from the database and divide them into two parts to form a training set and a test set. The learning rate of the batch normalized convolutional neural network is 0.01, and the number of iterations in all experiments is 10000. The quantitative results are measured by the correct recognition rate and the false recognition rate.
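The two measures are not defined formally in the text. A minimal sketch, assuming that the correct recognition rate is the fraction of test images whose predicted label matches the true label and that the false recognition rate is its complement, is given below; the function name and the toy labels are hypothetical.

```python
def recognition_rates(predicted, actual):
    """Return (correct_rate, false_rate) over paired label sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    correct_rate = correct / len(actual)
    return correct_rate, 1.0 - correct_rate     # assumed complement definition


# Toy example: one genuine iris is misclassified as fake.
actual = ["genuine", "genuine", "fake", "fake"]
predicted = ["genuine", "fake", "fake", "fake"]
rates = recognition_rates(predicted, actual)
print(rates)
```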

Figure 2: Sample iris images in CASIA Iris Lamp and CASIA Iris Syn database: (a) genuine iris image; (b) synthetic iris image

Figure 3: Sample iris images in ND Contact Database: (a) transparent contact lens iris; (b) non-wearing contact lens iris; (c) textured contact lens iris

4.2 Experiments and performance analysis

4.2.1 Experiments on CASIA Iris Lamp and CASIA Iris Syn database

In the experiment, the proposed method and state-of-the-art methods are evaluated on the CASIA Iris Lamp and CASIA Iris Syn databases to present a comprehensive comparison. The comparison results are listed in Tab. 1, where 400 genuine iris images and 400 synthetic iris images were used for training, and 600 iris images were used for testing. BNCNN denotes the batch normalized convolutional neural network with a convolution kernel size of 3. It can be seen that the recognition rate of BNCNN reaches 100% and that it outperforms Weighted LBP [Zhang, Sun and Tan (2010)], HVC+SPM [Sun, Zhang, Tan et al. (2014)], SpoofNet [Menotti, Chiachia, Pinto et al. (2014)], and Regional features [Hu, Sirlantzis and Howells (2016)]. This indicates that BNCNN can make full use of the automatic learning of a deep convolutional neural network to extract deep features of the iris. ISBNCNN denotes iris segmentation plus the batch normalized convolutional neural network. The correct recognition rate of BNCNN is slightly higher than that of ISBNCNN. The results show that different preprocessing operations influence the training of the network. Iris normalization can enhance the iris image and reduce noise, and iris segmentation can prevent eyelashes, eyelids and other details from affecting the training of the network, which helps to prevent overfitting. However, the iris image may lose some texture features during segmentation, which is why the correct recognition rate of the network with segmentation is slightly lower than that of the network with iris normalization.

Table 1: Recognition rate on CASIA Iris Lamp and CASIA Iris Syn database

4.2.2 Experiments on ND contact database

Tab. 2 shows the comparison results of the CNN-based algorithms, where 200 transparent contact lens iris images, 200 non-wearing contact lens iris images and 200 textured contact lens iris images were used for training, and 2400 iris images were used for testing. BN-ContlensNet denotes batch normalization plus ContlensNet. From Tab. 2, it can be seen that all three networks have high correct recognition rates, which indicates that a deep convolutional neural network can not only extract deep texture features but is also strongly robust. ISBNCNN, BNCNN and BN-ContlensNet outperform ContlensNet [Raghavendra, Raja and Busch (2017)] and CNN, which demonstrates that batch normalization not only accelerates learning and prevents overfitting but also improves the performance of the network.

Table 2: Recognition rate on ND contact database

4.2.3 The influence of the convolution kernel size on iris recognition

Different biometric texture features have their own best convolution kernel size. The three convolution layers used in this paper have the same kernel size. 3000 iris images were used for training, and 1200 iris images were used for testing. Tab. 3 lists the recognition rate of genuine and fake irises for batch normalized convolutional neural networks with different convolution kernel sizes, together with the average test time for a single image. The recognition rates are over 99% when the convolution kernel size is 3, 5 or 7. However, the recognition rate drops to 93.8% when the convolution kernel size is 9. Moreover, the time consumption increases with the convolution kernel size. Taking into account both the recognition rate and the time consumption of the network, small convolution kernels are more suitable for extracting the fine texture features of the iris.
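One likely contributor to the growing test time is that the arithmetic cost of a convolution layer scales with the square of the kernel size. The sketch below estimates the multiply-accumulate count of the first convolution layer only, for a 256×256 input, 128 filters, stride 1 and no padding. The single-channel input and the function name `first_layer_macs` are assumptions, not details reported for the experiments.

```python
def first_layer_macs(kernel, size=256, filters=128, in_channels=1):
    """Multiply-accumulate operations of one unpadded, stride-1 convolution layer."""
    out = size - kernel + 1                     # output width and height
    return out * out * filters * in_channels * kernel * kernel


# Compare the kernel sizes studied in this section.
costs = {k: first_layer_macs(k) for k in (3, 5, 7, 9)}
for k, macs in costs.items():
    print(k, macs, round(macs / costs[3], 2))   # cost relative to the 3x3 kernel
```

Under these assumptions a 9×9 kernel needs between eight and nine times the arithmetic of a 3×3 kernel in the first layer, even though its output map is slightly smaller.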

Table 3: The influence of the size of convolution kernel on iris recognition

4.2.4 Relationship between training data and accuracy

To analyze the influence of the number of training samples on the recognition rate, BNCNN, ISBNCNN, and ContlensNet were tested on the ND Contact database. In the experiment, 100, 150, 200, 250, 300, 350, 400, 450, 500 and 550 iris images were used in turn to train the convolutional neural networks. Fig. 4 shows the relationship between the number of training samples and the recognition rate. It can be seen that the recognition rate of BNCNN is the highest when the same number of samples is used.

Figure 4: Relationship between the number of iris samples for training and accuracy

5 Conclusion

In this paper, an iris liveness detection scheme based on a batch normalized convolutional neural network is proposed. The network is constructed to automatically learn the texture features of the iris. With this method, genuine irises (including irises with transparent contact lenses) and fake irises (including synthetic irises and irises with textured contact lenses) can be effectively distinguished. Experimental results and analysis show that the proposed method outperforms other typical iris liveness detection schemes and is robust to the choice of image preprocessing. This confirms that deep learning is an effective means for iris liveness detection.

Our future work will concentrate on constructing more effective and personalized convolutional neural networks for iris liveness detection.

Acknowledgement: This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 61572182 and 61370225) and by the Hunan Provincial Natural Science Foundation of China (Grant No. 15JJ2007).