Pose Invariant Palm Vein Identification System using Convolutional Neural Network

Abstract: Palm vein recognition is one of the most efficient biometric technologies; each individual can be identified through the unique characteristics of his or her veins. Palm vein acquisition is either contact-based or contactless, depending on whether the individual's hand touches the peg of the palm imaging device. The need for contactless palm vein systems in modern applications raises two problems: pose variations (rotation, scaling and translation transformations), since the imaging device cannot be aligned correctly with the surface of the palm, and a delay in the matching process, especially for large systems. To solve these problems, this paper proposes a pose invariant identification system for contactless palm vein that includes three main steps. First, data augmentation is done by making multiple copies of the input image and then performing out-of-plane rotation on them around the X, Y and Z axes. Then, a new fast Region of Interest (ROI) extraction algorithm is proposed for cropping the palm region. Finally, features are extracted and classified by a specific Convolutional Neural Network (CNN) structure. The system is tested on two public multispectral palm vein databases (PolyU and CASIA); furthermore, synthetic datasets are derived from these databases to simulate out-of-plane hand rotation by random angles in the range -20° to +20°. To study several pose-invariance situations, twelve experiments are performed on all datasets. The highest accuracy achieved is 99.73% ± 0.27 on the PolyU datasets and 98% ± 1 on the CASIA datasets, with a very fast identification process of about 0.01 second per individual, which demonstrates the system's efficiency for contactless palm vein problems.


Introduction:
Traditional personal identification or verification systems (passwords, ID cards, etc.) have become inefficient and unable to meet the needs of current society, since they can be stolen or lost. For these reasons, biometric identification systems have become a focus of research in recent years.
Palm vein is one of the most interesting types of biometric technology. It is similar to palmprint, but rather than using the visible light spectrum, palm vein imaging needs Near Infra-Red (NIR) illumination to capture the vein pattern hidden under the palm skin. Each person has unique vein characteristics that can be used for identification or verification. Compared with other biometric techniques, palm vein is (1) (2):
• Less costly
• Easy to acquire
• Difficult to fake or change, because it is a unique pattern inside the human body.
Department of Computer Science, University of Technology, Baghdad, Iraq. * Corresponding author: 0111545@student.uotechnology.edu.iq
There are two types of palm vein acquisition techniques: contact-based and contactless. The contactless type is more convenient for individuals, but without a peg to guide the individual's hand, the palm vein image will contain pose variations (rotation, scaling and translation transformations) caused by movement of the hand.
Several studies have addressed palm vein and contactless palmprint recognition. Methani C, Namboodiri A. in 2009 (3) proposed a method for handling pose variations (out-of-plane rotations, translation) in contactless and unconstrained palmprint imaging, in which a similarity measure between the reference image and the corrected test image is computed. An image alignment technique is proposed to determine the orientation of the palm, and experiments are applied on a real and a synthetic dataset. Mirmohamadsadeghi L, Drygajlo A. in 2014 (4) presented a new approach based on local texture patterns for palm vein patterns, using histograms and operators of multi-scale Local Binary Patterns (LBPs); higher-order local pattern descriptors are then investigated using the histogram of the Local Derivative Pattern (LDP). Abbas M, George E. in 2014 (5) suggested a palm vein recognition system using the spatial energy distribution of wavelet sub-bands: the Discrete Haar Wavelet (DHW) is applied, the average energy distribution of each sub-band is computed, and the feature vector is obtained by concatenating these sub-bands. Wang R, et al. in 2014 (6) presented a palm vein identification method based on the Gabor wavelet: first, contrast limited adaptive histogram equalization (CLAHE) is used to enhance the contrast and image skeletonization is used for vein thinning, then a Gabor wavelet transform-based method is used for feature extraction. Jalali A, Lee M. in 2015 (7) proposed a contactless palmprint biometric system using palm texture features from an image acquired by an ordinary digital camera, with a convolutional neural network (CNN) applied for palmprint recognition. Dian L, Dongmei S. in 2016 (8) proposed a palmprint recognition approach using CNN: first, an improved fuzzy enhancement algorithm is applied for preprocessing; then the features are extracted using AlexNet (a specific CNN structure); finally, the features are matched using the Hausdorff distance. Ali H, Razuqi N. in 2017 (2) proposed a palm vein recognition system based on the centerline. The proposed system extracts the vein centerline depending on the distance from boundary (DFB); the features are a set of key points generated based on the Difference-of-Gaussian (DoG), the absolute difference between two features is used for matching, and the system is tested on the PolyU multi-spectral palmprint database.
This paper proposes a new pose invariant contactless palm vein identification system, intended to identify palm vein images captured with pose variations in 3D space, which cause out-of-plane rotation, scale and translation variations. First, data augmentation is done by making multiple copies of the input image and then performing out-of-plane rotation on them around the X, Y and Z axes; this step reduces the number of palm samples that need to be captured from an individual. Then, a proposed fast region of interest (ROI) extraction algorithm is applied to the palm vein images to crop the palm region, histogram equalization is used to enhance the appearance of the images, and features are extracted and classified by a special CNN structure. To evaluate the proposed system, two public palm vein databases (PolyU and CASIA) are used, each divided into training and testing portions; furthermore, to simulate out-of-plane hand rotation, synthetic datasets are derived from the testing portion. These testing datasets are used to measure the validation accuracy of the proposed system.

Convolutional Neural Network (CNN)
CNN is a class of deep, feed-forward artificial neural networks. Its most important characteristics are local connectivity and the use of shared weights; thus, it can learn local features of the input image (8). In image classification, the CNN is a very effective and fast method in the recognition phase, even under partial rotation, distortion, scaling and translation variations, but it is slow in the training phase, so training on some thousands of images could take several hours. Back propagation is used for training the CNN by adjusting the weights of the filters (kernels). The weights are updated using Eq. (1):
$$w_{jk} \leftarrow w_{jk} - \eta \, \Delta w_{jk} \qquad (1)$$
where $w_{jk}$ are the weights of the filters, $\eta$ is the learning rate, equal to 0.0001, and $\Delta w_{jk}$ is the derivative of the error with respect to the weights (7). The typical CNN structure consists of the following layers (9):
• Convolutional Layers: take several feature maps as input and produce n feature maps as output, where n is the number of learnable filters in the convolutional layer. Filter weights are adapted using back propagation depending on the training data. The number n of filters and the filter size k_w × k_h are hyperparameters of convolutional layers. The outputs of this layer pass through a Rectified Linear Unit (ReLU), which is a common choice of non-linear activation function.


• Pooling Layers: pooling summarizes a p × p area of the input feature map. Pooling can be used with a stride s ∈ ℕ, s ≥ 2. Pooling layers, sometimes also called subsampling layers, are applied for three reasons: to get local translational invariance, to get invariance against minor local changes and, most importantly, to reduce the data to (1/s²) of its original size.
• Fully-Connected Layer: a fully connected layer is a traditional multi-layer perceptron in which every neuron in a layer is connected to each neuron in the following layer. The job of the fully connected layer is to classify the features extracted by the previous layers (convolutional and pooling layers) into the proper classes. The dropout technique can be applied to this layer; it is used to prevent overfitting by setting the output of any neuron to zero with probability p.
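The effect of the ReLU and pooling layers described above can be illustrated with a small dependency-free sketch. The feature-map values, the function names and the choice of max pooling with p = s = 2 are assumptions made for illustration only; they are not taken from the proposed CNN.

```python
def relu(feature_map):
    """Apply the ReLU activation element-wise: negative values become 0."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, p=2, s=2):
    """Summarize each p x p window by its maximum, moving with stride s."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, height - p + 1, s):
        row = []
        for left in range(0, width - p + 1, s):
            window = [feature_map[top + i][left + j]
                      for i in range(p) for j in range(p)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4 x 4 feature map produced by a convolutional layer.
fmap = [
    [1.0, -2.0, 3.0, 0.5],
    [0.0, 4.0, -1.0, 2.0],
    [-3.0, 1.5, 2.5, -0.5],
    [2.0, 0.0, -4.0, 1.0],
]
activated = relu(fmap)
pooled = max_pool(activated, p=2, s=2)
print(pooled)  # 16 values are reduced to 4, i.e. 1/s^2 of the data for s = 2
```

With s = 2 the 4 × 4 map shrinks to 2 × 2, which is the (1/s²) data reduction mentioned for pooling layers.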

The Problem of Contactless Palm Vein
Some existing palm vein recognition systems employ a special peg to keep the user's hand stable in order to take well-aligned palm images. In modern biometric applications, such as mobile devices, pegs cannot be used, so the palm images are captured with variations in 3D space. The position, direction, out-of-plane rotation and degree of stretching may affect the image accuracy. Thus, it is essential for any system to be robust to translation, position, direction, stretching and rotation variations (3) (7). Fig. 1 shows a multiple spectral imaging device without a peg guide, where the hand is free in 3D space (10).
Another problem in contactless systems is matching speed, especially in systems that use traditional matching methods, even with hundreds of enrolled individuals. It may not be convenient for the user to keep a hand raised while waiting for the identification process, so the system should be fast enough.

Datasets
In this paper, real and synthetic datasets are derived from two well-known multi-spectral palm vein databases. These datasets are categorized as follows:
• Real datasets
1. CASIA dataset (contactless): consists of 600 images derived from the CASIA multi-spectral palmprint image database V1.0 (10). It is composed of palm vein images of 100 individuals (six images for each individual), captured using a multiple spectral imaging device without guide pegs, so there is a certain degree of variation in hand pose. The 600 images used here comprise all samples of all individuals' right hands captured under 940 nm near-infrared (NIR) illumination; some of these samples are shown in Fig. 2.


Synthetic Dataset for Testing Phase
Synthetic datasets are generated to simulate out-of-plane rotation (which represents hands rotating freely and randomly in 3D space). The palm images in the testing datasets derived from the CASIA and PolyU databases are rotated around all axes, X and Y (Z being the in-plane axis), by a random angle for each axis in the range -20° to +20°. Fig. 4 shows samples from the synthetic datasets, annotated with their randomly chosen rotation angles.
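One way to picture this simulation is to rotate the four corners of the image plane in 3D and project them back with a simple pinhole model. The sketch below is only an illustration under stated assumptions: the function names, the focal length (twice the image size) and the decision to leave the in-plane Z angle at zero are hypothetical and are not specified in the paper.

```python
import math
import random


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(x_deg, y_deg, z_deg):
    """Combined rotation Rz * Ry * Rx for angles given in degrees."""
    ax, ay, az = (math.radians(a) for a in (x_deg, y_deg, z_deg))
    rx = [[1, 0, 0],
          [0, math.cos(ax), -math.sin(ax)],
          [0, math.sin(ax), math.cos(ax)]]
    ry = [[math.cos(ay), 0, math.sin(ay)],
          [0, 1, 0],
          [-math.sin(ay), 0, math.cos(ay)]]
    rz = [[math.cos(az), -math.sin(az), 0],
          [math.sin(az), math.cos(az), 0],
          [0, 0, 1]]
    return matmul3(rz, matmul3(ry, rx))


def project_corners(width, height, angles_deg, focal=None):
    """Rotate the image corners about the image centre and project them."""
    # Assumed focal length: twice the larger image side (not from the paper).
    focal = focal if focal is not None else 2.0 * max(width, height)
    r = rotation_matrix(*angles_deg)
    cx, cy = width / 2.0, height / 2.0
    corners = [(-cx, -cy), (cx, -cy), (cx, cy), (-cx, cy)]
    projected = []
    for x, y in corners:  # the image plane lies at z = 0 before rotation
        xr = r[0][0] * x + r[0][1] * y
        yr = r[1][0] * x + r[1][1] * y
        zr = r[2][0] * x + r[2][1] * y
        scale = focal / (focal + zr)  # perspective foreshortening
        projected.append((xr * scale + cx, yr * scale + cy))
    return projected


def random_pose(rng, limit_deg=20.0):
    """Draw one random angle per out-of-plane axis; Z is left at 0 here."""
    return (rng.uniform(-limit_deg, limit_deg),
            rng.uniform(-limit_deg, limit_deg),
            0.0)


rng = random.Random(0)
angles = random_pose(rng)
warped_corners = project_corners(128, 128, angles)
```

The original and projected corners give four point correspondences, from which a perspective warp of the whole image can be computed, for example with OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective`.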

The Proposed System
In this section, the proposed pose invariant contactless palm vein system is presented. It includes two major phases: the enrollment phase and the identification phase.
Figure 5. Augmentation: (a) original image; (b), (c) and (d) images rotated by +20° around the X, Y and Z axes; (e), (f) and (g) images rotated by -20°.

Extract Region of Interest (ROI) Algorithm
In this section, a new fast ROI extraction algorithm is proposed to detect and crop the most interesting center region of the palm. The proposed algorithm, shown in Algorithm 1, is described in the following steps:
1. The input grayscale NIR palm vein image is captured from the imaging device or taken from the databases.
This method is very fast, taking about 0.007 second per image. It is also invariant to translation and scale variations, but not to rotation, since rotation variations are handled next by the CNN. Fig. 6 displays the output image of each step of the proposed ROI algorithm.

Histogram Equalization
This step normalizes the brightness and increases the contrast of the ROI palm image using the traditional histogram equalization method. The aim of this stage is to speed up the CNN training phase, since this step was observed to reduce the number of epochs needed to reach the same results.
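For reference, traditional histogram equalization can be sketched in a few lines. This is a minimal dependency-free illustration for 8-bit grayscale images stored as nested lists; the function name and the tiny sample image are assumptions, and in practice a library routine such as OpenCV's `cv2.equalizeHist` would normally be used.

```python
def equalize_histogram(image, levels=256):
    """Spread the gray levels of an integer image over the full range."""
    # Histogram of gray levels.
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1

    # Cumulative distribution of the histogram.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # constant image: nothing to equalize
        return [row[:] for row in image]

    # Look-up table mapping each old gray level to its equalized level.
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[value] for value in row] for row in image]


# Hypothetical low-contrast ROI patch (gray levels between 50 and 80).
patch = [[50, 50, 60, 60],
         [70, 70, 80, 80]]
equalized = equalize_histogram(patch)
print(equalized)  # [[0, 0, 85, 85], [170, 170, 255, 255]]
```

The narrow input range 50 to 80 is stretched to the full 0 to 255 range, which is the contrast increase this step relies on.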

Feature Extraction and Classification using CNN
In this section, a special convolutional neural network (CNN) structure is presented, as shown in Fig. 7. The CNN combines the feature extraction and classification steps in a single trainable module, described in the following:
1.

2. Classification: consists of two fully connected layers; the first has 1024 neurons and the size of the second equals the number of enrolled individuals. The output of the last fully connected layer is fed to a SoftMax activation function that assigns a probability to each individual class for matching.
Table 1 explains the layers of the proposed CNN structure. The second column of the table contains the layer type followed by its activation function (layers 1 to 7), while layers 8 and 9 are followed by the dropout technique with a rate of 0.5; the third and fourth columns describe the size of the feature map matrix, and the last column shows the kernel (filter) size and stride for each layer. After several training epochs, a trained CNN is ready to identify any test sample.

Results and Discussion:
The system is implemented in the Python programming language with the TensorFlow open-source framework. The classification accuracy metric (i.e., the proportion of correct predictions) is used to evaluate the proposed system. The training:testing ratio is 3:1 for the PolyU dataset and 5:1 for the CASIA dataset. Six experiments are applied on each of the PolyU and CASIA datasets; the results are presented in Tables 2 and 3. The first column in these tables describes the training approach used for each experiment and notes whether data augmentation with a specific angle is applied to the input image; the second column notes which dataset category is used for testing; the last column gives the mean classification accuracy of the validation process ± standard error.
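The accuracy metric and the "mean ± standard error" notation can be made concrete with a short sketch. The paper does not state how the standard error was obtained, so the code below assumes it is computed across repeated validation runs; the function names and the sample numbers are hypothetical and are not results from the paper.

```python
import math
import statistics


def accuracy(predicted, actual):
    """Proportion of test samples whose predicted identity is correct."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def mean_and_standard_error(values):
    """Mean of the values and the standard error of that mean."""
    mean = statistics.mean(values)
    if len(values) < 2:
        return mean, 0.0
    # Standard error = sample standard deviation / sqrt(number of runs).
    return mean, statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical identities predicted for four test palms.
single_run = accuracy(predicted=[3, 7, 7, 1], actual=[3, 7, 2, 1])

# Hypothetical accuracies from three repeated validation runs.
mean_acc, std_err = mean_and_standard_error([0.9, 1.0, 0.8])
print(f"{100 * mean_acc:.2f}% ± {100 * std_err:.2f}")
```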
Fig. 8 shows the validation accuracy plot, which displays the increase in accuracy over the training epochs of experiment 2 and experiment 6, for both the CASIA and PolyU datasets. The experimental results demonstrate that the proposed system is very effective for palm vein identification.
The first experiment presented in Tables 2 and 3 shows the accuracy obtained when the proposed system is run in the ordinary way, without any data augmentation or synthetic contactless simulation.
The second experiment achieved the best accuracy; this shows that synthetic augmentation of the input palm image with new pose variants considerably improves the accuracy of the system. In the sixth experiment, acceptable accuracy is achieved, which indicates that the system handles a sensible degree of pose variation in the tested palms, which may be rotated by some angle in all directions.
In general, palm vein recognition was addressed in papers (2)(4)(5)(6), in which traditional methods are used without simulation of contactless palm imaging. To our knowledge, this is the first time a CNN method has been used with contactless palm vein, and the first time a synthetic simulation of contactless palm vein has been performed.
Although the results of the above-mentioned papers may be equal to or slightly higher than the result of the proposed system, the obtained accuracy is very good even though it has been tested on a synthetic dataset that simulates a high degree of pose variation. In addition, using the proposed fast ROI algorithm with the CNN results in a very fast identification process: the overall time needed to identify one palm, from the input through the ROI step to the output, is about 0.01 second. This time is fixed for all experiments since the proposed CNN structure is fixed.

Figure 1. Multiple spectral imaging device without peg guide, showing hands in different poses (10)
Of the two phases, enrollment and identification, the enrollment phase is the training phase, in which all training palm vein images of all individuals are passed through the system together to train the CNN. The training phase may take about an hour to produce a trained CNN. The identification phase is the testing phase, in which the trained system can directly identify any palm vein of the enrolled individuals. The proposed system is composed of the following steps: data augmentation, ROI extraction (using a proposed algorithm), histogram equalization, and feature extraction and classification (using a proposed CNN structure).
Data Augmentation
To reduce the number of sample images acquired from each individual and to simulate out-of-plane hand rotation around all axes, the system performs a data augmentation step on the input image (in the training phase only) by automatically generating 7 differently posed copies of it: one image without any rotation, three images rotated by +20° around the X, Y and Z axes, and three images rotated by -20° around the same axes. This augmentation is shown in Fig. 5.
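The seven poses generated per training image can be enumerated as follows. This is a minimal sketch: the function name and the (X, Y, Z) tuple representation are assumptions made for illustration, and the actual image warping is left out.

```python
def augmentation_poses(angle_deg=20.0):
    """Return the 7 (x, y, z) rotation angles used for one input image."""
    poses = [(0.0, 0.0, 0.0)]  # the copy without any rotation
    for sign in (+1, -1):  # first the +20 degree poses, then the -20 ones
        for axis in range(3):  # 0 = X, 1 = Y, 2 = Z
            angles = [0.0, 0.0, 0.0]
            angles[axis] = sign * angle_deg
            poses.append(tuple(angles))
    return poses


poses = augmentation_poses()
for label, pose in zip("abcdefg", poses):
    print(label, pose)  # ordered like panels (a) to (g) of the augmentation figure

# Each captured training image therefore contributes 7 training samples.
samples_per_image = len(poses)
```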

2. Convert the input image to binary using a simple thresholding parameter.
3. Apply a morphological open operation with a small filter size to remove noise.
4. Apply a morphological open operation with a big filter size commensurate with the palm region size, which is estimated by computing the arithmetic mean of the binary image; this operation extracts the palm region regardless of zooming or translation.
5. Compute the upper-left point and the lower-right point from the contour points of the palm region.
6. Crop the palm region rectangle from the input gray-level image.
7. Resize the cropped image to 64 × 64 pixels.
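The ROI steps above can be sketched without external libraries as shown below. This is an illustrative approximation, not the authors' Algorithm 1: the threshold level, the small kernel size, the rule that turns the binary-image mean into a big kernel size (`big_fraction`) and all function names are assumptions. A practical implementation would more likely use OpenCV's thresholding and morphology routines.

```python
import math


def threshold(image, level):
    """Step 2: binarize the grayscale image with a fixed threshold."""
    return [[1 if v > level else 0 for v in row] for row in image]


def _morph(binary, k, op):
    """Erode (op=min) or dilate (op=max) with a k x k square window."""
    h, w, r = len(binary), len(binary[0]), k // 2
    return [[op(binary[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]


def opening(binary, k):
    """Morphological open: erosion followed by dilation."""
    return _morph(_morph(binary, k, min), k, max)


def resize_nearest(image, size):
    """Step 7: nearest-neighbour resize to size x size pixels."""
    h, w = len(image), len(image[0])
    return [[image[y * h // size][x * w // size] for x in range(size)]
            for y in range(size)]


def extract_roi(image, level=60, small_k=3, big_fraction=0.5, out_size=64):
    h, w = len(image), len(image[0])
    binary = opening(threshold(image, level), small_k)  # steps 2 and 3
    # Step 4: the mean of the binary image is the foreground fraction, from
    # which a rough hand size and a big kernel size are derived (assumed rule).
    mean = sum(map(sum, binary)) / (h * w)
    big_k = max(3, int(math.sqrt(mean * h * w) * big_fraction)) | 1
    palm = opening(binary, big_k)
    # Step 5: bounding box of the remaining palm region.
    ys = [y for y, row in enumerate(palm) if any(row)]
    xs = [x for x in range(w) if any(row[x] for row in palm)]
    if not ys:
        raise ValueError("no palm region found")
    left, top, right, bottom = xs[0], ys[0], xs[-1], ys[-1]
    crop = [row[left:right + 1] for row in image[top:bottom + 1]]  # step 6
    return resize_nearest(crop, out_size), (left, top, right, bottom)


# Hypothetical 40 x 40 test image: dark background, bright square "palm",
# a thin "finger" above it and one bright noise pixel.
img = [[10] * 40 for _ in range(40)]
for y in range(10, 30):
    for x in range(12, 32):
        img[y][x] = 200
for y in range(2, 10):
    for x in range(20, 23):
        img[y][x] = 200
img[1][1] = 255
roi, box = extract_roi(img)
```

In this toy example the small opening removes the noise pixel and the big opening removes the finger, so the bounding box covers only the square palm area.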

Figure 6. The output images of the ROI steps; each image corresponds to one of the algorithm steps

Figure 7. The proposed structure of the CNN, with the sizes of feature maps and kernels

Figure 8. Validation accuracy plot, displaying the increase in accuracy during training time and comparing experiment 2 with experiment 6: (a) TensorBoard plot for the CASIA dataset, (b) TensorBoard plot for the PolyU dataset