Improving the efficiency and security of passport control processes at airports by using the R-CNN object detection model


Introduction
Passport control is a critical process at airports 1, which verifies the identity of passengers and ensures that they are authorized to enter or leave the country. However, traditional passport control processes can be slow and prone to human error 2, which can lead to delays and additional costs for airports. The use of real-time machine learning (ML) 3 to optimize passport control processes can improve airport efficiency and security 4. With the increase in the number of travellers, airports increasingly face challenges in managing passenger flows efficiently and securely 5.
The passport control process is a set of steps and procedures that verify the identity of passengers and the validity of their identity documents prior to boarding. It may include the following steps. First, identity document verification, in which passport control officers verify that the passenger's identity documents are valid and belong to the person presenting them. Second, identity verification, in which passport control officers can use facial recognition algorithms to verify the identity of passengers by comparing the photo on the ID document with their actual face. Third, data verification, which is used to verify the information entered by passport control officers. Fourth, the security checks, and finally boarding.
Machine learning is a subfield of artificial intelligence that involves developing models and algorithms that can learn from data and perform tasks without being explicitly programmed. There are different types of ML, such as supervised learning 6, unsupervised learning and reinforcement learning 7.
The history of ML dates back to the 1950s, when researchers in artificial intelligence began exploring methods for computers to learn autonomously. The first important work in this area was done by researchers such as Arthur Samuel 7 and Herbert Simon 8, who developed computer programs that could play checkers and learn from their own mistakes.
During the 1960s and 1970s, significant work was done on supervised learning methods, such as neural networks 9 and regression algorithms 10. These methods have been used to solve character recognition and computer vision tasks 11,12.
During the 1980s and 1990s, advances were made in areas such as unsupervised learning, reinforcement learning, and online learning systems. These advances have led to the development of more efficient and flexible machine learning systems.
With the advent of deep learning (DL) and reinforcement learning (RL), ML has boomed in recent years, solving complex problems in speech recognition, vision, machine translation, and video games. Today, machine learning is used in many fields, including finance, healthcare, robotics, social networks, autonomous vehicles, and scientific research.
The objective of this study is to demonstrate the potential of the Region-based Convolutional Neural Network (R-CNN) approach in enhancing the efficiency and security of passport control procedures at airports. The investigation will focus on addressing common issues faced during passport control, including lengthy queues, errors in data entry, and manual authentication of identification documents. The advantages of utilizing real-time machine learning to tackle these problems will also be highlighted, such as reducing human errors, increasing accuracy, and enhancing the overall efficiency of the process. Furthermore, the study will present the outcomes of a specific case study and evaluate the feasibility of implementing R-CNN in airports in real time.
In the rest of this paper, section 2 explores the current state of passport control technology at airports and the common problems encountered. Section 3 presents an innovative approach based on the R-CNN method to solve these problems. Section 4 presents a case study on the use of the new approach to optimize passport control processes at a specific airport. Section 5 presents the results obtained and the analysis of the improvements brought by the use of R-CNN, before concluding in the last section.

The problem of this study
The current state of passport control technology at airports is primarily based on manual processes 13. Passport control officers manually verify passenger identity documents, comparing document information with data in the airport's computer system. However, this method can be slow and prone to human error, as agents may have difficulty reading the document information or comparing it correctly with the computer data.
One of the common problems encountered in passport control processes is long queues. When passport control processes are slow, passengers may be forced to wait for long periods of time before they can pass through the control. This can cause significant delays for passengers and flights, and can also result in additional costs for airports.
Data entry errors are also a common problem in passport control processes. Passport control officers can make errors when entering passenger identification document information into the airport's computer system. These errors can result in additional delays for passengers and additional costs for airports.
Manual verification of identity documents is also a common problem. Passport control officers may have difficulty verifying the validity of passengers' identity documents, which can lead to additional errors and delays for passengers and airports.
Traditional passport control processes are often slow, prone to human error, and can cause delays and additional costs for airports and passengers. It is therefore important to find solutions to improve these processes to ensure a smooth and secure travel experience for all. Some of the recent methods used for passport control at airports include the use of passport scanners, facial recognition cameras, biometric technologies such as iris and fingerprint recognition 14, based identity control systems 15. These methods allow for quick and accurate identification of travellers, thus improving the efficiency and security of passport control processes. Facial recognition technologies are particularly useful because they can identify travellers without requiring them to remove their masks, which is especially important in the context of the COVID-19 pandemic.

Machine learning and existing methods for object detection in images
Real-time ML 16 is a technique for implementing machine learning models to perform tasks in real time, i.e., by responding to incoming requests in a fast and efficient manner. This can be accomplished by using ML algorithms that can run quickly on data as it is acquired, or by using real-time architectures to process data continuously 17.
There are several use cases for real-time ML, such as speech recognition, facial recognition, motion detection, health monitoring, industrial process control, and autonomous vehicle driving.
There are challenges to implementing real-time machine learning systems, such as the need to process a large amount of data in real time, the need to handle inconsistencies and time errors in the input data, and the need to ensure data reliability and security.
There are several techniques for implementing real-time machine learning systems, such as the use of distributed machine learning models, edge computing AI architectures, parallel processing neural networks and low-latency machine learning models 18.
Object detection in images is an active research area. There are several methods for object detection in images, each with its advantages and disadvantages. Traditional methods such as local descriptors like HOG and texture features 19, as well as support vector machines (SVMs) 20, have been widely used for object detection. However, these methods have limited performance for object detection under varying lighting conditions, different object orientations or occlusions. CNNs 21 have been an important breakthrough in object detection in computer vision, as they can automatically learn the relevant features of objects in images, making them more efficient than traditional methods. CNNs were then extended to approaches such as Mask R-CNN, YOLOv5, EfficientDet 22 and SSD 23 for real-time object detection, which have even better performance.
Vehicular Ad Hoc Network (VANET) is a subset of the broader Internet of Things (IoT) system and is considered a key area of research in intelligent transport technologies. It is also seen as a key technology for future autonomous cars, as vehicles need to be able to communicate with each other to operate autonomously and safely. VANET has the potential to improve road safety, traffic management and driver navigation, and to reduce greenhouse gas emissions, by enabling vehicles to make more informed and efficient decisions. The use of VANET systems [24][25][26][27] can potentially contribute to improving the efficiency and security of passport control processes at airports by allowing passengers to communicate in real time with airport authorities.

IATA airport quality standards.
The International Air Transport Association (IATA) Airport Quality Standards 28 are a set of criteria and standards that define the quality of services offered by airports worldwide, as presented in Tables 1, 2. They are designed to ensure a safe, efficient and comfortable travel experience for passengers, as well as sustainable and environmentally friendly management. IATA standards are regularly updated to reflect the latest trends and developments in the airline industry, and are recognised as an international benchmark for airport service quality. Airports can be assessed and ranked according to their level of compliance with IATA quality standards, which cover aspects such as security, efficiency, passenger comfort, cleanliness, environmental management and more.
IATA airport quality standards also cover the passport control area. In this area, the standards focus mainly on the quality and efficiency of the passport control processes for passengers. IATA standards require airports to provide adequate facilities for passport control, such as queuing areas large enough to accommodate large numbers of passengers, modern and efficient screening equipment, and trained and qualified staff to process passengers' travel documents, as shown in Table 3. In addition, IATA standards encourage airports to implement innovative solutions to improve the efficiency and security of passport control processes, such as the use of advanced technologies like facial recognition systems or information technology to improve queue management and reduce passenger waiting times. By meeting these IATA airport quality standards in the passport control area, airports can improve the passenger experience and enhance their reputation for providing superior service while ensuring the safety and security of travellers.

Presentation of the R-CNN based approach to solve passport control problems at airports
The real-time machine learning approach to solving passport control problems at airports is to use AI algorithms to automate and optimize passport control processes. ML algorithms can be used to automate the verification of passenger identity document information. In this paper, R-CNN 17 will be utilized for character recognition in order to automatically read ID document information and cross-check it with data stored in the airport's computer system.

Convolutional neural network
CNNs 29 are a type of neural network used for image recognition, character recognition, speech recognition, and other computer vision related tasks. CNNs are widely used in facial recognition due to their ability to capture complex features in images.
CNNs 30 are based on layers of neurons arranged in a pyramid-like architecture, where the upper layers are connected to the lower layers. Each layer consists of several filters that are applied to subparts of the input image. The filters are used to extract features such as contours, textures and shapes in the image; an example is shown in Fig 1.
Some architectures are shallower, while others are deeper and have more layers. In addition, the specific architecture can have a significant impact on network performance, so it is important to experiment with different architectures and choose the one that works best for a particular task.
A convolutional layer (CL) is a type of layer commonly used in CNNs for image recognition tasks, as presented in Fig 4. The main objective of a CL is to extract features from the input image by applying a set of filters to small regions of the input image. The operation in a convolutional layer is called convolution 32, a mathematical operation that applies a filter (also called a kernel or weight matrix) to a small region of the input image called the receptive field. The filter is moved across the input image, with each position producing one element of a new feature map.
The number of filters used in a convolutional layer is a hyperparameter that can be adjusted. Each filter is learned during the training process, and each filter is responsible for detecting a specific feature. For example, some filters can detect edges, while others can detect textures.
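The sliding-filter operation described above can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this study: the function names `conv_output_size` and `conv2d`, the square-kernel restriction and the zero padding are assumptions made for illustration. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation. The output size follows the commonly used relation floor((W - K + 2P) / S) + 1 for input size W, kernel size K, padding P and stride S.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Commonly used output-size relation: floor((W - K + 2P) / S) + 1."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


def conv2d(image, kernel, stride=1, padding=0):
    """Slide a square kernel over a 2D image (lists of lists); return one feature map."""
    k = len(kernel)  # assumption: the kernel is square (k x k)
    h, w = len(image), len(image[0])

    # Zero-pad the image on all four sides.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]

    out_h = conv_output_size(h, k, stride, padding)
    out_w = conv_output_size(w, k, stride, padding)

    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum over the receptive field whose top-left corner
            # is at (i * stride, j * stride) in the padded image.
            total = 0.0
            for a in range(k):
                for b in range(k):
                    total += padded[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        feature_map.append(row)
    return feature_map


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
    diagonal_kernel = [[1, 0],
                       [0, -1]]  # hypothetical filter responding to diagonal change
    print(conv2d(image, diagonal_kernel))
    print(conv_output_size(28, 5))  # a 28x28 input with a 5x5 filter, stride 1
```

Each filter in a layer would produce one such feature map; a layer with several filters stacks the maps along a channel dimension.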
A convolutional layer may also include a bias term 33, which is added to each feature map element after convolution. After the convolution operation, the output is passed through an activation function, such as ReLU, to introduce nonlinearity into the model. The output of the convolutional layer is a set of feature maps. It is important to note that the filter size and the step size (stride) used during the convolution operation can also be adjusted as hyperparameters, which can affect the size of the feature maps and the number of parameters in the model. The convolutional layers therefore alter the spatial dimensionality of their output, which can be calculated from the input size, the filter size, the stride and any padding.
A pooling layer is a type of layer generally used in CNNs for image recognition tasks 34. The main purpose of pooling layers is to reduce the size of the feature maps produced by convolutional layers, making the feature representation more robust to small transformations of the input image. The most common type of pooling is max pooling, which selects the maximum value of a small region of the feature map. Other types of pooling are average pooling, which computes the average value of the region, and L2-norm pooling, which computes the square root of the sum of squares of the region values. A pooling layer takes the feature map from the previous layer and applies the pooling operation to a small region of the feature map. Pooling operations are typically performed using a fixed-size window, such as 2 x 2 or 3 x 3, and a stride. The stride controls the step size of the window when sliding over the feature map. Pooling operations reduce the spatial dimensionality of feature maps, reducing the number of parameters in the model and the computational cost. They also make the model more robust to small transformations of the input image.
It is important to note that the size of the pooling window and the stride used during the pooling process are also tuneable as hyperparameters 35. This can affect the size of the feature maps and the number of parameters in the model. Additionally, the pooling operation can be applied multiple times to extract more abstract features and make the model more robust to translations and rotations.
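The three pooling variants mentioned above (max, average and L2-norm) can be sketched as follows. This is an illustrative sketch only; the function name `pool2d` and its `mode` argument are hypothetical and are not taken from the study's implementation.

```python
import math


def pool2d(feature_map, window=2, stride=2, mode="max"):
    """Apply max, average or L2-norm pooling to a 2D feature map (lists of lists)."""
    h, w = len(feature_map), len(feature_map[0])
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1

    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the values inside the current pooling window.
            region = [feature_map[i * stride + a][j * stride + b]
                      for a in range(window) for b in range(window)]
            if mode == "max":
                row.append(max(region))
            elif mode == "average":
                row.append(sum(region) / len(region))
            elif mode == "l2":
                row.append(math.sqrt(sum(v * v for v in region)))
            else:
                raise ValueError("mode must be 'max', 'average' or 'l2'")
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    fmap = [[1, 3, 2, 4],
            [5, 6, 7, 8],
            [9, 2, 1, 0],
            [3, 4, 5, 6]]
    print(pool2d(fmap, window=2, stride=2, mode="max"))      # [[6, 8], [9, 6]]
    print(pool2d(fmap, window=2, stride=2, mode="average"))  # [[3.75, 5.25], [4.5, 3.0]]
```

With a 2 x 2 window and a stride of 2, each spatial dimension is halved, which is where the reduction in computational cost comes from.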
The main purpose of a fully connected layer is to combine features extracted from previous convolutional and pooling layers to classify an input image or generate an output. A fully connected layer consists of a set of neurons, each connected to every neuron in the previous layer. This means that each neuron in a fully connected layer takes all elements of the previous layer as input. Each of these neurons has weights that are learned during training. The output of a fully connected layer is generated by multiplying each input by the corresponding weight and summing the results for each neuron. Activation functions such as ReLU and sigmoid are commonly used to introduce nonlinearity into the model.
The features of each proposed region are extracted using the same convolution filters, pooling layers and normalization layers used for the whole image.
- Classification: The extracted features are used to sort the proposed regions into different object categories using SVM classifiers or neural networks.
- Prediction: The proposed regions are then labelled and classified using the classification results.
- Weight adjustment: The neural network weights are adjusted using backpropagation to minimize the prediction error.
The pre-processed dataset is then divided into training and test sets. An R-CNN model is trained on the pre-processed training dataset for passport object detection, and the accuracy of the trained model is evaluated on the test dataset. If the accuracy is not satisfactory, the model is refined with new data. When the accuracy of the model is satisfactory, it is integrated into the passport control system at airports. Real-time data from passport control cameras are collected and the R-CNN model is applied in real time for the detection of passport objects in the collected images.

Region-based Convolutional Neural Network
Real-time detection results are displayed for verification by the passport control officer, and real-time performance data is collected to assess the accuracy and speed of detection. If the performance is not satisfactory, the model is refined with new data to improve the detection performance. The process of collecting real-time data, applying the R-CNN model and evaluating the performance is repeated until the accuracy and speed of detection are satisfactory. The result of the algorithm is an R-CNN model trained for object detection in passports, integrated into the passport control system at airports.
The F-measure is a compromise between TDR and precision and is calculated using the formulas for both. It is important to note that these metrics may vary depending on the specific object detection task and the database used for training and testing. It may be necessary to use multiple metrics to fully assess model performance.

Case study: using the R-CNN based approach to optimize passport control processes at a specific airport
In the case study, a novel R-CNN model architecture will be utilized for character recognition, with the MNIST database being used. The implementation of this R-CNN model will be carried out using the Python programming language due to its popularity, simple syntax, active community, and availability of numerous tools and libraries for machine learning. The MNIST database will be used for training and evaluating the model's performance, and optimization techniques will be employed to enhance its overall performance. The system was developed using an HP laptop with the following specifications: Intel(R) Core(TM) i7-4600M CPU with a clock speed of 2.90 GHz, 16.0 GB of memory (RAM) and a 500 GB hard drive. Various libraries such as TensorFlow, Keras and NumPy were chosen for the development of the system.
The MNIST database.
MNIST (Modified National Institute of Standards and Technology database) 36,37 is a popular database that contains images of digitized handwritten digits. It was created in 1998 by Yann LeCun, Corinna Cortes and Christopher Burges to train character recognition algorithms. The MNIST database contains 60,000 images for training and 10,000 images for testing. Each of the images has a resolution of 28x28 pixels and is in grayscale. The images were digitized from actual handwritten digits, making the MNIST database very close to reality. It is widely used to evaluate the performance of character recognition algorithms, which is why it is considered a reference for handwritten digit recognition.

Results and Discussion
Results and analysis: presentation of the results obtained and analysis of the improvements brought about by the use of the R-CNN based approach in real time.
Four distinct machine learning models have been implemented: R-CNN 38, MLP 39, CNN and RNN 40. The different approaches obtained accuracies of 94% for R-CNN, 77% for MLP, 78% for CNN and 90% for RNN, as presented in Table 4 and Fig 7.
The comparison between R-CNN and MLP in terms of precision, recall and F1-score is presented in Table 5 and Fig 8. Accuracy measures the percentage of correct predictions over the total number of predictions. R-CNN is more accurate than MLP because it can capture spatial and temporal features of images. Recall, the ratio of true positives to the total number of true positives and false negatives, was also measured; R-CNN has a higher recall than MLP. Finally, the F1-score is the harmonic mean of precision and recall, so both are taken into account when evaluating model performance. R-CNN also has a higher F1-score than MLP.
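For reference, the metrics discussed above can be computed from predicted and true labels as in the following sketch. It uses the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = harmonic mean of the two) and treats one digit class at a time as the positive class. The function names and the toy labels are hypothetical and do not reproduce the study's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1-score for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels for illustration only (not results from the paper).
    y_true = [3, 3, 3, 7, 7, 1]
    y_pred = [3, 3, 7, 7, 3, 1]
    print(accuracy(y_true, y_pred))
    print(precision_recall_f1(y_true, y_pred, positive=3))
```

For a ten-class problem such as MNIST, per-class values are usually averaged (for example, a macro average over the ten digits) to obtain a single figure per model.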
In sum, in these experiments R-CNN outperforms MLP for handwritten digit recognition on the MNIST database because of its ability to capture spatial and temporal features in images. Indeed, R-CNN is designed to capture specific features in images.
Similarly, R-CNN has very high values in terms of precision, recall and F1-score compared to the other two models RNN and CNN, as it is able to select the most relevant features for character recognition.
When it comes to data security, there are several aspects to consider. First, the collection of sensitive personal data such as passport images can pose a risk of a privacy breach. Therefore, it is important to implement security measures to protect the data collected, such as encrypting the data and implementing strict security policies. In addition, the integration of the R-CNN model into the passport control system may also present security risks, such as security vulnerabilities in the software or man-in-the-middle attacks to intercept data in transit. In terms of physical security, it is important to ensure that passport control cameras are well protected against vandalism or sabotage attempts. In addition, it is important to have security protocols in place to ensure that only authorized individuals have access to the images captured by passport control cameras.
Additionally, a comparison was made between our approach and a second method, "Passport Object Detection using YOLOv3". This approach uses another object detection algorithm called YOLOv3 to detect passports in images. YOLOv3 uses a deep convolutional neural network architecture that is designed to be faster and more accurate than other object detection approaches. Experimental results showed that this approach is faster than the "Passport Object Detection using R-CNN" approach, but it is less accurate for small object detection.
Both approaches have advantages and disadvantages in terms of object detection speed and accuracy. "Passport Object Detection using YOLOv3" is faster than the "Passport Object Detection using R-CNN" approach, while the latter is more accurate than the YOLOv3-based one. The choice of which object detection method to use therefore depends on the specific requirements of the passport control system.
In sum, R-CNN performs better than the other models for character recognition because it is able to select the most relevant features for character recognition. However, it is important to note that this will depend on the data used to train and evaluate the models. Therefore, it is important to test them on various data to get accurate results.

Conclusion
The use of real-time R-CNN to optimize passport control processes at airports can significantly improve their efficiency and security. AI algorithms such as character recognition, facial recognition, predictive algorithms and automatic data processing can be used to automate and optimize passport control processes. The results show significant reductions in identification errors, delays and additional costs. In terms of the outlook for implementing real-time machine learning at airports, it is important to note that this is an evolving field and that new AI algorithms and technologies will continue to be developed to further improve passport control processes. Airports should continue to explore these new technologies to improve the travel experience and security of passengers.

Figure 1. Typical structure of an RBF neural network.
CNNs also use pooling layers, which reduce the dimensionality of the data by pooling information from multiple neurons in the previous layer. This allows us to capture more global features in the image, as in Fig 2.

Figure 2. Pooling and maximum pooling for CNNs.
CNNs are trained on training data to learn to recognize objects or characters in images. Once trained, the model can be used to classify new images or to extract features from images. The overall architecture of a CNN 30, as in Fig 3, generally consists of several layers, including:
(1) Input layer: This is the network's first layer, where the input image is fed into the network.
(2) Convolutional layers (CLs): These layers are responsible for extracting features from the input image. They usually consist of a set of filters, which are applied to small regions of the input image to produce feature maps. These feature maps are then used as inputs for the following layer.
(3) Pooling layers: These layers are used to reduce the dimensionality of the feature maps produced by the CLs. They typically involve applying a pooling operation, such as maximum pooling, to groups of adjacent neurons in the feature maps.
(4) Fully connected layers: These layers are used to combine the features extracted by the convolutional and pooling layers to classify the input image. They usually consist of a set of neurons, each of which is connected to all neurons in the previous layer.
(5) Output layer: This is the final layer of the network, where the output of the network is produced. The output can be a probability for each class, or some other form of output depending on the task.
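Items (4) and (5) of the list above can be illustrated with a small sketch of a fully connected layer followed by an output layer that produces class probabilities. The softmax function used here is one common way to obtain such probabilities and is an assumption, as the text does not name the output activation. The function names, weights and inputs are hypothetical illustration values, not parameters from the study.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: every neuron takes a weighted sum of all inputs."""
    return [sum(w * x for w, x in zip(neuron_weights, inputs)) + b
            for neuron_weights, b in zip(weights, biases)]


def relu(values):
    """ReLU activation: negative values become zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Flattened features as they might come out of the last pooling layer.
    features = [0.5, -1.0, 2.0]
    hidden = relu(dense(features,
                        weights=[[0.2, 0.4, 0.1], [-0.3, 0.8, 0.5]],
                        biases=[0.0, 0.1]))
    probabilities = softmax(dense(hidden,
                                  weights=[[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]],
                                  biases=[0.0, 0.0, 0.0]))
    print(probabilities)  # one probability per class
```

In a trained network the weights and biases would be learned by backpropagation rather than written by hand.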

Figure 3. A simple CNN architecture, composed of only five layers.
It is important to note that the architecture of a CNN can vary depending on the task and the data set.

Figure 4. Visual representation of a convolutional layer.

Figure 5. The Region-based Convolutional Neural Network (R-CNN) stages.

Figure 6. Class Diagram for Passport Detection System and Related Classes.
Algorithm 1 below describes the process of detecting passport objects using R-CNN and integrating the model into the passport control system at airports. The first step is to collect a large set of passport images from different sources and formats. The images are pre-processed by resizing, normalising and converting them into a format compatible with the R-CNN.
Several metrics can be used to assess the performance of a Faster R-CNN model, including:
- Mean Average Precision (mAP): This metric measures the average precision of the model over all object classes. It is calculated by averaging the area under the precision-recall curve of each class.
- Average Precision (AP): This metric measures the precision of the model at different recall values.
- False Positive Rate (FPR): This metric measures how often the model detected an object where none was actually present.
- Detection Rate (DR): This metric measures the percentage of real objects that were detected by the model.
- F-measure: This metric is a compromise between the detection rate and precision.

Algorithm 1. Passport Object Detection using R-CNN
Require: A dataset of passport images
Ensure: R-CNN model trained for passport object detection, integrated into the passport control system at airports
Baghdad Science Journal 2024, 21(2): 0523-0536, https://doi.org/10.21123/bsj.2023.8546, P-ISSN: 2078-8665, E-ISSN: 2411-7986
Data pre-processing.
Data pre-processing is one of the key steps in implementing an R-CNN model for character recognition. The data in the MNIST database must be pre-processed to fit the inputs of the R-CNN model. Common data pre-processing steps include:
- Image resizing: The MNIST database images have a resolution of 28x28 pixels, but may need to be resized to fit the R-CNN model inputs.
- Data normalization: Data is normalized to be within a certain range of values. This can include converting images to grayscale and normalizing pixel values to be between 0 and 1.
- Converting images to an R-CNN compatible format: Images must be converted to an R-CNN compatible format such as a NumPy array or tensor to be used for training.
- Separation of training and test data: The data is separated into two sets, a training set and a test set. The training set is used to train the model, while the test set is used to evaluate the performance of the model.
In sum, data pre-processing is a crucial step to fit the MNIST database data to the R-CNN model inputs and for training and evaluating the model.
Model construction and fitting.
Model building and fitting are important steps in implementing an R-CNN model for character recognition. Common steps to build and fit the model include:
- Definition of the model structure: The structure of the R-CNN model is defined using Python libraries for machine learning such as TensorFlow or PyTorch. It is important to define the convolution, pooling, normalization and classification layers for the model.
- Weight initialization: The model weights are initialized before training. There are several common techniques for initializing weights, such as random initialization or initialization with pre-calculated values.
- Model compilation: The model is compiled by defining the training parameters such as the cost function, the optimizer and the evaluation metrics.
- Model training: The model is trained using data from the MNIST database. The model weights are adjusted to minimize the prediction error.
- Save model: The trained model is saved so that it can be used for real-time character recognition.
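Two of the pre-processing steps listed above, pixel normalization and the separation into training and test sets, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: pixel values are taken to lie in the range 0 to 255, the function names are hypothetical, and a real pipeline would more likely rely on NumPy arrays or a library loader.

```python
import random


def normalize(image, max_value=255):
    """Scale pixel values from [0, max_value] to [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


def train_test_split(samples, labels, test_fraction=0.2, seed=0):
    """Shuffle the data and split it into training and test sets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # fixed seed for a reproducible split

    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    train_x = [samples[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    test_x = [samples[i] for i in test_idx]
    test_y = [labels[i] for i in test_idx]
    return train_x, train_y, test_x, test_y


if __name__ == "__main__":
    # Ten tiny 2x2 "images" with made-up pixel values and labels.
    images = [[[0, 255], [51, 102]] for _ in range(10)]
    labels = [i % 10 for i in range(10)]

    images = [normalize(img) for img in images]
    train_x, train_y, test_x, test_y = train_test_split(images, labels)
    print(len(train_x), len(test_x))  # 8 2
```

Shuffling before splitting avoids a test set that contains only the last few classes when the source data is ordered by label.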
These models were trained and tested on 4 different subsets of the MNIST dataset. Several tests were done in order to obtain the right hyperparameters for each model. These parameters are not adjusted during the training phase, yet they have a great impact on the performance of the models during training. They include variables that determine the structure of the network (number of neurons, number of layers, activation function), the batch size, the number of iterations, etc. The experimentation was done on the MNIST subset. The training data was divided into two parts: 80% for learning and 20% for evaluation. Training does not take much time; the models were trained for 15 to 20 epochs. For validation, the R-CNN model was compared to the traditional neural network model (MLP), which is a simple model formed of fully connected layers. Then, it was compared to CNN and finally to the recurrent neural network model (RNN). Precision, recall, and F-score metrics were used for this comparison.

Figure 7. The representation of the results obtained by our R-CNN model with the other models.

Figure 8. The representation of the results obtained by our R-CNN model as a function of the number of iterations.

Table 5. Results obtained by our R-CNN model as a function of the number of iterations.
- Conflicts of Interest: None.
- We hereby confirm that all the Figures and Tables in the manuscript are ours. Furthermore, any Figures and images that are not ours have been included with the necessary permission for republication, which is attached to the manuscript.
- Ethical Clearance: The project was approved by the local ethical committee in Sultan Moulay Slimane University, Beni Mellal, Morocco.