Constructing a Software Tool for Detecting Face Mask-wearing by Machine Learning

Abstract: In the pandemic era of COVID-19, software engineering and artificial intelligence tools played a major role in monitoring, managing, and predicting the spread of the virus. According to reports released by the World Health Organization, every measure that prevents infection is highly recommended. One such measure is requiring people to wear face masks. The problem is that some people are not inclined to wear a face mask, and having police guide them manually is not easy, especially in large or public areas. The purpose of this paper is to construct a software tool called Face Mask Detection (FMD) that detects any face not wearing a mask in a specific public area by using CCTV (closed-circuit television). A further problem arises if the software tool is inaccurate. The approach uses a large set of face images, some wearing masks and others not. The methodology is based on machine learning: a HOG (histogram of oriented gradients) descriptor for feature extraction, followed by an SVM (support vector machine) for classification, which contributes to the literature by enhancing mask detection accuracy. Several public datasets of masked and unmasked face images have been used in the experiments. The accuracies obtained are 97.00%, 100.0%, 97.50%, and 95.0% for RWMFD (Real-World Masked Face Dataset) & GENKI-4K, SMFDB (Simulated Masked Face Recognition Dataset), MFRD (Masked Face Recognition Dataset), and MAFA (MAsked FAces) & GENKI-4K, respectively. The results are promising in comparison with the state of the art. For real-time testing, the workstation used a webcam programmed in Matlab.


Introduction:
As reported publicly, an outbreak of deadly pneumonia occurred in Wuhan City, Hubei Province, China, in December 2019. The virus causing this form of pneumonia is called SARS-CoV-2 or Coronavirus 1. The World Health Organization (WHO) then named the disease COVID-19 2. Since, up to now, there is no specific cure or vaccine for COVID-19, medical professionals have advised that people avoid any potential infection through a variety of means, such as avoiding travel to high-risk areas, avoiding contact with symptomatic individuals, and keeping surroundings clean, including regular hand washing and the use of face masks to prevent the inhalation of droplets 2. A face mask is useful both for preventing illness in healthy people and for preventing asymptomatic transmission. In other words, the use of face masks by a healthy population in the community substantially reduces the risk of transmission of respiratory viruses. Besides, face masks are considered a form of personal protective equipment that prevents the spread of respiratory infections and is effective in preventing the transmission of respiratory viruses and bacteria 3. When mask-wearing is taken seriously, it can contribute to the control of COVID-19 by reducing the emission of infected droplets from individuals 4. To illustrate the impact of mask-wearing, the study in 5 explains that "very weak masks (20 % effective) can still be useful if the transmission rate is relatively low or decreasing." This study also shows that in "Washington, where baseline transmission is much less severe, 80 % of these masks could reduce mortality by 24-65 % (and peak deaths by 5-15%)."
While it is clear that wearing a face mask is necessary, the recommendations for the general public and community settings have varied. For example, the U.S. Surgeon General opposed the procurement of masks for use by healthy people. The reason was to prevent widespread usage of face masks and so preserve scarce supplies for clinical use in health care settings. In addition, universal use of face masks in the community has often been discouraged by the claim that face masks do not provide adequate protection against coronavirus infection 6. However, as has been noted in recent publications, it is reasonable to suggest wearing masks, particularly in crowded and public areas. Generally, most countries during the pandemic have suggested that their citizens wear masks, as described in 6. For instance, Japan advises people as follows: "The effectiveness of wearing a face mask to protect yourself from virus contraction is thought to be limited. If you wear a face mask in close vicinity, it helps avoid catching droplets coming from others, but if you are in an open-air environment, you don't need to use a face mask" 6. Checking for mask-wearing can serve a variety of applications, such as community access control at airports or railway stations. As described above, wearing a face mask has a significant impact on reducing the percentage of infections. At the same time, some people do not comply with or respect the safety regulations. Also, it is very difficult to track people manually across large areas. For this purpose, it is necessary to propose an automated face mask detection (FMD) tool that identifies anyone who does not wear a mask 7.
The idea is to read a real-time video from CCTV and process it frame by frame, handling each face object that is found. A reference model, trained on masked and unmasked faces, is then used to predict the class of each new face. Figure 1 displays face samples of the two classes, wearing masks and not wearing masks.
The main purpose of this paper is to detect face masks, and to increase the detection rate, by using machine-learning techniques. The technique combines histogram of oriented gradients (HOG) features with binary classification by a support vector machine (SVM), as used in reference 8, which addresses a different application (biometric handwritten signature recognition). The present work is similar in its feature extraction and classification, but it uses a different design and configuration, as well as the different pre-processing that face mask detection requires. A further aim of this article is to show that this machine learning technique can detect face masks accurately. The proposed pre-processing, HOG, and SVM configuration are the contribution of this article, enhancing detection accuracy compared to state-of-the-art articles. This paper has six parts arranged as follows: Section Two is devoted to a literature review on the detection of face masks. The design of the tool methodology is elaborated in Section Three. The experiments are described in Section Four. The results are discussed in Section Five. Finally, the conclusion is outlined in Section Six, followed by acknowledgment and a list of references.

Literature Review
Previous work related to face mask detection is critically reviewed in this section. Technically, identifying whether or not a face is wearing a mask is a process that lies within the field of artificial intelligence, more specifically machine learning or deep learning. The most common stages of machine learning are as follows: input dataset, pre-processing, feature extraction, and decision classification. Accordingly, the existing work on the detection of face masks is discussed mainly in terms of these stages. For example, a hybrid approach consisting of locally linear embedding (LLE) with a convolutional neural network (CNN) 9 has been used to detect faces wearing masks. This work consists of three main modules. First, it combines two pre-trained CNNs to extract candidate facial regions from the input image and represent them with high-dimensional descriptors. After that, the embedding module transforms such descriptors into similarity-based descriptors using a locally linear embedding (LLE) algorithm and dictionaries trained on a wide pool of synthesized normal faces, masked faces, and non-faces. Here, the experiment is conducted using the MAFA dataset with up to 76.4 % accuracy 10. Another face mask detection work, detailed in 11, used the Simulated Masked Face Dataset (SMFD) to train and test the model. The classification method used is transfer learning from InceptionV3 to classify people who do not wear masks. This approach achieved an accuracy of up to 99.9 % during training and 100 % during testing.
Another recent study on the identification of face masks as a defense against the COVID-19 pandemic is presented in 12. This work consists of two elements. The first is feature extraction using ResNet50, and the second is the classification process, using a decision tree, a support vector machine (SVM), and an ensemble algorithm. Three datasets were used for training and testing, namely the Real-World Masked Face Dataset (RMFD), the Simulated Masked Face Dataset (SMFD), and the Labeled Faces in the Wild (LFW). The best result was recorded using the SVM classifier, which achieved 99.64% test accuracy on RMFD, 99.49 % test accuracy on SMFD, and 100 % test accuracy on LFW. Another approach, a face mask detector called Retina Face Mask, is explained in 13. Here, a feature pyramid network is used for feature extraction and classification, fusing high-level semantic information with multiple feature maps. It achieves up to 94.5% recall and 93.4% precision. Another interesting work on face mask detection is based on the HGL approach, which deals with head pose classification by considering color texture analysis of photographs and line portraits. The HGL method adds the H channel of the HSV color space to the face portrait and the grayscale image, and then trains a CNN to build the reference model for classification. Here, the MAFA dataset was used, and the reported accuracies are up to 93.64% and up to 87.17% 14.
Another deep-learning face mask detector is described in 15. The dataset used for the experiment is the Real-World-Masked-Face-Dataset (RWMFD), with an accuracy of up to 95%.
As noted in the literature review, several datasets have been created for training and testing such models. The methodology proposed for the FMD tool in the current paper has not been implemented in the literature. Moreover, it can compete with the existing techniques in terms of detection accuracy.

Tool Methodology
The proposed FMD tool, depicted in Fig.2, consists of four key stages: pre-processing, the Viola-Jones face detector 16, feature extraction, and classification. These four stages are used in two phases, enrollment (registration) and authentication. The former phase is a training activity that uses the SVM to produce the model shown in Fig.2 as the SVM reference model. The latter, called the authentication phase or sometimes the testing phase, captures the queried face picture from the device. The same operations that were performed during enrollment are also applied to the queried face picture. In the classification stage, the feature vector of the queried face image is compared against the binary SVM model. Finally, the decision-making step, based on the configured threshold, determines whether or not the face is wearing a mask. The solid arrow in Fig.2 marks the training (enrollment) path, while the dotted line marks the authentication (testing) path.

Pre-processing
Several image processing techniques are applied before the face detector and the feature extraction stage. The reason is that better image contrast and noise reduction have a positive effect on the recognition rate. The first procedure transforms the RGB image into a grayscale image; a median filter with a [3x3] kernel window is then used to eliminate noise 17. After that, the intensity values of the grayscale image are mapped to new values such that the bottom 1% and the top 1% of all pixel values in the image are saturated. Also, a re-size operation is performed to unify all image sizes, in rows and columns, to [128 x 128]. Some randomly selected samples from the databases, for faces wearing masks and not wearing masks, are shown in Fig. 3, which visualizes the effect of the pre-processing operations. The original images are depicted in the first column of Fig.3, grayscale images in the second column, median-filtered images in the third column, and the images after contrast enhancement in the fourth column.
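The pre-processing chain described above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the Matlab implementation used in this work: the function names are hypothetical, the grayscale weights are the common BT.601 luma weights, the resize uses nearest-neighbor sampling, and placing the resize last is an assumption, since the text does not fix its position in the chain.

```python
import numpy as np

def to_gray(rgb):
    # Weighted sum of R, G, B (BT.601 luma weights; an assumption).
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(np.float64) @ weights

def median3x3(img):
    # Replicate the border, then take the median of the 9 shifted copies.
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    shifts = [padded[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    return np.median(np.stack(shifts, axis=0), axis=0)

def stretch_contrast(img, low_pct=1.0, high_pct=99.0):
    # Saturate the bottom 1% and top 1% of pixel values, map the rest to [0, 1].
    lo, hi = np.percentile(img, [low_pct, high_pct])
    if hi <= lo:
        return np.zeros_like(img, dtype=np.float64)
    return np.clip((img - lo) / (hi - lo), 0.0, 1.0)

def resize_nearest(img, size=(128, 128)):
    # Nearest-neighbor resize (an assumption; any interpolation could be used).
    rows = np.arange(size[0]) * img.shape[0] // size[0]
    cols = np.arange(size[1]) * img.shape[1] // size[1]
    return img[np.ix_(rows, cols)]

def preprocess(rgb):
    gray = to_gray(rgb)
    denoised = median3x3(gray)
    enhanced = stretch_contrast(denoised)
    return resize_nearest(enhanced)

# Example with a synthetic image standing in for a face picture.
rng = np.random.default_rng(0)
fake_rgb = rng.integers(0, 256, size=(200, 150, 3)).astype(np.uint8)
print(preprocess(fake_rgb).shape)  # (128, 128)
```

The output is a [128 x 128] array with values in [0, 1], ready for the face detector and the feature extraction stage.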

Viola-Jones Face Detector
The Viola-Jones face detection system is a face detection technique introduced in 2001 by Paul Viola and Michael Jones. This technique requires a full view of upright frontal faces to function properly. This face detector was chosen for the following characteristics of the algorithm: it is robust, with a high true-positive rate and a low false-positive rate, and it runs in real time, which is adequate for this paper's objective of differentiating faces from non-faces. The Viola-Jones algorithm has four stages. The first is Haar feature selection. All human faces share some similar properties, for example the eye region is darker than the upper cheeks, and the nose bridge region is brighter than the eyes; these properties can be matched using Haar features. The second stage is creating an integral image. The third stage is AdaBoost training, and the fourth stage is cascading classifiers. More details are explained in 16.
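To make the first two stages concrete, the following Python sketch computes an integral image and uses it to evaluate a single two-rectangle Haar-like feature. It is a simplified illustration with hypothetical function names, not the full detector of 16: a real Viola-Jones detector evaluates many such features selected by AdaBoost inside a cascade.

```python
import numpy as np

def integral_image(img):
    # ii[r, c] holds the sum of img[:r, :c]; the extra zero row and
    # column make rectangle lookups free of boundary checks.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    # Sum of any rectangle from four lookups in the integral image.
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def haar_two_rect_vertical(ii, top, left, height, width):
    # Lower half minus upper half: positive when the upper region
    # (e.g. the eyes) is darker than the lower one (e.g. the cheeks).
    half = height // 2
    upper = rect_sum(ii, top, left, half, width)
    lower = rect_sum(ii, top + half, left, half, width)
    return lower - upper

# Toy patch: two dark rows (value 10) above two bright rows (value 200).
patch = np.vstack([np.full((2, 4), 10), np.full((2, 4), 200)])
ii = integral_image(patch)
print(haar_two_rect_vertical(ii, 0, 0, 4, 4))  # 1520
```

Because every rectangle sum costs four lookups regardless of its size, features can be evaluated at any scale in constant time, which is what makes real-time operation feasible.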

Feature Extraction (HOG)
Feature extraction is the process of choosing the most informative details that can be used to represent the samples for classification. In this paper, the Histogram of Oriented Gradients (HOG) algorithm 18 was selected because of its strong ability to represent image samples as a feature vector. HOG extracts local shape information from blocks within an image to support several operations such as tracking, detecting, and classifying. The effect of HOG is depicted in Fig.4. In this work, HOG was implemented with the following configuration: the cell size is [8x8] pixels, and the size of the block is [2x2] cells; with a [128 x 128] input image this yields a feature vector of 8100 values, as summarized in the Conclusion.
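The HOG configuration reported in this paper (cell size [8x8], block size [2x2], and image size [128 x 128], giving 8100 features according to the Conclusion) can be checked with a short calculation. The sketch below assumes 9 orientation bins and a block stride of one cell (50 % block overlap); these two values are not stated in the text and are assumptions chosen because they reproduce the reported length. The function name is hypothetical.

```python
def hog_feature_length(image_size=(128, 128), cell_size=(8, 8),
                       block_size=(2, 2), block_stride=(1, 1), num_bins=9):
    # Number of cells along each axis.
    cells_r = image_size[0] // cell_size[0]
    cells_c = image_size[1] // cell_size[1]
    # Number of block positions when sliding by block_stride cells.
    blocks_r = (cells_r - block_size[0]) // block_stride[0] + 1
    blocks_c = (cells_c - block_size[1]) // block_stride[1] + 1
    # Each block contributes one histogram per cell it contains.
    return blocks_r * blocks_c * block_size[0] * block_size[1] * num_bins

print(hog_feature_length())  # 8100
```

That is, 16 x 16 cells give 15 x 15 overlapping blocks, and each block holds 2 x 2 x 9 = 36 values, so 225 x 36 = 8100.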

Classifier (SVM)
Since there are only two classes to be determined, the Support Vector Machine (SVM) is a suitable choice because it deals well with binary classification problems. The SVM classifies the feature vector by finding the best hyperplane that separates all the features of one class from those of the other class. In other words, the optimal SVM hyperplane is the one with the maximum margin between the two classes. The margin is the maximal width of the slab parallel to the hyperplane that has no interior data points; more details on SVM classification of HOG features are explained in 8. As shown in Fig.5, the support vectors are the data points closest to the separating hyperplane. Figure 5 also illustrates these concepts, with + indicating data points of type 1 and - indicating data points of type -1, separated by a hyperplane with a margin 19.

Figure 5. Support vector machine (SVM) graphic representation of two classes and two dimensions.
The training data is a set of points (vectors) $x_j$ along with their labels $y_j$. For some dimension $d$, $x_j \in \mathbb{R}^d$ and $y_j = \pm 1$; accordingly, the hyperplane is given in Eq.(3).

$$f(x) = x'\beta + b = 0 \qquad (3)$$

where $\beta \in \mathbb{R}^d$ and $b$ is a real number. There are two classes in this paper: the face without a mask, labeled $y=1$, and the face wearing a mask, labeled $y=-1$. In terms of training optimization, Sequential Minimal Optimization (SMO) 20 and ISDA 21 are the two solvers used in the experiments.

Testing experiment:
Several experiments have been conducted on public databases to test the proposed FMD tool for face mask detection. Five separate datasets were used to determine the accuracy of the proposed FMD tool, each with a different number of observations. The databases are listed with their specifications in Table 1. Every database was used to train the model and then test it. Some databases contain only non-mask faces, such as GENKI-4K. Conversely, other databases contain only masked face pictures, such as MAFA. Thus, in some experiments we combine them, training a model with a database of masked faces for one class and a database of unmasked faces for the other class. In the experimental setup, a reference model is trained to predict two classes: a non-mask face labeled +1 and a masked face labeled -1. The decision threshold is 0, to avoid a biased separation between the -1 and +1 expected scores. The well-known way of assessing machine learning output is the confusion matrix, set out in Table 2. In this article, the confusion matrix has two classes, Non-mask and Masked. The confusion matrix parameters are extracted from the output of the proposed FMD tool. In each experiment, the accuracy metric described in Eq.(4) is computed from the confusion matrix. The metric gives the percentage of correct predictions. The confusion matrix for machine learning is described in 22,23.
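The decision rule implied by Eq.(3), the ±1 labels, and the threshold of 0 can be sketched as follows. This is a minimal illustration for the linear case only; the weight vector and bias below are made-up toy values, not trained parameters from the experiments, the function names are hypothetical, and assigning a score exactly equal to the threshold to the masked class is an arbitrary tie-breaking assumption.

```python
import numpy as np

def decision_scores(X, beta, b):
    # f(x) = x'beta + b for every row x of X (linear case of Eq.(3)).
    return X @ beta + b

def predict_labels(scores, threshold=0.0):
    # +1 = face without a mask, -1 = face wearing a mask.
    return np.where(scores > threshold, 1, -1)

# Toy 2-D example (real feature vectors would have 8100 dimensions).
beta = np.array([1.0, 0.0])
b = -0.5
X = np.array([[2.0, 0.3],    # lies on the positive side of the hyperplane
              [-1.0, 0.7]])  # lies on the negative side
scores = decision_scores(X, beta, b)
print(scores)                  # [ 1.5 -1.5]
print(predict_labels(scores))  # [ 1 -1]
```

With symmetric labels, a threshold of 0 places the decision boundary exactly on the hyperplane, so neither class is favored.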
TP counts the unmasked test samples that the proposed FMD tool correctly predicts as unmasked, and TN counts the masked samples correctly predicted as masked. FP counts the masked samples falsely predicted as unmasked, and FN counts the unmasked samples wrongly predicted as masked by the proposed FMD tool (Table 2). The target is to raise TP and TN as high as possible to achieve better accuracy; these parameters are computed in the Results section. There are also other classification metrics, the False Accept Rate (FAR) and the False Reject Rate (FRR), as explained and used in 29. However, the accuracy metric defined in Eq.(4) is sufficient for the proposed detection method.
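As a worked illustration, the sketch below computes accuracy from the four confusion-matrix counters, assuming Eq.(4) is the standard definition (TP + TN) / (TP + TN + FP + FN). The example uses the totals reported later for experiment 7 in Table 4 (9 errors out of 360 test samples, 180 per class); how those 9 errors split between FP and FN is not given here, so the split below is hypothetical.

```python
def accuracy(tp, tn, fp, fn):
    # Fraction of all test samples that were predicted correctly.
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total

# 180 unmasked (TP + FN) and 180 masked (TN + FP) test samples;
# the 5/4 split of the 9 errors is a hypothetical choice.
acc = accuracy(tp=176, tn=175, fp=5, fn=4)
print(f"{100 * acc:.2f}%")  # 97.50%
```

Any split of the 9 errors gives the same accuracy, since only the sum TP + TN = 351 enters the formula.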

Results and Discussion:
The outcome of the paper is presented in this section and split into two types. The first visualizes some known samples processed by the proposed method. The second is the recognition rate of the proposed system under several configurations of HOG feature extraction and the SVM classifier. As can be seen in the tests, face mask detection is invariant to skin color, mask color, face pose, and the presence or absence of hair. The detection was also observed to be stable. Figure 6 shows some face samples, each enclosed by a rectangular label colored red or green. The red label indicates that the face is not wearing a mask, while the green label indicates that the face is wearing a mask. In Fig.7, another person was captured while the proposed detection system was running on video. The aim is to show that the rectangle around the face changes accordingly as the mask is put on and taken off. Thus, Fig.7 presents a sequence of frames showing the transition from the unmasked face to the masked face. The real-time run is performed by first loading a trained SVM model. The prediction is then made by applying the loaded SVM model to the tested frame, a snapshot from the CCTV video. As regards the second type of result, the accuracy of the experiments conducted with several configurations of the SVM classifier is reported. Eight experiments have been performed with ISDA 21 as the training optimization. Each one is listed in Table 3. In addition, Table 3 includes the size of the testing matrix, the kernel function (either linear or 3rd order polynomial), and the corresponding accuracy measured according to Eq.(4), with its confusion matrix parameters, for each experiment. The sizes of the training matrix and the testing matrix are shown in Tables 3 and 4.
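The two kernel functions compared in the experiments can be illustrated as follows. The exact polynomial form used by the authors' toolbox is not stated in the text; the sketch assumes the common convention K(x, z) = (1 + x'z)^p with p = 3, and the function names are hypothetical.

```python
import numpy as np

def linear_kernel(X, Z):
    # K(x, z) = x'z for every pair of rows in X and Z.
    return X @ Z.T

def polynomial_kernel(X, Z, order=3):
    # K(x, z) = (1 + x'z)^order; this form is an assumed convention.
    return (1.0 + X @ Z.T) ** order

X = np.eye(2)  # two orthogonal unit vectors as toy inputs
print(linear_kernel(X, X))      # [[1. 0.] [0. 1.]]
print(polynomial_kernel(X, X))  # [[8. 1.] [1. 8.]]
```

The polynomial kernel lets the SVM separate classes with a curved boundary in the original HOG feature space, which is one plausible reason for its higher accuracy in the reported experiments.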
These matrices are arranged so that 50 % of the samples are non-masked faces and the other 50 % are masked faces. For example, the size of the training matrix of experiment 1 in Table 3 is [700x8100], which indicates 350 non-masked samples and another 350 masked samples. Similarly, for the testing matrix, experiment number 3 in Table 3, associated with the MFRD database, has a [360 x 8100] testing matrix, which means that 180 samples should be predicted as masked faces while the remaining 180 samples should be predicted as non-masked faces, and so on for the other experiments. The numbers of samples selected per class for training and testing are equal to avoid any bias between the unmasked face and the masked face. The sizes of the training matrices are shown in Table 3. The same experimental specifications as in Table 3 were applied again using another training optimization, SMO. All details of these eight experiments are given in Table 4. As elaborated for experiment 7 in Table 4, the best accuracy for the MFRD dataset, 97.5%, is obtained with a polynomial kernel function, with only 9 samples incorrectly predicted out of 360 tested samples, as shown in the confusion matrix of experiment 7 in Table 4. Overall, SMO training is better than ISDA training, as the experiments show a lower error rate. For example, comparing experiment 7 in Tables 3 and 4, the accuracy for ISDA and SMO is 96.94% and 97.5% respectively. In addition, the polynomial kernel function performs better than the linear kernel function, as seen in Table 3 and Table 4.
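The balanced arrangement of the training and testing matrices can be sketched as below. The helper name is hypothetical and the zero-filled arrays merely stand in for real HOG feature vectors; the point is only the 50/50 stacking and the ±1 labeling described in the text.

```python
import numpy as np

def build_balanced_set(nonmask_features, masked_features):
    # Keep the same number of samples from each class to avoid bias.
    n = min(len(nonmask_features), len(masked_features))
    X = np.vstack([nonmask_features[:n], masked_features[:n]])
    # +1 = non-masked face, -1 = masked face.
    y = np.concatenate([np.ones(n, dtype=int), -np.ones(n, dtype=int)])
    return X, y

# Placeholder features: 350 samples per class, 8100 HOG values each.
nonmask = np.zeros((350, 8100), dtype=np.float32)
masked = np.zeros((350, 8100), dtype=np.float32)
X, y = build_balanced_set(nonmask, masked)
print(X.shape)               # (700, 8100)
print(int((y == 1).sum()))   # 350
print(int((y == -1).sum()))  # 350
```

The same helper applied to 180 samples per class would give the [360 x 8100] testing matrix mentioned for the MFRD experiment.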

Open Access
Baghdad Science Journal

A comparison between the state of the art and the accuracy of the proposed FMD tool is carried out to validate the proposed FMD tool. As shown in Table 5, three datasets have been used for the comparison. For the RWMFD dataset, our accuracy is up to 97%, which is higher than the second reference in No.1 in Table 5. Next, for the SMFDB dataset, the proposed accuracy is up to 100%, which matches one existing work and is higher than the second reference in the literature in No.2. Lastly, the proposed accuracy of 95% is better than the two existing works on the MAFA dataset in No.4. It is worth mentioning that the proposed FMD tool relies on the Viola-Jones face detection algorithm to work properly. In other words, if the face has not been detected, the masked/unmasked classification cannot be performed. In addition, illumination and brightness adjustment are critical for the detection process in a real-time implementation. As shown in Fig.8, the face mask detection was tested against variation of the face in direction and scale. The faces wearing masks in Fig. 8 were identified by the proposed FMD tool in the same way.

Conclusion:
A Face Mask Detection (FMD) tool is proposed in this paper, as this is an important research topic in the era of the COVID-19 pandemic. It is an attempt to help reduce and restrict outbreaks of the disease. Technically, the proposed FMD tool consists of pre-processing, the Viola-Jones face detector, and then HOG feature extraction, configured with block size [2x2] and cell size [8 x 8] on a digital image of size [128 x 128] to create a feature vector. The length of the feature vector is 8100 features. After that, a binary SVM was used for training and testing. Experiments have been conducted to evaluate the proposed FMD tool. The accuracy is as follows: 97.00%, 100.0%, 97.50%, and 95.00% for RWMFD & GENKI-4K, SMFDB, MFRD, and MAFA & GENKI-4K, respectively. In future work, combining another feature extraction method with HOG could enrich the feature vector and improve the results.