Digits Recognition for Arabic Handwritten through Convolutional Neural Networks, Local Binary Patterns, and Histogram of Oriented Gradients



Introduction
The domains of image processing and pattern recognition both make major contributions to the study of handwritten digit recognition (HDR). HDR has become a significant area of research because of the sheer quantity of digitized text and images and the growing variety of HDR applications, such as the digitization of handwritten historical writings for fields as diverse as trade, economics, and medicine. Recently, an increasing number of devices, including smartphones, generate handwritten samples that require recognition and translation into machine code 1. Many other applications depend on HDR: reading postal addresses from envelopes and automatically sorting mail; enabling the blind to read; recognizing customer-filled forms (such as government forms, insurance claims, and application forms); automating offices; archiving text; and enhancing human-computer interfaces 2. Consequently, studies on the recognition of handwritten digits have proliferated, motivating researchers to develop precise and effective predictive models. Nevertheless, as illustrated in Fig. 1, the complex and irregular properties of handwriting make offline detection and analysis of handwritten Arabic (Indian) digits from images a significant challenge, and available datasets frequently do not account for the diverse writing styles of individuals.
Artificial intelligence techniques can be used to address this issue. Convolutional Neural Networks (CNNs) are one technique used to recognize handwritten Arabic (Indian) characters. CNNs are feedforward neural networks that have shown exceptional performance in a variety of difficult artificial intelligence and machine learning tasks, and they are widely used in applications such as image classification. The primary advantage of a CNN is that it performs two functions, feature extraction and classification, within one model 3,4.
Local Binary Pattern (LBP) descriptors and other texture descriptors are robust to varying illumination conditions and are computationally inexpensive. They have therefore been used in numerous research studies across many fields, one important field being handwritten digit recognition systems 5.
Techniques based on counting occurrences of gradient orientations in localized portions of an image are widely used in computer vision and image processing for object detection and pattern recognition. The Histogram of Oriented Gradients (HOG) feature descriptor is one of the best of these techniques, and it is also used for handwritten text recognition. The HOG descriptor emphasizes the structure or shape of an object and produces histograms from the magnitude and orientation of the gradient over image regions 6.

Related work
for English digits, and small and capital letters. It was found to be more effective than the Patternet and Feedforwardnet architectures, as evidenced by its weighted recognition rate of 90.4% for both digit and letter classes. In contrast, the recognition rates for the Patternet and Feedforwardnet architectures were 80.3% and 68.3%, respectively. A CNN model for feature extraction combined with a support vector machine (SVM) for classification was applied to medical image recognition, as documented in reference 9. The approach showed improved performance, with a test recognition accuracy of 98.95%. In a study comparing recognition systems for multiple-font digit recognition, CNN outperformed Bag of Features (BoF) with Speeded-Up Robust Features (SURF) and an SVM classifier. The recognition accuracy for CNN was slightly higher, at 0.96, than that of BoF 10. Part of the research 11 showed that a CNN could identify handwritten Arabic numerals from historical manuscripts with a precision of 96.06%. An adapted deep hybrid transfer model for detecting Arabic/Indian handwritten digits was presented in reference 12. The proposed model comprises two CNN models, supplemented with Long Short-Term Memory (LSTM) layers and fully connected dense layers. The LSTM layers preserve the features extracted by the CNN component, and the model achieved accuracies ranging from approximately 98% to 99.3%.
On the other hand, CNNs were proposed as a classifier 13 for detecting and classifying Arabic/Indian handwritten numerals. In that approach, the first step determines whether the input numeral is Arabic or Hindi, and the second step recognizes the numeral according to its script, with a recognition rate close to 100%. Gupta et al. 14 developed a system for recognizing handwritten numerals across multiple languages. The system was designed to classify each digit into 10 distinct classes without interference between the languages. Arabic is one of the eight languages used in that work. The study employed a CNN and achieved a collective accuracy of 96.23% across all eight scripts. Furthermore, Rasool Hasan et al. 15 employed a pre-trained CNN based on the ResNet-34 model to recognize Arabic handwritten digits. The model achieved an accuracy of 99.6% on the MADBase Arabic handwritten digit dataset, consisting of 60,000 training images and 1,000 testing images. Gaussian mixture models (GMMs) are probabilistic models that can be used for a variety of pattern recognition tasks 16, such as handwriting recognition. In handwriting detection, GMMs have been used to model the distribution of handwritten characters and to distinguish them from noise. As demonstrated in 17, GMM-based approaches are effective at recognizing handwriting in multiple languages.
Other studies focused on newly developed variations of LBP feature extraction that pass a sliding window of a specific size over the image to extract features from local regions; four different sliding-window sizes were applied to five LBP-based feature extraction techniques 5. Alia Karim 18 presented a system for recognizing Arabic and Indian handwritten digits using a KNN classifier and several feature extraction techniques: upper and lower profile extraction, vertical and horizontal projection, and the Discrete Cosine Transform (DCT) with Standard Deviation (DCT_SD). These features were extracted after dividing the image into several blocks. Using the ADBase database of Arabic numerals, this work obtained a recognition accuracy of 97.32%. Using RGB images as input, the system described in study 19 achieved a 99.3% identification rate by applying Optical Character Recognition (OCR) to handwritten Arabic (Indian) digits. The system passed through multiple stages: filtering, thinning, segmentation, feature extraction, and classification. Research paper 20

The concept of LBP is to replace the value of each pixel in an image with a value computed from a threshold. The threshold is the value of the center pixel in a block of (N×N) pixels. It is compared with the values of the neighboring pixels around the center pixel. If the threshold is greater than a neighbor's value, a "0" is assigned to that neighbor; otherwise, a "1" is assigned. Following the neighbors around a circle, either clockwise or counterclockwise, produces a binary number (an 8-digit number if N=3), which is then converted to a decimal number 5.
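The thresholding rule just described can be sketched for a single 3×3 block (N = 3). This is a minimal illustration with two assumptions the text leaves open. The neighbors are read clockwise starting from the top-left corner, and the first neighbor is taken as the most significant bit. The text allows either direction and does not fix the starting neighbor.

```python
import numpy as np

def lbp_code_3x3(block: np.ndarray) -> int:
    """Basic LBP code for the center pixel of a 3x3 block (N = 3, eight neighbors)."""
    if block.shape != (3, 3):
        raise ValueError("expected a 3x3 block")
    center = block[1, 1]
    # Neighbors visited clockwise from the top-left corner (start point is an assumption).
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    # A neighbor gets "1" unless the center value (the threshold) is greater than it.
    bits = ["1" if block[r, c] >= center else "0" for r, c in order]
    # Read the eight bits as a binary number and convert it to decimal.
    return int("".join(bits), 2)

block = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]])
print(lbp_code_3x3(block))  # bits 10001111 -> 143
```

A block of identical values gives all ones (255), because every neighbor equals the threshold and therefore receives a "1".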

Histogram of Oriented Gradients (HOG)
The Histogram of Oriented Gradients (HOG) descriptor is a feature descriptor that outlines the structure or shape of an object in an image. It divides the image into small, adjacent, connected regions called cells. For the pixels inside each cell, it calculates a histogram of gradient directions, or contour orientations 22. Fig. 5 shows some Arabic numerals and their HOG representations.
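The per-cell histogram can be illustrated with a simplified NumPy sketch for one cell. It uses nine unsigned orientation bins spanning 0 to 180 degrees. The sketch assumes centered [-1, 0, 1] gradient masks and hard assignment of each pixel's gradient magnitude to a single bin. Full HOG implementations typically also interpolate votes between neighboring bins and normalize histograms over blocks; both steps are omitted here.

```python
import numpy as np

def cell_orientation_histogram(cell: np.ndarray, n_bins: int = 9) -> np.ndarray:
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    cell = cell.astype(float)
    gx = np.zeros_like(cell)
    gy = np.zeros_like(cell)
    # Centered [-1, 0, 1] differences; border pixels keep zero gradient for simplicity.
    gx[:, 1:-1] = cell[:, 2:] - cell[:, :-2]
    gy[1:-1, :] = cell[2:, :] - cell[:-2, :]
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation folded into [0, 180) degrees.
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    # Hard assignment of each pixel to one of n_bins equal-width bins (no interpolation).
    bin_width = 180.0 / n_bins
    bins = np.minimum((angle // bin_width).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist

# An 8x8 cell containing a vertical edge: dark left half, bright right half.
cell = np.zeros((8, 8))
cell[:, 4:] = 255
print(cell_orientation_histogram(cell))  # all weight falls in the first (0-20 degree) bin
```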

The K-Nearest Neighbors (KNN)
K-Nearest Neighbors is a classification method that, for each test sample, estimates the proportion of its K nearest training samples belonging to each class and assigns the sample to the majority class. K is the number of nearest training samples considered, regardless of their class. Choosing a suitable value of K can yield good performance. During the classification phase, the distances between the test sample and each training sample are calculated; Euclidean distance is commonly used.

Overview of the methods
This research employed three distinct techniques, each used on its own, to recognize handwritten Arabic (Indian) numerals. The first system is a Convolutional Neural Network (CNN), depicted in Fig. 6, which shows the training flowchart and a CNN architecture made up of several layers with different parameter sizes. The network's weights for each layer were randomly initialized. The first layer of the network receives an input image of (28×28) pixels. This input is passed to the first hidden layer, a convolutional layer comprising 20 convolution kernels. Each kernel consists of (9×9) real-valued weights, each less than one. A rectified linear unit (ReLU) activation function is then applied, which sets negative values to zero. Using a larger number of kernels can capture more features. The second hidden layer is an average-pooling layer: a (2×2) kernel with a stride of two computes the mean of each window, halving the width and height of the feature maps. During training, the network parameters are updated based on the error between the network output and the actual output. The number and size of the kernels were chosen based on several experiments carried out to reach the best results.
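The layer sizes above can be checked with a minimal NumPy sketch of the forward pass through the convolutional, ReLU, and pooling layers. It assumes a single 28×28 grayscale input and an unpadded convolution with stride one, so 28 - 9 + 1 = 20. It also assumes initial weights drawn uniformly from [-0.5, 0.5). Padding, convolution stride, and initialization range are assumptions not stated in the text. The dense layers and the training loop are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(image: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Slide each (kh x kw) kernel over the image without padding, stride 1.

    image: (H, W); kernels: (K, kh, kw) -> feature maps (K, H-kh+1, W-kw+1).
    Computed as cross-correlation, as CNN libraries usually do.
    """
    k, kh, kw = kernels.shape
    h, w = image.shape
    out = np.zeros((k, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = image[i:i + kh, j:j + kw]
            # Multiply the patch element-wise by every kernel and sum each product.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def avg_pool_2x2(maps: np.ndarray) -> np.ndarray:
    """2x2 average pooling with stride 2; height and width must be even."""
    k, h, w = maps.shape
    return maps.reshape(k, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

image = rng.random((28, 28))                        # stand-in for a 28x28 gray digit
kernels = rng.uniform(-0.5, 0.5, size=(20, 9, 9))  # random init (range is an assumption)

maps = relu(conv2d_valid(image, kernels))  # (20, 20, 20)
pooled = avg_pool_2x2(maps)                # (20, 10, 10)
features = pooled.ravel()                  # 2000 values passed on to the dense layers
print(maps.shape, pooled.shape, features.shape)  # (20, 20, 20) (20, 10, 10) (2000,)
```

Pooling halves each spatial dimension, so a 20×20 map keeps a quarter of its values.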
The second system employs Local Binary Patterns (LBP) for feature extraction and K-Nearest Neighbors (KNN) for classification. The local binary pattern is computed from eight neighbors, selected on a circle of radius one pixel around each pixel of the image. The resulting LBP feature vector for each image consists of (1×59) real numbers.
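With eight neighbors there are 256 possible codes. Of these, 58 are "uniform", meaning they have at most two 0/1 transitions around the circle. Giving each uniform pattern its own bin and pooling all remaining codes into one extra bin yields 59 bins. The sketch below assumes this uniform mapping and a single histogram over the whole image. It approximates the radius-1 circle by the square 3×3 neighborhood instead of interpolating the diagonal neighbors. The paper does not state these details, so treat the sketch as one plausible reading.

```python
import numpy as np

def transitions(code: int, p: int = 8) -> int:
    """Number of 0/1 changes when the p-bit pattern is read as a circle."""
    bits = [(code >> i) & 1 for i in range(p)]
    return sum(bits[i] != bits[(i + 1) % p] for i in range(p))

def uniform_lookup(p: int = 8) -> np.ndarray:
    """Give each uniform pattern (<= 2 transitions) its own bin; all others share the last bin."""
    is_uniform = np.array([transitions(c, p) <= 2 for c in range(2 ** p)])
    n_uniform = int(is_uniform.sum())       # 58 when p = 8
    lookup = np.full(2 ** p, n_uniform)     # shared bin for non-uniform codes
    lookup[is_uniform] = np.arange(n_uniform)
    return lookup

def lbp_codes(image: np.ndarray) -> np.ndarray:
    """8-neighbor, radius-1 LBP codes for all interior pixels (square 3x3 neighborhood)."""
    img = image.astype(float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Offsets listed clockwise so consecutive bits are adjacent on the circle.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=int)
    for p, (dr, dc) in enumerate(offsets):
        neighbor = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes |= (neighbor >= center).astype(int) << p
    return codes

def lbp_feature_vector(image: np.ndarray, p: int = 8) -> np.ndarray:
    """Normalized 59-bin histogram of uniform LBP codes for the whole image."""
    bins = uniform_lookup(p)[lbp_codes(image)]
    hist = np.bincount(bins.ravel(), minlength=p * (p - 1) + 3).astype(float)
    return hist / hist.sum()

digit = np.random.default_rng(1).integers(0, 256, size=(28, 28))
print(lbp_feature_vector(digit).shape)  # (59,)
```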
The third system uses a Histogram of Oriented Gradients (HOG) for feature extraction and KNN for classification. The HOG cell size is (8×8) pixels, each block contains (2×2) cells, and the orientation histograms have nine bins. The resulting HOG feature vector for each image consists of (1×144) real numbers.
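The HOG vector length follows from these parameters. The calculation assumes a 28×28 image whose partial border cells are discarded, giving three whole cells per side. It also assumes 2×2-cell blocks that slide by one cell, so adjacent blocks overlap. Under these assumptions the arithmetic reproduces the stated length of 144. The overlap is an assumption, not something stated in the text.

```python
def hog_length(image_size: int, cell_size: int = 8, block_cells: int = 2,
               block_step: int = 1, n_bins: int = 9) -> int:
    """Length of a HOG descriptor for a square image.

    Assumes partial cells at the border are discarded and that blocks slide by
    `block_step` cells (block_step=1 means neighboring blocks overlap by one cell).
    """
    cells = image_size // cell_size                    # 28 // 8 = 3 cells per side
    blocks = (cells - block_cells) // block_step + 1   # (3 - 2) // 1 + 1 = 2 blocks per side
    return blocks * blocks * block_cells * block_cells * n_bins

print(hog_length(28))  # 2*2 blocks * 2*2 cells * 9 bins = 144
```

Without overlap (`block_step=2`), only one block fits and the vector shrinks to 36 values. This illustrates why, for such small images, the block layout strongly affects the number of features.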
Both the LBP and HOG systems use identical datasets. The flowchart in Fig. 7 illustrates the two approaches. For the K-Nearest Neighbors (KNN) algorithm, k was set to five. The Euclidean distance was used to compute distances between feature vectors, whether those vectors come from Local Binary Patterns (LBP) or from the Histogram of Oriented Gradients (HOG).
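A minimal NumPy sketch of this classification step follows: k = 5, Euclidean distance, and majority voting. The function name `knn_predict` and the toy two-dimensional data are illustrative only. In the described systems, each row would be a 59-value LBP vector or a 144-value HOG vector.

```python
from collections import Counter

import numpy as np

def knn_predict(train_x: np.ndarray, train_y: np.ndarray,
                test_x: np.ndarray, k: int = 5) -> np.ndarray:
    """Classify each row of test_x by majority vote among its k nearest training rows."""
    predictions = []
    for x in test_x:
        # Euclidean distance from this test vector to every training vector.
        dists = np.sqrt(((train_x - x) ** 2).sum(axis=1))
        nearest = np.argsort(dists, kind="stable")[:k]
        # Majority vote; ties go to the class met first among the nearest neighbors.
        votes = Counter(train_y[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)

# Toy data: class 0 clustered near (0, 0), class 1 near (10, 10).
train_x = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
                    [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5]], dtype=float)
train_y = np.array([0] * 5 + [1] * 5)
preds = knn_predict(train_x, train_y, np.array([[0.2, 0.3], [10.2, 10.9]]), k=5)
print(preds)  # [0 1]
```

This brute-force search compares every test vector with all 60,000 training vectors. That is simple, but the cost grows with the training set size and the feature length.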

Results and Discussion
The experiments were performed on an Intel(R) Core(TM) i5-3427U CPU @ 1.80 GHz with 4.00 GB of RAM, and were implemented in MATLAB (version R2017a). The dataset, obtained from the Kaggle platform, contains 60,000 training images and 10,000 testing images, all 28×28 grayscale.
For the CNN method, training experiments were run with a variety of epoch values, but the effect on accuracy was small, as seen in Table 1 and Fig. 8. With one epoch, recognition of most digits was lower than with three, four, or five epochs. Likewise, with two epochs, recognition of some digits was lower than with three, four, or five. With three or four epochs, recognition approached that obtained with five or six epochs. Therefore, the results for five epochs were used in this research.
The recognition accuracy obtained from the three methods was included in KNN in all digit recognition, as is clear from the diagram in Fig. 9, for the same number of training and testing images. The exceptions are the digits zero and one, for which recognition was similar across all methods. However, when computing the time required for training, in seconds, for each method, as shown in

Conclusion
The results from the three methods were compared, and the accuracy of the histogram of oriented gradients (HOG) and convolutional neural network (CNN) techniques was found to be comparable. Conversely, the local binary pattern (LBP) approach showed lower accuracy in recognizing numerals, with the exception of the digit one. The number of extracted features can affect recognition accuracy, as is clear in the LBP method, where the number of features was lower than in HOG

Figure 1. Examples of incorrect classification of handwritten numbers. The true class is shown on the left and the misclassified result on the right.
presents a study that compares four feature extraction methods for recognizing Arabic handwritten text in images. These methods are Gabor, Gabor with GSC, local densities statistics, and the proposed Histogram of Oriented Gradients (HOG) with Edge Histogram Descriptor (EHD). The results indicate that HOG with EHD outperforms the other methods in recognition accuracy.

Theoretical background

CNN

Convolutional neural networks (CNNs) are a form of artificial neural network (ANN) based on deep learning and supervised machine learning techniques. They are commonly used for automatic image segmentation, feature extraction, and classification, and, unlike other classification algorithms, do not require a preprocessing stage 13. A CNN comprises an input layer, multiple hidden layers, and an output layer. The hidden layers include the convolutional layer, which is based on the mathematical operation of convolution. In this operation, each pixel in a local region of the image is multiplied by the corresponding value in a weight matrix, or filter, and the products are summed. Applying convolution over the image's local regions yields an output called a feature map, which can capture different aspects of the features. Adding a pooling layer reduces the dimension of the feature map, which decreases computation time and complexity 2. Pooling also helps prevent overfitting. In average (sum) pooling, the arithmetic mean of the units in each subsampled region of the feature map is calculated. This removes global features by smoothing the subsampled region, thereby enhancing network performance 3, as shown in Fig. 2. Feature extraction in a CNN is carried out primarily by alternating convolutional and pooling layers, which together generate the feature maps.

https://doi.org/10.21123/bsj.2024.9173 P-ISSN: 2078-8665, E-ISSN: 2411-7986 Baghdad Science Journal

Figure 2. Pooling process with an average sum.

The Softmax function is commonly used as the classification function. After the convolution operation, the ReLU activation function transforms the summed weighted input of a node into the activation, or output, for that input 21. The Softmax function specifies a discrete probability distribution over K classes, as given by Eq. 1, and is often used in the final layer of a neural network:

$$\sigma(\mathbf{z})_i = \frac{\exp(z_i)}{\sum_{j=1}^{K} \exp(z_j)}, \qquad i = 1, \dots, K \qquad (1)$$

The ReLU activation is

$$\mathrm{ReLU}(x) = \max(0, x) \qquad (2)$$

The Local Binary Pattern (LBP) operator is given by Eq. 3:

$$LBP_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^{p}, \qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad (3)$$

where $R$ is the radius and $P$ denotes the number of neighboring pixels. $(x_c, y_c)$ are the coordinates of the center pixel in the image, $g_c$ is the gray level of the center pixel in a local neighborhood, and $g_p$ is the gray level of the $P$ equally spaced pixels on a circle of radius $R$. Fig. 4 illustrates the steps involved in producing Local Binary Pattern (LBP) features.
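Eq. 1 can be written directly in NumPy. This sketch subtracts the largest score before exponentiating. That is a standard numerical safeguard that leaves the probabilities unchanged, because the common factor cancels in the ratio. The three scores are hypothetical.

```python
import numpy as np

def softmax(z) -> np.ndarray:
    """Eq. 1: exp(z_i) / sum_j exp(z_j) over K class scores."""
    z = np.asarray(z, dtype=float)
    # Subtracting the maximum score leaves the result unchanged but prevents overflow in exp.
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical last-layer outputs for K = 3 classes
probs = softmax(scores)
print(probs.round(3), probs.sum())
```

In a ten-digit classifier, K = 10, and the predicted digit is the index of the largest probability.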

Figure 5. Samples of Arabic numbers and their HOG representations.

Figure 7. Training and testing flowcharts for LBP / HOG feature extraction with KNN classification.

Figure 8. The accuracy of the CNN with different numbers of epochs.

March, 2024

so the recognition accuracy was also lower. Larger images might improve LBP, because in both the LBP and HOG methods the number of features depends on the image size, but this would increase the time complexity. The CNN method achieved a shorter execution time than the HOG technique. Thus, based on the results obtained, it can be concluded that the CNN outperforms the other two methods in accuracy and execution speed for (28×28) gray images. Future work includes recognizing handwritten Arabic letters and developing more efficient neural network configurations.
Table 2 for all ten digits, where each value is the ratio of correctly recognized test images to the total number of test images for that class. Note that CNN and HOG with KNN gave similar, and much better, results than LBP with

Table 3, a comparison of the two best methods, CNN and HOG, shows that CNN has a much shorter training time. The training time of a CNN depends on several factors, one of which is the network architecture. This work used one convolutional layer with small convolutional filters, one pooling layer, two fully connected layers with few parameters, and a modest batch size, which together reduced the training time. HOG took longer to train because it requires division and square-root operations, which have high hardware complexity; the computational complexity of LBP algorithms is much lower by comparison.