A Comparative Analysis of Machine Learning Algorithms for Classification of Diabetes Utilizing Confusion Matrix Analysis

Healthcare experts have increasingly employed machine learning in recent years to enhance patient outcomes and reduce costs. Machine learning has been applied in various areas, including disease diagnosis, patient risk classification, customized treatment suggestions, and drug development. Machine learning algorithms can scrutinize vast quantities of data from electronic health records, medical images, and other sources to identify patterns and make predictions, which can support healthcare professionals in making better-informed decisions, enhancing patient care, and determining a patient's health status. In this regard, the author compared the performance of three algorithms (logistic regression, AdaBoost, and naïve Bayes) through the correct classification rate for diabetes prediction in order to assess their effectiveness for accurate diagnosis. The dataset applied in this work is obtained from the Vanderbilt university institutional repository and is publicly available. The study determined that all three algorithms are very effective at prediction: logistic regression and AdaBoost had classification rates above 92%, and naïve Bayes achieved a classification rate above 90%.


Introduction
Diabetes is a dangerous disease that occurs as a result of an imbalance between blood sugar and the hormone insulin 1,2. As a result of this disease, the body cannot use sugar as it should, so glucose circulates freely in the blood. In general, statistics indicate an increase in people with diabetes. For instance, the International Diabetes Federation (IDF) annual report of 2021 mentioned that 537 million adults aged between 20 and 79 years live with diabetes, meaning 1 in 10 people worldwide is affected by this disease. These numbers are expected to increase to 643 million people by 2030 and to more than 780 million by 2045. The report also stated that this disease causes expenditures of at least 966 billion dollars annually, an increase of 316% over the past fifteen years 13. In a study by Sharma et al. 14, they showed statistically that the prevalence of type 2 diabetes had doubled in the United Kingdom, from 2.39/1000 people in 2000 to 5.32/1000 people in 2013. The study population was drawn from the London Borough of Hackney (United Kingdom), of whom 40% were from ethnic minority groups, and the highest prevalence of type 2 diabetes was discovered in the black and Asian groups [15][16][17]. Obesity is one of the most common causes of diabetes. Thus, physicians and healthcare workers advise people with this disease to constantly seek to lose excess weight in order to manage it, and advise heavy people who are not infected to do the same so as not to develop this dangerous disease 18. In addition, there are surgeries that help patients lose weight, called bariatric surgeries, and these operations have recently become widespread in many countries around the world. A study carried out by Rebelos et al.
19 evaluated the importance of these surgeries and their influence on weight loss for people with type 2 diabetes. The data of 312 obese people who underwent surgery were analyzed, and their body weight was monitored at regular ambulatory visits for 1, 2, 3, 4 and 5 years; the numbers of patients followed in each year were 269, 312, 210, 151, and 105, respectively. Their study found only a slight effect on the level of the disease after bariatric surgery. Understanding the mechanisms that lead to weight loss can help prevent this disease. Abbott invented a device, approved by the US Food and Drug Administration, that monitors blood sugar levels in people with diabetes without needing backup finger-prick tests 20. There are many ways to monitor sugar, glucose, and insulin levels, and these devices can be linked to a smartphone through applications or to a personal computer (Fig 2).
Companies are constantly seeking to develop these devices and move away from traditional methods. Physicians and healthcare workers seek to benefit from the capabilities of artificial intelligence techniques in diagnosing medical datasets and making proper decisions regarding a patient's health condition 21,22. Artificial intelligence techniques have proven to play an influential role in health services in recent years due to insufficient human resources and increased workloads. Moreover, the areas of application of these techniques in healthcare are very broad, and they have the ability to analyze extensive data. Governments and organizations aspire to grow these techniques and rely on them in making health administrative decisions. In addition, these techniques study datasets' behaviors and prepare performance reports for family medicine, hospitalization, surgery, diagnosis, prediction, etc. Day by day, these techniques are gaining popularity and the satisfaction of many healthcare workers.
Considerable studies are constantly being conducted on the practice of these techniques and their effects in diagnosing medical datasets such as heart disease, COVID-19, cancer, etc. [23][24][25][26][27]. Concisely, artificial intelligence contributes to the development of health and administrative services, reducing the cost of surgeries and reducing human-caused errors.
As mentioned above, machine learning, one of the most important branches of artificial intelligence, is used in the implementation of this work. Machine learning is an important part of artificial intelligence because it offers a set of algorithms that can predict and diagnose medical data, study its behavior, and extract important features from it. The contributions of the current study are as follows:
- Obtaining a publicly available dataset from Vanderbilt university, stored on the Vanderbilt biostatistics datasets site. This set includes data from 403 people diagnosed with and confirmed to have diabetes. A glycosylated hemoglobin > 7.0 is often taken as a positive diagnosis of diabetes.
- Analyzing this dataset with machine learning algorithms that can handle this type of data. Notably, three different algorithms (logistic regression, AdaBoost, and naïve Bayes) were picked based on their practicality for this task and their established performance in similar contexts. The aim is to compare these algorithms through prediction practice and identify the best performance through the metrics: accuracy, recall, specificity, precision, F1-score, MCC, and AUC.
- Checking the performance of the applied models using cross-validation, and comparing the algorithms through the counts of correct and incorrect predictions on the testing data.
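As a minimal sketch of the labeling rule stated above — assuming the glycosylated hemoglobin column is named "glyhb", as in the public Vanderbilt file, and using made-up readings — the positive-diagnosis threshold of 7.0 can be applied like this:

```python
import pandas as pd

# Hypothetical readings; the column name "glyhb" matches the public
# Vanderbilt file, but these values are illustrative only.
df = pd.DataFrame({"glyhb": [4.6, 7.8, 5.1, 9.2]})

# A glycosylated hemoglobin > 7.0 is taken as a positive diagnosis.
df["diabetic"] = (df["glyhb"] > 7.0).astype(int)
print(df["diabetic"].tolist())  # [0, 1, 0, 1]
```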
The rest of this article is organized as follows: Section Two briefly reviews related literature through the applied algorithms and the accuracy achieved. Section Three is divided into two parts: the first discusses the dataset employed in this work, and the second explains the algorithms that have been applied, their significance, and their major equations. Section Four describes the evaluation metrics used, evaluates the performance of the three models, presents the results, and identifies the best performance. Section Five concludes the article with a set of conclusions about the behavior of these algorithms and directions for future work.

Literature Survey
This section reviews the literature on using machine learning algorithms to predict diabetes mellitus through dataset analysis. The review concentrates on the algorithms employed and the resulting accuracy metrics. By highlighting these factors, we hope to gain a better understanding of the effectiveness of these algorithms in predicting diabetes mellitus. Kumar et al. 28 analyzed a dataset of more than 760 patients with nine attributes taken from the UCI machine learning repository. These researchers highlighted the performance of several algorithms, obtaining a high accuracy of more than 88% with an artificial neural network; their study also includes a complete analysis of the Pima Indian Diabetes (PID) dataset. Another study tested the ability of learning algorithms to predict diabetes: Zou et al. 32 applied three algorithms (decision tree, random forest, and neural network) to predict diabetes from a group of more than 164,000 instances with 14 attributes collected from hospital physical examination data in Luzhou, China. They concluded that random forest performed best, achieving an accuracy of more than 80%. There is also literature on the use of machine learning to predict diabetes from medical images: Math and Fatima 33 conducted a study using adaptive machine learning techniques to classify diabetic retinopathy images. They proposed a segment-based learning approach for detecting diabetic retinopathy and trained a convolutional neural network (CNN) to work with the dataset at a segment level. The results showed that the area under the ROC curve exceeded 96%, with more than 96% for both sensitivity and specificity. Other literature is summarized and compared with the current study in Section 4.

Experimental Setup

Intro to dataset
In this article, a diabetic dataset collected from the Vanderbilt university institutional repository is used. Machine learning algorithms are generally distinguished by their ability to deal with complex, high-dimensional datasets while integrating features and the interactions among them to extract accurate results and determine the patient's condition. In addition, the effectiveness of machine learning algorithms in classifying diabetes depends on several factors, including the features, data quality, and evaluation metrics employed. In other words, to obtain higher evaluation scores, the data quality must be high and the features sufficiently informative in order to train the machine learning algorithms adequately and improve performance.

The algorithms
Machine learning is commonly used to classify medical data as it can effectively manage vast amounts of data and extract pertinent information from it [37][38][39][40]. Three algorithms were selected to carry out the training and testing processes, and their performance is assessed. These algorithms are described in a simplified manner as follows:

Logistic Regression algorithm
Logistic regression is one of the most popular machine learning algorithms concerned with binary classification.This algorithm involves predicting a binary outcome based on one or more input variables, which is why it was employed in this work.
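To make the binary-prediction idea concrete, the following is a hedged NumPy sketch of logistic regression trained by plain gradient descent on toy data; the learning rate, epoch count, and data are illustrative choices, not the study's settings:

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=1000):
    # Plain gradient descent on the log-loss; a sketch, not the
    # exact training procedure used in the study.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)           # predicted probabilities
        grad_w = X.T @ (p - y) / len(y)  # gradient of the cost
        grad_b = np.mean(p - y)
        w -= lr * grad_w                 # parameter update
        b -= lr * grad_b
    return w, b

# Toy data: one feature separating two classes.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w, b = train_logistic(X, y)
preds = (sigmoid(X @ w + b) >= 0.5).astype(int)  # 0.5 threshold
print(preds.tolist())  # [0, 0, 1, 1]
```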
In diagnosing medical data, this algorithm is utilized to predict whether a patient will have a medical condition based on clinical and related variables. In other words, it models the probability of a binary outcome (e.g., the presence or absence of a disease) using a logistic function. The logistic function takes a linear combination of the input variables and maps it to a probability value between 0 and 1, which can then be used to predict the binary outcome. Before employing the logistic regression algorithm on a medical dataset, specific preparation and processing steps must be carried out. These include addressing any missing values, standardizing input variables, and converting categorical variables into numerical ones. Once the data is prepared, the algorithm is trained on it, and its performance is measured with a set of metrics such as accuracy, precision, recall, and F1-score. This algorithm assumes that the input variables are linearly related to the log-odds of the output variable and that this relationship is constant across the range of input values; when this assumption is violated, more flexible models such as decision trees or neural networks may be preferable. Furthermore, this algorithm presupposes that the data are independent and identically distributed, which may not be true for all medical datasets. During training, the sigmoid function is used to convert the dot product of the input features and model weights (w·x) into a predicted value between 0 and 1. Eq. 1 represents the sigmoid function, σ(z) = 1/(1 + e^(−z)), while Eqs 2 and 3 give the cost function and the parameter update, respectively.

Adaboost algorithm
AdaBoost starts by assigning equal weights to all training instances; in each round, misclassified instances are given more weight and correctly classified instances less weight. The weak classifier is then trained on the re-weighted dataset, and this strategy is repeated until all the weak classifiers are trained. The algorithm merges the weak classifiers into one robust classifier by giving each weak classifier a weight based on its performance. Algorithm 1 illustrates the steps of executing this algorithm with an explanation. This algorithm is applied in many medical applications due to its excellent ability to predict the probability of developing diseases, such as heart disease, breast cancer, and Alzheimer's disease.
That is why it is used in this work in the classification of diabetes.As with any machine learning algorithm, it is crucial to carefully evaluate the performance of the model on the specific medical dataset used, and to interpret the results in the context of medical knowledge and experience.
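A minimal scikit-learn sketch of AdaBoost on synthetic stand-in data (the real study uses the Vanderbilt diabetes set; the sample size and the choice of ten estimators merely echo numbers reported elsewhere in the article):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, not the study's dataset.
X, y = make_classification(n_samples=390, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Ten weak learners, matching the maximum reported in the results section.
clf = AdaBoostClassifier(n_estimators=10, random_state=0)
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)   # accuracy on the held-out 30%
print(round(acc, 3))
```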

Naïve Bayes algorithm
One of the most popular machine learning algorithms employed in data classification tasks, including the classification of medical datasets, is naïve Bayes. This algorithm is based on Bayes' theorem, a fundamental concept in probability theory. The algorithm supposes that the features present in the dataset are independent of each other, which is why it is called naïve. It has demonstrated excellent performance across multiple applications and is extensively utilized in scientific investigations, primarily for classifying medical datasets. Initially, this algorithm requires a labeled dataset of medical records. These records contain a set of features (such as age, gender, blood pressure, and symptoms) and a corresponding label indicating the medical condition (such as disease, non-disease, or a specific disease type). Next, the prior probability for each category is computed; this probability reflects the occurrence of each category before looking at any data. For instance, if a dataset contains 100 medical records and 30 of them are classified as people with a particular disease, the prior probability for this category is 0.3. The next step is to calculate the conditional probability for each feature in each category. This is done by counting the frequency of each feature among the records belonging to each category and dividing this frequency by the total number of records with that category label. After obtaining the prior and conditional probabilities for each category and feature in the dataset, the probability of each class is calculated for a given set of features. This is achieved by multiplying the category's prior probability by the conditional probabilities of every feature within that category; the results are then normalized so that the probabilities sum to 1. During the last stage, the new record is allocated to the category with the highest probability. Practically, this algorithm is proficient in classifying medical datasets. However, it does have limitations, particularly when the assumption of feature independence is not satisfied or when there are complex interactions between features; in these cases, more advanced machine learning techniques may be required. Nevertheless, this algorithm has proven to have outstanding performance in classifying small datasets. Eq 4 computes the probability of a class given an instance or data point, and the likelihood of the features is computed by Eq 5, where n is the total number of features.

P(c|x) = P(x|c) · P(c) / P(x)    (Eq 4)
P(x|c) = P(x₁|c) · P(x₂|c) · … · P(xₙ|c)    (Eq 5)
The main benefit of the above equations is to train the algorithm on the dataset and make predictions about new data points.This algorithm calculates the probability of each class for a unique data point and gives the class with the highest probability as the predicted class.
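The counting procedure described above (priors, per-class conditional frequencies, and a product of probabilities per Eqs 4 and 5) can be sketched on tiny hypothetical records; the feature names and labels below are invented for illustration:

```python
from collections import Counter, defaultdict

# Tiny hypothetical records: (feature dict, label); not the study's data.
records = [
    ({"bp": "high", "obese": "yes"}, "diabetic"),
    ({"bp": "high", "obese": "no"},  "diabetic"),
    ({"bp": "low",  "obese": "no"},  "healthy"),
    ({"bp": "low",  "obese": "yes"}, "healthy"),
    ({"bp": "high", "obese": "yes"}, "diabetic"),
]

labels = [lab for _, lab in records]
class_counts = Counter(labels)
priors = {c: n / len(records) for c, n in class_counts.items()}  # P(c)

# Conditional P(feature=value | class), by counting within each class.
cond = defaultdict(lambda: defaultdict(float))
for feats, lab in records:
    for f, v in feats.items():
        cond[lab][(f, v)] += 1
for lab in cond:
    for key in cond[lab]:
        cond[lab][key] /= class_counts[lab]

def classify(feats):
    # Multiply the prior by the conditionals, then pick the max class.
    scores = {}
    for c in priors:
        p = priors[c]
        for f, v in feats.items():
            p *= cond[c].get((f, v), 0.0)
        scores[c] = p
    return max(scores, key=scores.get)

print(classify({"bp": "high", "obese": "yes"}))  # diabetic
```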

Results and Discussion
This section covers the results achieved by the machine learning algorithms: logistic regression, AdaBoost, and naïve Bayes. Subsequently, the performance of these algorithms is considered and compared. The Python environment is used to implement this work, as it is a popular programming language in the field of machine learning. This language has a set of libraries (including TensorFlow, Keras, PyTorch, and Scikit-learn) that provide pre-built and customizable models for a wide range of machine learning tasks. Moreover, this language provides a wide range of tools, simple syntax, and cross-platform compatibility, allowing developers to create powerful machine learning applications quickly and efficiently. The dataset is divided into 70% training data and 30% testing data, and learning is performed for each algorithm with no scaling of the data. The compilation and validation process of each algorithm is repeated 10 times, and the average values are taken. All algorithms use cross-validation on the training data (k = 5). The decision threshold is set at 0.5, which allows the sigmoidal output to classify the data into two classes, and the learning rate (α) is 0.1. The training of each model stopped when the highest accuracy was achieved. Classical mathematical formulas are utilized to consider the performance and predictive abilities of the algorithms by assessing their accuracy, recall, specificity, precision, F1-score, and MCC (see formulas 6 to 11). These formulas are essential tools for evaluating the performance of machine learning algorithms because they provide a quantitative method for judging the quality of predictions and comparing algorithms to determine which works better. Thus, the use of these formulas is crucial to ensure that the algorithms are both dependable and practical in their respective applications. Moreover, the AUC-ROC (Area Under the Receiver Operating
Characteristic Curve) metric is used to measure the algorithm's ability to distinguish between positive and negative classes through the true positive rate (TPR) plotted against the false positive rate (FPR) at different threshold values. A high score on this metric means that the algorithm performs satisfactorily in distinguishing between positive and negative classes, while a low score indicates inadequate performance. After conducting the tests, it was noticed that the algorithms achieve excellent performance on the accuracy metric, which is a primary motivation for evaluating machine learning algorithms. The logistic regression algorithm demonstrated accurate classification by achieving an accuracy rate of 92.5%. Similarly, the AdaBoost algorithm achieved an accuracy rate of 92.5%, very close to the logistic regression algorithm.
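As a hedged illustration, formulas 6 to 11 can be computed directly from confusion-matrix counts; the counts below are hypothetical and not the study's exact matrix:

```python
import math

def metrics(tp, tn, fp, fn):
    # Standard confusion-matrix metrics (formulas 6 to 11 in the text).
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    recall      = tp / (tp + fn)               # sensitivity
    specificity = tn / (tn + fp)
    precision   = tp / (tp + fp)
    f1          = 2 * precision * recall / (precision + recall)
    mcc = ((tp * tn - fp * fn) /
           math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return accuracy, recall, specificity, precision, f1, mcc

# Hypothetical counts for a 117-record test set; illustrative only.
acc, rec, spec, prec, f1, mcc = metrics(tp=90, tn=18, fp=4, fn=5)
print(round(acc, 3))  # 0.923
```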
On the other hand, the naïve Bayes algorithm exhibited very good accuracy, surpassing 90%; however, its performance did not match that of the logistic regression and AdaBoost algorithms. Logistic regression correctly predicted 108 individuals out of 117 in the testing data. The naïve Bayes algorithm correctly predicted 105 individuals out of 117. The AdaBoost algorithm achieved correct predictions for 106 individuals out of 117, with a maximum of ten estimators chosen for the algorithm. Each algorithm's training was stopped when the highest accuracy was attained.
Regarding the algorithms' performance on the AUC metric, the logistic regression algorithm displayed remarkable capability in distinguishing between classes. It earned a score of 85%, the highest among the algorithms on this metric. The AdaBoost algorithm also proved its proficiency in distinguishing between categories, achieving a score of 80%. On the other hand, the naïve Bayes algorithm exhibited the lowest performance in this regard, with a score of 70%. All algorithm results are displayed in the form of tables and figures. Figs 10 to 12 exhibit each algorithm's confusion matrix and AUC. Table 2 provides an overview of the performance evaluation of all algorithms employed in this study. Through this table, it is clear that the logistic regression algorithm gives the highest results on all metrics among the algorithms except for recall, which denotes the percentage of data samples correctly recognized as belonging to the positive category; logistic regression achieved a recall of more than 95%, a remarkable result, yet not the highest compared to the naïve Bayes algorithm, which reached a recall of more than 97%.

Conclusion
The primary purpose of this article is to compare the performance of the three algorithms and to determine the most suitable one. The results are based on two main metrics: accuracy and AUC. They indicate that the logistic regression algorithm achieved the highest level of accuracy, while the AdaBoost algorithm also showed powerful performance with accuracy comparable to logistic regression. Logistic regression likewise had a high ability to distinguish between categories through the AUC metric, the highest compared to the other algorithms. Accuracy remains the most important metric in machine learning, as it reflects the algorithm's ability to make correct predictions on patient data and assist in decision-making. Based on this analysis, it can be concluded that the logistic regression algorithm delivers the best overall performance; indeed, all metrics of the logistic regression algorithm analyzed in this study are the highest compared to the other algorithms. In contrast, the naïve Bayes algorithm showed the poorest performance among the algorithms, despite its high recall. This is evident from its lower accuracy and AUC, indicating a limited ability to differentiate between classes. In the future, other models will be used to study their behavior in analyzing diabetic datasets and to compare their performance.

Published Online First: October, 2023. https://dx.doi.org/10.21123/bsj.2023.9010. P-ISSN: 2078-8665, E-ISSN: 2411-7986. Baghdad Science Journal.

This condition is regarded as the first stage of diabetes, and its symptoms are unclear. Therefore, recognizing the disease at this stage and taking precautions will facilitate treatment and prevent the disease from developing and spreading in the body. The most notable and well-known signs of the onset of this stage are excess weight, continuous sweating, sleep disturbances, and a constant desire to eat sweets.

Figure 1. The eye of a healthy person vs. the eye of a person with diabetes 3.

Figure 3. Histogram of some variables.

Figure 4. Correlation between glucose and cholesterol.

Figure 5. The correlation matrix between all variables.

Figure 6. Logistic regression in binary dataset separation.

Algorithm 1: AdaBoost steps
Step 1: Input: dataset D = {(x(1), y(1)), (x(2), y(2)), …, (x(N), y(N))}.
Step 2: Initialize weights: set the weight of every instance in the training set to w(i) = 1/N, where N is the number of instances in the dataset.
Step 3: For t = 1 → T, where T is the number of weak classifiers to be combined:
- Train a weak classifier: train a weak classifier h_t(x) on the training set using the current weights.
- Compute the error: compute the weighted error of the weak classifier on the training dataset, e_t = Σ_i w(i) · I(y(i) ≠ h_t(x(i))), where y(i) is the true label of the i-th instance in the dataset, x(i) is the feature vector of the i-th instance, and I(·) is the indicator function that returns 1 if the condition in the brackets is true and 0 otherwise.
- Compute the classifier weight: α_t = (1/2) · log((1 − e_t)/e_t).
- Update the instance weights: update the weights of all instances in the training dataset, w(i) ← w(i) · exp(−α_t · y(i) · h_t(x(i)))/Z, where Z is a normalization factor that ensures the weights sum up to 1.
Step 4: Output: compute the final classifier as a weighted combination of the weak classifiers, H(x) = sign(Σ_t α_t · h_t(x)).
Step 5: Return H (the outcome).
Notice: in the above equations, h_t(x) represents the prediction of the t-th weak classifier on the input instance x, and H(x) represents the prediction of the final classifier on the input instance x. The sign function returns +1 if the argument is positive or zero, and −1 otherwise.
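Algorithm 1 can be sketched from scratch with decision stumps as the weak classifiers; this is an illustrative implementation of the listed steps, not the study's exact code:

```python
import numpy as np

def adaboost_train(X, y, T=10):
    # Follows Algorithm 1: labels y in {-1, +1}; weak learners are stumps.
    n = len(y)
    w = np.full(n, 1.0 / n)                  # Step 2: uniform weights
    stumps = []
    for _ in range(T):                       # Step 3
        best = None
        for j in range(X.shape[1]):          # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = np.where(X[:, j] <= thr, sign, -sign)
                    err = np.sum(w * (pred != y))   # weighted error e_t
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = max(err, 1e-10)                # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)  # classifier weight
        pred = np.where(X[:, j] <= thr, sign, -sign)
        w *= np.exp(-alpha * y * pred)       # instance-weight update
        w /= w.sum()                         # normalization factor Z
        stumps.append((alpha, j, thr, sign))
    return stumps

def adaboost_predict(stumps, X):
    # Step 4: sign of the weighted vote of the weak classifiers.
    total = np.zeros(len(X))
    for alpha, j, thr, sign in stumps:
        total += alpha * np.where(X[:, j] <= thr, sign, -sign)
    return np.where(total >= 0, 1, -1)

# Toy separable data to exercise the steps.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
model = adaboost_train(X, y, T=3)
print(adaboost_predict(model, X).tolist())  # [-1, -1, 1, 1]
```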
After processing the dataset to eliminate duplicates and missing values, only 390 records remained, split into 273 records (70% training data) and 117 records (30% testing data). Figs. 7 and 8 illustrate a map of the working mechanism of the implementation. In addition, Fig. 9 illustrates the confusion matrix employed to identify four fundamental metrics for each algorithm. The primary objective of this matrix is to evaluate the algorithms' predictive capabilities.
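A small pandas/scikit-learn sketch of this cleaning-and-splitting step, on synthetic stand-in data engineered so that 390 records remain after removing one missing value and four duplicates:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the raw file (391 rows before cleaning here;
# the real dataset has 403 records). Column names are illustrative.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(391, 4), columns=["glyhb", "chol", "age", "bmi"])
df.loc[5, "glyhb"] = np.nan            # one missing value to eliminate
df = pd.concat([df, df.iloc[:4]])      # four duplicate rows to eliminate

clean = df.dropna().drop_duplicates()  # cleaning leaves 390 records
X_tr, X_te = train_test_split(clean.values, test_size=0.3, random_state=0)
print(len(clean), len(X_tr), len(X_te))  # 390 273 117
```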
Another study was executed by researchers Khanam and Foo 31 on the application of machine learning algorithms in the prediction of diabetes.

Table 3 displays the correct and incorrect predictions of each algorithm. Finally, Fig 13 illustrates a comparison of the algorithms' performance based on the accuracy and AUC metrics. Table 4 compares the current study's findings with a set of published literature.