Prediction of Thyroid Classes Using Feature Selection of AEHOA Based CNN Model for Healthy Lifestyle

People with underactive thyroids frequently endure severe symptoms. Accurate classification with machine learning can substantially improve thyroid disease diagnosis and, in turn, the timely delivery of care to patients. Although diagnostic techniques exist, they frequently target binary classification, use datasets that are too small, and lack validation of their conclusions. Current approaches focus on model optimisation, whereas feature engineering is neglected. To overcome these limitations, this research presents the Adaptive Elephant Herd Optimisation Algorithm (AEHOA) model for selecting optimal attributes. First, the Synthetic Minority Over-sampling Technique (SMOTE) is used to balance the data. Finally, the features selected by the AEHOA model are fed into a Convolutional Neural Network (CNN) to classify the data and enhance prediction. The accuracy of classification predictions was also increased by adjusting the dataset. Both datasets were classified for a more precise comparison of results.


Introduction
The healthcare sector is making use of developments in computational biology by accumulating patient data for disease prediction. A number of tools are available for early disease diagnosis 1. Intelligent systems that encode medical knowledge are not readily available for collecting the data sets needed for disease analysis 2. Recently, however, Machine Learning (ML) optimisation has emerged, making significant contributions to the prediction and resolution of nonlinear and complicated problems. In any disease detection strategy, the greatest weight is given to the characteristics, selected from many datasets, that readily distinguish healthy people 3. Otherwise, a healthy individual may receive unnecessary therapy after being incorrectly labelled as having a disease. Therefore, it is of utmost importance to predict disorders accurately, thyroid disorders included 4.
The thyroid gland is an endocrine gland located in the neck, below the Adam's apple. It secretes hormones that regulate protein synthesis and metabolic rate 5. Thyroid hormones help regulate several aspects of metabolism 6, including heart rate and energy expenditure. The thyroid gland actively releases the hormones triiodothyronine (T3) and thyroxine (T4) 7. These hormones play a significant role in the body's production processes and in its overall regulation by controlling core body temperature. The thyroid gland ordinarily generates two active hormones, T4 (also known as thyroxine) and T3 8. These hormones have crucial roles in the body's energy storage, temperature control, communication, and protein management. Deficiencies in T3 and T4 are associated with a deficiency of iodine, which is a fundamental component of thyroid hormones 9. Too little and too much thyroid hormone lead to hypothyroidism and hyperthyroidism, respectively. Several pathologies can lead to either an underactive or an overactive thyroid. Different medications have different uses 10. Thyroid surgery puts patients at risk for iodine insufficiency, enzyme deficiency, ionising radiation exposure, and continuing thyroid pain.
https://doi.org/10.21123/bsj.2024.10547 P-ISSN: 2078-8665 - E-ISSN: 2411-7986 Baghdad Science Journal
Ultrasound imaging technology has allowed for the accurate diagnosis of a variety of thyroid disorders 11,12. Clinical diagnosis using ultrasound involves collecting the patient's thyroid ultrasound images. Deep Learning (DL) is typically broken down into two stages: first, the network is trained on the sample images; second, the feature-extraction network is used to either classify or identify the sample images 13.
To classify thyroid disease data, this work prioritises a feature selection strategy that eliminates superfluous characteristics. First, the SMOTE method is used to rectify the unbalanced data. Next, the AEHOA model chooses the pertinent characteristics, and finally, a CNN is employed to classify the data. The remainder of this paper is organised as follows: Section 2 provides an overview of the literature; Section 3 details the suggested model; Section 4 discusses the experimental analysis; and Section 5 draws conclusions and suggests directions for further research.

Related works
Sultana and Islam 14 developed a framework for classifying thyroid disease (TD) that combines different machine learning (ML) procedures, including a scaling technique, an oversampling strategy, and several feature selection techniques. The methodology was also used to identify key TD risk variables. The dataset used in that study was obtained from the repository maintained by the University of California, Irvine (UCI). In the preprocessing step, SMOTE was applied to fix the uneven classes and robust scaling was used to normalise the data. The Boruta, RFE, and LASSO methods were employed to narrow down the characteristics most relevant to the problem. Six different machine learning classifiers were trained, and the models were analysed using 5-fold cross-validation (CV). The algorithms were evaluated using a variety of performance indicators. Using the RF classifier, the system produced 99% correct results. The suggested method would help doctors and patients alike categorise TD and gain insight into its related risk factors.
Yu et al. 15 also set out to tackle these problems. Alnaggar et al. 16 proposed a multiclass classification model, based on XGBoost optimisation, that classifies patients with thyroid disease into several groups. The primary contributions are a multiclass classification approach for diagnosing three distinct thyroid disorders, improved feature-selection accuracy for classification using the raw dataset, and an improvement upon the results of previously conducted research. Thyroid disease data from the UCI repository is used to train and evaluate XGBoost.
In addition, the model was constructed using hyperparameter optimisation to attain and compare the greatest possible accuracy score. The findings demonstrate that the optimised XGBoost outperformed the state-of-the-art models by a significant margin (99% accuracy).
Tang et al. 17 suggest a transfer learning approach using a distant domain high-level feature fusion (DHFF) model. Thanks to a smaller distribution gap between the source and target domains, the model can acquire more relevant transfer information while avoiding unnecessary feature fusion. Multiple experiments using both datasets verify the DHFF. Based on the findings, DHFF's classification accuracy with auxiliary source domains may reach 88.92%, an improvement of up to 8% over prior transfer and distant-domain transfer methods.
Sinha et al. 18 proposed a framework for thyroid disease detection that uses LightGBM, Sequential Backward Selection (SBS), and a metaheuristic approach called Whale Optimisation (WO). The primary purpose of that study is to deliver a method that is both extremely precise and logical for identifying human thyroid problems. Despite the impressive outcomes of the several methods applied to thyroid data sets, the literature review shows that the data actually utilised for disease identification is redundant, unpredictable, and lacking feature values. The goal of the proposed work is to detect thyroid abnormalities and start appropriate treatment as soon as possible by developing an expert advisory system based on the Opti-LightGBM architecture.
The Opti-LightGBM model beats many comparable state-of-the-art models and achieves an accuracy of 99.75% on the thyroid dataset.
Srivastava and Kumar 19 provide an optimised convolutional neural network model for the detection of thyroid nodules using a number of deep learning strategies, including the Visual Geometry Group-16 (VGG-16) network.
That study takes into account a total of 295 publicly available and 654 collected thyroid ultrasound datasets. Data from 1475 publicly available and 3270 privately acquired thyroid ultrasound datasets are used to test the suggested model. Through experimentation, the optimal values for the learning rate and dropout factor were found, improving the models' overall efficiency. Experiment I, on the public dataset, shows an accuracy of 93.75%, sensitivity of 94.6%, specificity of 92.5%, and f-measure of 94.0%; experiment II, on the collected dataset, shows an accuracy of 96.89%, sensitivity of 97.80%, specificity of 94.7%, and f-measure of 97.2%. The suggested model outperforms state-of-the-art models in terms of accuracy, sensitivity, specificity, and f-measure by margins of (4.57%, 7.84%), (5.06%, 8.24%), (4.43%, 6.63%), and (4.66%, 7.83%), respectively.

Proposed System
This research paper highlights the importance of accurate classification of thyroid disease.

A. Dataset brief description
The experiment makes use of the thyroid dataset from the UCI ML repository. It was contributed by Ross Quinlan from the Garavan Institute in Sydney, Australia 20. There are a total of 3772 entries in the dataset, 2800 of which are used for training and 972 for testing. The dataset contains a total of 29 characteristics, the last of which is the diagnosis of the disease. There are 7 numerical characteristics and 22 categorical ones. "Age", "sex", "pregnancy", "goitre", "tumour", and "hypopituitarism" are just some of the clinical variables that are assessed. The collection also includes six test findings labelled "TSH", "T3", "TT4", "T4U", "free thyroxine index", and "TBG". The model's process is depicted in Fig. 1.

B. Preprocessing
Missing values in the dataset are replaced with the corresponding column means. In a highly unbalanced dataset, there is a large discrepancy between the amounts of data in each category, and a classifier trained on such data is of limited use because it tends to favour the majority class. In medical analysis, it is far more expensive to incorrectly diagnose a cancer patient as healthy than it is to incorrectly diagnose a healthy patient as having cancer.
A false negative error might result in the loss of a human life, making it significantly costlier than a false positive error.
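The mean-substitution step mentioned at the start of this subsection can be illustrated with a minimal sketch. It assumes the data is held as a list of numeric rows with `None` marking a missing entry; the function name `impute_column_means` is hypothetical and is not taken from the paper.

```python
def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column.

    Assumes every column has at least one observed (non-None) value.
    """
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [row[c] for row in rows if row[c] is not None]
        means.append(sum(observed) / len(observed))
    # Build new rows so the input data is left untouched.
    return [[means[c] if row[c] is None else row[c] for c in range(n_cols)]
            for row in rows]


if __name__ == "__main__":
    data = [[1.0, None], [3.0, 4.0], [None, 8.0]]
    print(impute_column_means(data))
```

Mean substitution only makes sense for the numerical attributes; categorical attributes would need a different rule, which the paper does not specify.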
The SMOTE technique is applied to the collection of positive tuples to balance the class sizes used in this investigation. To give the classes a comparable number of instances, the number of thyroid cases is increased. Table 1 displays the number of classes and the distribution of those classes in both the original dataset and the dataset generated after using SMOTE.

Male elephants usually live alone. Each clan is directed by its eldest matriarch.
In a herd of elephants, the matriarchs represent the best solution while the males' positions represent the worst. There are j different elephant clans. Under the leadership of the matriarch Em, the members of clan c always make their next move based on the generation's best fitness value. Eq. 1 describes this operation.

In Eq. 3, 1 ≤ d ≤ D, u_(E_m) represents the total number of elephants in clan c, and D represents the total dimension of the search space.
The male elephants that wander away from their herd are modelled by a separating operator. The least-fit elephants in each clan c are reassigned to new positions, as stated in Eq. 4.
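For orientation, the clan-updating and separating operators of the standard elephant herding optimisation algorithm are commonly written as below. This is a sketch of the textbook form, not a reconstruction of this paper's Eqs. 1 to 4: the adaptive variant used here may differ in detail. The symbols are assumptions: $\alpha, \beta \in [0, 1]$ are scale factors, $r$ and $\mathrm{rand}$ are uniform random numbers in $[0, 1]$, $n_c$ is the number of elephants in clan $c$ (the count written u_(E_m) in the text), and $x_{\min}$, $x_{\max}$ are the bounds of the search space.

```latex
\begin{align*}
x_{\mathrm{new},c,j} &= x_{c,j} + \alpha \,\bigl(x_{\mathrm{best},c} - x_{c,j}\bigr)\, r
  && \text{(ordinary clan member)} \\
x_{\mathrm{new},c,j} &= \beta \, x_{\mathrm{center},c}
  && \text{(matriarch, } x_{c,j} = x_{\mathrm{best},c}\text{)} \\
x_{\mathrm{center},c,d} &= \frac{1}{n_c} \sum_{j=1}^{n_c} x_{c,j,d}, \quad 1 \le d \le D
  && \text{(clan centre)} \\
x_{\mathrm{worst},c} &= x_{\min} + \bigl(x_{\max} - x_{\min} + 1\bigr)\,\mathrm{rand}
  && \text{(separating operator)}
\end{align*}
```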
Crossover and mutation are performed during the evaluation of elephant positions to further optimise the system. Specifically, a two-point crossover is used: two points are picked on each pair of parental chromosomes, and the genes between these two points are swapped during reproduction, producing the offspring's chromosomes. The evaluation of these points is given by Eq. 5 and Eq. 6.

𝑥1 = | , | 3 ……….5
The mutation is carried out by replacing several genes on each chromosome with new ones. The replacement genes are randomly generated, with no repeats within a chromosome. This procedure is repeated to improve the fitness value.
The pseudocode of the AEHO procedure is given below:
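As an illustration only, the steps described in this subsection (clan updating under the matriarch, separation of the least-fit elephant, two-point crossover, and mutation) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `aeho_select`, `to_mask`, `two_point_crossover`, and `mutate`, the 0.5 binarisation threshold, the parameter defaults, the elitist tracking of the best position, and the user-supplied `fitness` function are all hypothetical.

```python
import random


def to_mask(position, threshold=0.5):
    """Binarise a continuous position in [0, 1]^D into a 0/1 feature mask."""
    return [1 if g > threshold else 0 for g in position]


def two_point_crossover(a, b, rng):
    """Swap the genes lying between two random cut points of parents a and b."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def mutate(position, rate, rng):
    """Replace each gene by a fresh random value with probability `rate`."""
    return [rng.random() if rng.random() < rate else g for g in position]


def aeho_select(fitness, n_features, n_clans=3, clan_size=6, generations=40,
                alpha=0.5, beta=0.1, mutation_rate=0.1, seed=0):
    """Return (best_mask, best_score); `fitness` maps a 0/1 mask to a score to maximise."""
    rng = random.Random(seed)

    def score(position):
        return fitness(to_mask(position))

    def new_position():
        return [rng.random() for _ in range(n_features)]

    clans = [[new_position() for _ in range(clan_size)] for _ in range(n_clans)]
    best = max((p for clan in clans for p in clan), key=score)
    for _ in range(generations):
        for c, clan in enumerate(clans):
            clan = sorted(clan, key=score, reverse=True)  # clan[0] is the matriarch
            matriarch = clan[0]
            centre = [sum(p[d] for p in clan) / clan_size for d in range(n_features)]
            # Clan-updating operator: the matriarch moves relative to the clan
            # centre, the other members move towards the matriarch.
            updated = [[beta * g for g in centre]]
            for p in clan[1:]:
                r = rng.random()
                updated.append([g + alpha * (m - g) * r for g, m in zip(p, matriarch)])
            # Two-point crossover of the two fittest members, then mutation.
            updated.sort(key=score, reverse=True)
            children = two_point_crossover(updated[0], updated[1], rng)
            children = [mutate(child, mutation_rate, rng) for child in children]
            updated = sorted(updated + list(children), key=score, reverse=True)[:clan_size]
            # Separating operator: the least-fit elephant gets a new random position.
            updated[-1] = new_position()
            clans[c] = updated
            best = max([best] + updated, key=score)  # elitism (an assumption)
    return to_mask(best), score(best)


if __name__ == "__main__":
    # Toy fitness: features 0 and 3 are useful, every selected feature costs 0.2.
    def toy_fitness(mask):
        return sum(mask[i] for i in (0, 3)) - 0.2 * sum(mask)

    print(aeho_select(toy_fitness, n_features=8))
```

In a real pipeline, `fitness` would typically be the validation score of a classifier trained on the selected columns; the toy function above only demonstrates the interface.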

Classification
Starting from the pre-processed data, the full dataset is assembled and then reduced to the defined size using the AEHOA model. The preprocessed data is divided into training data (80%) and testing data (20%). To create a trained model, the model is first fitted to the training data.
In the proposed work, the model's scores are tested using a variety of standard metrics. By analysing the selected data, the suggested network can categorise patients with thyroid disease. The design uses a deep neural network with multiple layers and filtering capabilities. It comprises three convolutional layers, nine LeakyReLU layers, five dense layers, one flattening layer, and four dropout layers. A LeakyReLU operation is applied to the output of each convolutional layer before it is passed to the next layer; LeakyReLU is the activation function used throughout this architecture. The suggested design uses 2 x 2 and 3 x 3 kernels.
The number of filters increases progressively to 32, 64, and 256. In the end, the activation layer generates the outputs of the model. The model is trained for 100 epochs with a batch size of 25, using the Adamax optimizer and a "sequential" model.
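To make the "convolution followed by LeakyReLU" building block concrete, here is a minimal pure-Python sketch of a single forward pass. It is illustrative only and rests on assumptions not stated in the paper: a 1-D convolution over a feature vector (the paper mentions 2 x 2 and 3 x 3 kernels), stride 1 with no padding, and a negative slope of 0.01. The function names are hypothetical.

```python
def leaky_relu(x, negative_slope=0.01):
    """LeakyReLU: identity for positive inputs, small linear slope otherwise."""
    return x if x > 0 else negative_slope * x


def conv1d(signal, kernel, bias=0.0):
    """Valid 1-D cross-correlation (stride 1, no padding), as used in CNN layers."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(signal) - k + 1)]


def conv_block(signal, kernel, bias=0.0):
    """One convolutional layer whose output is passed through LeakyReLU."""
    return [leaky_relu(v) for v in conv1d(signal, kernel, bias)]


if __name__ == "__main__":
    features = [1.0, 2.0, 0.0, -3.0]
    print(conv_block(features, kernel=[1.0, -1.0]))
```

A full model would stack several such blocks with learned kernels, followed by flattening, dropout, and dense layers, and would normally be built with a deep learning framework rather than by hand.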

Results and Discussion
All tests were conducted on a 3.6 GHz Intel Core i9 with 16 GB of RAM and an NVIDIA RTX 2080 Ti graphics processing unit. In addition, the accelerated computing resources of CUDA 10.1 and cuDNN 7.6.5 were used with the deep learning framework PyTorch.

A. Evaluation Metrics
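Accuracy: "ratio of the observation of exactly predicted to the whole observations". This is given in Eq. 7, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$

Specificity:

$$Sp = \frac{TN}{TN + FP} \tag{9}$$

Precision: "the ratio of positive observations that are predicted exactly to the total number of observations that are positively predicted".

$$\text{Precision} = \frac{TP}{TP + FP} \tag{10}$$

Fig. 2 shows the graphical representation of the various models discussed. The comparative analysis of the different models used is shown in Fig. 3. The error analysis of the proposed model is shown in Fig. 4, which indicates that the proposed model is more efficient.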

Conclusion
Thyroid disease identification is becoming increasingly urgent. Neither the existing models nor the datasets used to evaluate them have been thoroughly verified. The research presented here addresses these drawbacks by suggesting a method that combines feature selection with a deep learning model. Future trends in the medical use of AI include the autonomous measurement of thyroid parameters. In this study, a novel DL procedure was used for the identification and classification of thyroid disease. SMOTE is first used to balance the data, after which the AEHOA model performs optimal feature selection. The classification accuracy on both datasets is then improved through the application of a CNN. The results show that the convolutional neural network (CNN) is most effective when combined with the characteristics picked by the AEHOA model. Owing to their reduced computational complexity, CNNs are promising for use in the prediction of thyroid disease. These findings are supported by a 10-fold cross-validation test. In practice, the suggested method shows considerable improvement in performance. In the future, we plan to extend this work to a pixel-level classification task, producing a model that can properly predict the pixel labels for a thyroid condition using only a small quantity of training data.

Authors' Declaration
-Conflicts of Interest: None.
-We hereby confirm that all the Figures and Tables in the manuscript are ours. Furthermore, any Figures and images that are not ours have been included with the necessary permission for republication, which is attached to the manuscript.
-No animal studies are present in the manuscript.
-No human studies are present in the manuscript.
-Ethical Clearance: The project was approved by the local ethical committee at University of Technology and Applied Sciences, Oman.

Figure 1. Working flow of the proposed model.

The thyroid dataset includes numerous missing values and is severely skewed towards one of four classes. Of the 3772 cases in the dataset, 3481 (92.3%) are in the negative category, meaning they are normal. Another 194 (5.1%) are in the compensated-hypothyroid category, 95 (2.5%) are in the primary-hypothyroid category, and 2 (0.05%) are in the secondary-hypothyroid category; all three are forms of underactive thyroid.

C. Feature Selection using AEHOA
AEHO rests on the following assumptions: the population is made up of clans, and every clan comprises a fixed number of elephants.
FPR: "the ratio of count of false positive predictions to the entire count of negative predictions".

$$FPR = \frac{FP}{FP + TN} \tag{11}$$

FNR: "the proportion of positives which yield negative test outcomes with the test".

$$FNR = \frac{FN}{FN + TP} \tag{12}$$

NPV: "probability that subjects with a negative screening test truly don't have the disease".

$$NPV = \frac{TN}{TN + FN} \tag{13}$$

FDR: "the number of false positives in all of the rejected hypotheses".

$$FDR = \frac{FP}{FP + TP} \tag{14}$$

F1 score: It is defined as the "harmonic mean between precision and recall. It is used as a statistical measure to rate performance".

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{15}$$

MCC: It is a "correlation coefficient computed by four values".

$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{16}$$
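The evaluation metrics can all be computed from the four confusion-matrix counts. The following is a minimal sketch for the binary case, assuming every denominator is non-zero; the function name `confusion_metrics` and the dictionary keys are hypothetical, and recall (sensitivity) is taken to be TP / (TP + FN) as in its standard definition.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),
        "precision": precision,
        "recall": recall,
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
        "npv": tn / (tn + fn),
        "fdr": fp / (fp + tp),
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


if __name__ == "__main__":
    # Illustrative counts only, not results from the paper.
    for name, value in confusion_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.4f}")
```

For the four-class thyroid problem, these quantities would be computed per class (one class against the rest) and then averaged; the averaging scheme is not stated in the text.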

Figure 2. Graphical Representation of various models.

R. J., P. K. P., D. M. G. and A. S. Z. J. contributed to the design and implementation of the research, to the analysis of the results and to the writing of the manuscript.

Table 1. Data balancing of the thyroid dataset with the SMOTE procedure (columns: Dataset, No. of classes, Total records)
After SMOTE, the dataset contains a total of 6795 records: 3481 (51.23%) are negative, and 3314 (48.77%) are thyroid cases, of which 760 are classified as primary_hypothyroid, 1584 as secondary_hypothyroid, and 970 as compensated_hypothyroid. The result approximates a ratio of 51:49, making for a balanced dataset; the same ratio is also used to separate the training and test sets.
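The core idea behind SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified pure-Python illustration rather than the implementation used in the study: the function name `smote`, the default of k = 5 neighbours, and the squared Euclidean distance are assumptions, and the sketch handles numerical features only.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority-class vectors.

    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for sample in smote(minority_class, n_new=4, k=2):
        print(sample)
```

Because new points are interpolated rather than duplicated, a class with very few original records (such as the two secondary-hypothyroid cases) yields synthetic samples that all lie on the segment between those records.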

Table 4. Analysis of Various Techniques for Thyroid Disease Prediction
For the FPR metric, the LR model achieved 0.18421, the MLP model 0.10526, the AE model 0.89655, the DBN model 0.80645, and the remaining model 0.078947. For the MCC metric, the LR model achieved 0.55436, the MLP model 0.68803, the AE model 0.77612, the DBN model 0.72464, and the remaining model 0.76555. Fig.