Exploring the Challenges of Diagnosing Thyroid Disease with Imbalanced Data and Machine Learning: A Systematic Literature Review

Thyroid disease is a common disease affecting millions worldwide. Early diagnosis and treatment of thyroid disease can help prevent more serious complications and improve long-term health outcomes. However


Introduction
Thyroid disease affects the thyroid gland, which produces hormones that regulate metabolism. Thyroid cancer is the most common type of cancer affecting the endocrine system 1. In recent decades, the prevalence of thyroid cancer has risen, particularly in countries with favorable [...]. Managing this condition may lead to prolonged morbidity due to the elevated risk of recurrence and potential surgical complications 3. There are two main categories of thyroid disease: hypothyroidism, characterized by a decrease in thyroid hormones and symptoms such as weight gain, fatigue, and constipation, and hyperthyroidism, characterized by an excess of thyroid hormones and symptoms such as irritability, weight loss, and tremors. According to the Mayo Clinic, environmental factors, genetic factors, and the interaction between the two can contribute to the development of thyroid disease 4. Additionally, infertility in women may be caused by thyroid gland disorders such as hypothyroidism and hyperthyroidism 5. When the thyroid gland does not generate enough hormones, the result is fatigue, weight gain, and low mood 6. Conversely, when the thyroid gland produces too many hormones, the result is weight loss, anxiety, and tremors 7. Both disorders can seriously affect a person's health and quality of life.
Certain medications or problems with the pituitary gland, which regulates the thyroid, can also cause thyroid disease. The high annual mortality rate from thyroid disease highlights the significant impact of thyroid cancer on global health 8. To facilitate clinical decision-making, it is essential to develop decision models that account for the different causes of death that may compete with thyroid cancer 9. At the same time, technology can improve healthcare delivery and strengthen health infrastructure 10. In addition, early detection and timely treatment of thyroid disease can reduce fatalities 11. Diagnosing thyroid disease involves a series of examinations, including blood tests, imaging procedures, and biopsies. The thyroid-stimulating hormone (TSH) blood test is a primary screening tool for thyroid disorders. Imaging modalities, including ultrasound, computed tomography (CT), and magnetic resonance imaging (MRI), produce high-resolution images of the thyroid gland and its adjacent anatomical structures. A biopsy extracts a small tissue sample for laboratory analysis and is performed to confirm a diagnosis of thyroid cancer 12. However, healthcare providers may struggle to diagnose thyroid disease because its symptoms vary and resemble those of other illnesses, and because access to specialized care and diagnostic tests is limited 13.
With the advancement of machine learning (ML) in healthcare, many experts consider Thyroid Disease Diagnosis Based on Machine Learning (TDDBML) a viable option. ML can improve the accuracy and efficiency of disease diagnosis. ML algorithms can evaluate electronic health data and output from patient monitoring devices to identify early indications of disease 14, and they can scan large amounts of data and recognize patterns that doctors may miss, improving patient care and reducing wait times. The prevalence of thyroid disease and its significant impact on public health have motivated the exploration of ML for its diagnosis 15.
Several investigations have proposed ML algorithms, including Support Vector Classifier (SVC), Artificial Neural Networks (ANN), Naive Bayes, Random Forest, and K-Nearest Neighbors, for diagnosing thyroid diseases on various datasets. For instance, Islam et al. found that the ANN classifier achieved a 96% accuracy rate in predicting thyroid diseases 16. Vairale et al. used an SVM classifier to predict the level of hypothyroid disease 17, while Shyamala Devi et al. applied multiple ML techniques and reported a 99% accuracy rate in predicting hypothyroid disease 18. Guleria et al. reported 100% accuracy in the early prediction of hypothyroidism using ANN 19.
One possible limitation of ML- and deep learning (DL)-based solutions is that they frequently involve sophisticated algorithms that require a large amount of training data. This can make it hard for doctors to evaluate an algorithm's diagnosis and raises concerns about bias and reliability 20. For instance, DL models have numerous hidden layers, and it is not always easy to tell what role each plays in the model's predictions 21. Another potential difficulty is that ML algorithms tend to favor the majority class, that is, the category that has the most samples in a dataset 22. Researchers and healthcare providers must therefore consider these issues carefully when developing and using ML-based models to predict thyroid disease, to ensure that the models are unbiased.
The existing literature also shows limited use of systematic literature reviews (SLR) compared with the focus on ML techniques. For instance, the SLR by Lee et al. has a drawback: the ML methods applied vary with the data used for thyroid disease diagnosis 23. Moreover, since most datasets used for thyroid disease diagnosis are imbalanced, evaluating the performance of ML on such data is crucial. Another study did not specify precisely the periods it covered 24. With the increasing popularity of ML-based diagnosis, an SLR with meta-analysis is expected to address the gaps in existing studies.
The increasing number of studies on Thyroid Disease Diagnosis Based on Machine Learning (TDDBML) highlights the need for a systematic review of existing knowledge. An SLR was conducted using the Scopus and Web of Science (WoS) databases, resulting in the analysis of 168 papers and closer examination of 41. The metadata analysis aimed to identify leading academic institutions, critical research areas, and high-quality sources. In addition, the 41 publications were reviewed in depth to address the following questions: What are the existing DL- and ML-based approaches for diagnosing thyroid disease? What are the current techniques for dealing with datasets with an imbalanced class ratio?
The SLR aims to provide a resource for researchers by summarizing the latest methods and developments in the field and identifying knowledge gaps that a more advanced TDDBML model may address. The remainder of the article is structured as follows. Section 2 briefly describes the methodology of the systematic literature review, Section 3 presents the results and analysis, Section 4 summarizes the findings, and Section 5 contains the conclusion.

Materials and Methods
An SLR poses research questions and then systematically searches for, selects, and evaluates studies to determine what information they provide 26. This approach was chosen because it is widely recognized across diverse research fields for providing a precise and reliable synthesis of scholarly content. In this study, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed for conducting and reporting the systematic review. The PRISMA checklist was used to ensure that all relevant information was included in the study, and the flow diagram was used to document the study selection process 27.

Identification of the data
A thorough search was conducted using the Scopus and WoS databases, which index all major publishers, including Emerald, Taylor & Francis, Springer, IEEE, and Wiley. Many researchers consider the WoS and Scopus databases reliable for SLRs because of the quality of their indexed content 28. Keywords including "machine learning", "imbalance", and "deep learning" were used to find relevant publications. In addition, Boolean operators and various keyword combinations were used to refine the search.

Screening initial data and determining eligibility
The Scopus and WoS databases were searched for this systematic literature review using the query "thyroid disease" OR "thyroid" AND "machine learning" OR "artificial intelligence" OR "data mining" OR "deep learning" AND "diagnosis" OR "detection" OR "classification" OR "prediction". The initial Scopus search returned 2,182 articles, whereas the WoS search yielded 486. After restricting the publication period to 2013 through November 2022 and filtering by document type, language, subject area, and keywords, the number of papers was reduced to 168. These 168 distinct papers were then assessed, and the most pertinent information was extracted using a consistent extraction template. Papers that were not related to machine learning or were not primarily concerned with thyroid disorders were excluded.
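The query quoted above mixes AND and OR without parentheses, so its meaning depends on how each database resolves operator precedence. The following is a minimal sketch of one way to assemble the same terms with explicit grouping. The grouping into three OR-blocks and the helper names `or_group` and `build_query` are assumptions made for illustration; the review does not state the exact syntax submitted to Scopus or WoS.

```python
# Term groups copied from the query quoted in the text.
# The parenthesised grouping is an assumption: the quoted query has none.
TOPIC = ["thyroid disease", "thyroid"]
METHOD = ["machine learning", "artificial intelligence", "data mining", "deep learning"]
TASK = ["diagnosis", "detection", "classification", "prediction"]


def or_group(terms):
    """Join quoted terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups):
    """AND together several OR-groups so that precedence is unambiguous."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(TOPIC, METHOD, TASK)
print(query)
```

Written this way, a record must match at least one term from each of the three groups, which is the reading most consistent with the stated aim of the search.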
Furthermore, book chapters, ultrasound imaging studies, non-human studies, and reviews were not included. Finally, 41 full-text papers met the inclusion criteria and were included in the review (Fig 1). A flowchart was created to show the study selection process, including the search query and inclusion criteria. The selection approach was designed to be thorough and rigorous so that the analysis included the most relevant and recent studies on using machine learning techniques to identify thyroid disorders.
The inclusion and exclusion criteria were designed to ensure that the review was comprehensive and relevant to ML-based thyroid disease diagnosis. Articles and conference papers on TDDBML, original research findings or empirical data on TDDBML, and machine learning and deep learning studies on TDDBML published between 2013 and 2022 met the inclusion requirements. Papers not published in English, machine learning studies unrelated to TDDBML, literature published before 2013, research other than articles or conference papers, duplicate studies, preliminary data studies, and studies with unclear conclusions were excluded. These criteria retained the most relevant and up-to-date papers on the diagnosis of thyroid disease with machine learning while removing studies that did not fulfil the conditions above.

Observations and findings
This section discusses the findings and insights from evaluating the metadata. The results are based on a meta-study of 168 papers, covering both their metadata and their content.

Metadata analysis
Metadata analysis helps in understanding scholarly literature by extracting information about authors, articles, journals, and other elements of the scholarly process 29. The metadata analysis was applied to 168 papers. The papers were classified by year of publication, publication type, publisher, country of origin, subject matter, funding source, and academic institution.

Published by year
As shown in Figure 2, the 168 papers were reviewed to see how many dealt with thyroid disease prediction using ML algorithms over the past decade. The number of publications has grown steadily, with the sharpest increases in 2020 and 2022: 36 new papers were published in 2020 and around 60 in 2022.
In addition, the classification problem in the diagnosis of thyroid disease has received much attention throughout the period. As a result, substantially more scholarly works were published in 2022 than in any previous year. By contrast, only a handful of papers were published from 2013 to 2017. Increasing attention is thus being directed toward diagnosing thyroid disease, including classification issues and other data-driven concerns.

Most Relevant Authors
According to Figure 3, Fu C and Liu W are the most impactful of the five authors shown and have written the most pertinent papers. A comparable analysis was conducted to track the authors' production over time. It revealed that Fu C and Liu W jointly produced four articles in 2021 that received 10.5 citations.

Most Relevant Sources
As shown in Figure 4, the most relevant sources were Advances in Intelligent Systems and Computing, with ten documents, Expert Systems with Applications, with six, and Lecture Notes in Networks and Systems, with five.

Most frequently used words in the titles and keywords
The R software was used to identify the most popular keywords (Table 2). Our main goal was to find and evaluate articles on machine learning, deep learning, class imbalance, and thyroid disease. However, it was surprising to find that the most frequently used words in the articles were "thyroid" and "disease", as shown in Table 3. In the author keyword field, "machine learning" appeared 31 times, followed by "thyroid disease" (24 times) and "classification" (19 times). Articles naturally reuse these exact phrases, yet restricting the examination to the keywords that authors supplied in the keyword sections still produced intriguing results. A word cloud is a straightforward way to identify the prevalent themes and key phrases in the reviewed articles. Figure 5 displays the word clouds generated by the software, where larger, bolder text marks the most frequently used terms and smaller, lighter text marks the less common ones.

Trending Topics
The trend topics were generated from papers published from 2013 to 2022 only. The analysis used the author keywords field, with a minimum word frequency of three and three words considered per year. Fig 6 shows the main keywords used each year: the lines show when each word was in use, and the size of the bubbles indicates how frequently the term appeared. For example, the most frequently used term in 2021 was "machine learning".
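The filtering rule described above (author keywords only, at least three occurrences, at most three terms per year) can be sketched in a few lines. This is an illustrative approximation, not the R software used in the review; the function name `trend_topics` and the sample records are hypothetical and do not reproduce the review's data.

```python
from collections import Counter, defaultdict

# Hypothetical records: (publication year, author keywords). Not the review's data.
records = [
    (2021, ["machine learning", "thyroid disease", "classification"]),
    (2021, ["Machine Learning", "deep learning"]),
    (2021, ["machine learning", "thyroid disease"]),
    (2021, ["thyroid disease", "feature selection"]),
    (2022, ["thyroid", "deep learning"]),
    (2022, ["thyroid", "classification"]),
    (2022, ["thyroid", "machine learning"]),
]


def trend_topics(records, min_freq=3, top_n=3):
    """Return, per year, up to top_n keywords that occur at least min_freq times."""
    per_year = defaultdict(Counter)
    for year, keywords in records:
        # Count each keyword once per paper, ignoring case.
        per_year[year].update({keyword.lower() for keyword in keywords})
    trends = {}
    for year in sorted(per_year):
        frequent = [(k, n) for k, n in per_year[year].items() if n >= min_freq]
        # Most frequent first; ties are broken alphabetically for a stable result.
        frequent.sort(key=lambda item: (-item[1], item[0]))
        trends[year] = frequent[:top_n]
    return trends


for year, topics in trend_topics(records).items():
    print(year, topics)
```

Keywords that fall below the minimum frequency, such as "feature selection" in the sample, are dropped before the per-year cut-off is applied.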
Interestingly, the research trend has evolved over the years. In 2021, the most frequent word was "deep learning," and research then shifted towards "thyroid" in 2022, followed by "thyroid disease" and "classification." Over the last several years, other terms such as "feature selection," "thyroid cancer," and "artificial intelligence" have appeared.
The data reveal that machine learning and deep learning have become increasingly popular, with particular interest in applying them to disease diagnosis in healthcare. Thyroid disease and thyroid cancer are the most researched thyroid-related topics, likely because of their high prevalence worldwide. Understanding and diagnosing thyroid disorders is crucial in the medical field, and these research trends highlight the importance of applying advanced technologies to improve patient care and treatment. The data also provide a snapshot of current research output by academic affiliation worldwide, highlighting the ongoing efforts of universities to produce high-quality research that advances understanding and contributes to new knowledge and innovations.
Knowing which institutions produce the most research in a particular field can help researchers make informed decisions and coordinate research efforts.

Insights of TDDBML
This section examines 41 research articles in depth, covering topics such as imbalanced data, thyroid disease, and machine learning. The review aims to provide insight into the concepts, methods, and potential future applications relevant to theorists and practitioners.

Thyroid disease kinds
As ML-based methods improved, scientists and doctors began using data-driven methods to determine from a blood sample whether a patient had a thyroid problem. Patients often wait until their symptoms have worsened before they see a doctor because of the difficulty of undertaking the numerous routine tests. ML-based methods, however, enable early-stage diagnosis, which patients can perform routinely themselves using inexpensive and compact sensors 10. Thyroid disease can be classified into seven diagnostic categories: hypothyroidism, euthyroidism, goiter, thyroiditis, thyroid cancer, thyroid hormone resistance, and hyperthyroidism. The two main types are hypothyroidism and hyperthyroidism. At least 15 of the 41 selected papers considered these two types because of their potentially fatal consequences. Both types affect metabolic function, and severe cases need medical attention 30.
Indeed, thyroid disease, particularly in its terminal stage, is associated with an increased risk of cardiovascular illness, elevated blood pressure, higher cholesterol levels, and depression 31. Early diagnosis is therefore crucial for treating patients effectively. Ahmed et al. reported a 98.2% accuracy rate in differentiating between hypothyroid and hyperthyroid states by training a deep neural network 6. Pal et al. compared three ML models for predicting thyroid disease, namely KNN, DT, and Multi-Layer Perceptron (MLP), on the UCI thyroid disease open repository dataset and reported a highest accuracy of 94.23% 32. Aljameel, on the other hand, used an EANN-based approach to distinguish between thyroid cancer and non-cancer cases in real-world data with 99% accuracy 33.
Figure 8 depicts the thyroid-related diseases most commonly reported in the cited studies. Thyroid disease forms the largest cluster among the diseases linked to machine learning techniques. In addition, several recurring keywords, such as random forest, k-NN, and ANN, indicate techniques used to predict thyroid disease at an early stage.

Machine learning algorithms
Table 4 shows that support vector machine (SVM) algorithms have received more attention from researchers and practitioners than any other ML type in designing TDDBML models. At least 12 of the 41 studies that attempted to develop a model to diagnose thyroid disease used an SVM-based approach, a standard technique in healthcare prediction systems 34. For instance, Płuciennik et al. developed a model for thyroid cancer diagnostics that achieved approximately 95% accuracy 35. Vairale et al. compared SVM with Logistic Regression (LR), K-NN, and NN for identifying people with hypothyroid disease on a real-case dataset. SVM performed best among all algorithms, with an accuracy of 99% 17.
The RF classifier is the next most frequently used algorithm: nine studies used it to develop a model for thyroid disease prediction. For example, Alghamdi designed a predictive model to find thyroid cancer in the Prostate, Lung, Colorectal, and Ovarian (PLCO) dataset, which contains 155,000 examples 36. The study compared the following models: Logistic Regression (LR), KNN, AdaBoost classifier (AdaB), SVM, DT, Gaussian Naïve Bayes (GNB), RF, and Gradient Boosting classifier (GB). RF reached an accuracy of 100%.
It is evident that, as time has progressed, a growing number of TDDBML model development efforts have focused on DL algorithms rather than classic ML. However, only 8 of the 41 studies used DL to create a TDDBML model, indicating that more research is needed. To classify individuals into normal, hyperthyroid, and hypothyroid categories, Guleria et al. used a thyroid cancer prediction system based on MLP. According to preliminary computational results, the proposed model identifies thyroid problems with an accuracy of 99.8% 37. Asif et al. found MLP to be the most effective algorithm, with an accuracy of 99.70% 38. Zhou et al. applied ten ML algorithms in the context of thyroid surgery to build a corresponding model. They employed a CNN model and used AUC and accuracy measures to identify patients at an early stage of thyroid disease. Based on data from 500 real patients, the model achieved 90% accuracy in identifying individuals with thyroid disease, with an AUC of 83% 39. Other ML algorithms that researchers have used to build TDDBML models include KNN 40, Hoeffding 41, XGBoost 31, and AdaBoost and Bagging 42.

Imbalance challenges
An initial aim was to track down previous research publications on thyroid disease that analyzed imbalanced data. Reading through the articles made it evident that the vast majority of studies either adopted data from open sources or used real patient data, and that in both cases the datasets were imbalanced. The quality assessment revealed that eight articles relied on experimental results from an imbalanced dataset. In addition, recent studies have addressed the effect of imbalanced data on model performance, which most studies ignored.
Authors deal with the imbalance problem in various ways. For instance, Zhou et al. assessed model performance on imbalanced data classification by computing the F1 score, ROC-AUC curves, and accuracy rate 39. Alghamdi worked on the PLCO dataset, in which far more patients have not been diagnosed with thyroid cancer than have been, and relied on an under-sampling technique to handle the imbalanced classes 36. Aljameel et al. worked on a dataset with many more thyroid cancer cases than non-thyroid cancer cases and therefore applied the SMOTEENN technique to avoid biasing the models toward one of the outcomes 33. Other studies 16,43,44 rely on SMOTE to handle imbalanced data and prevent bias in the performance measures. Where SMOTE is used to balance the data, overall model accuracy increases. Finally, Hayashi et al. proposed a model based on Continuous Re-RX that extracts informative rules from the thyroid dataset with appropriate subdivision rates for both the majority and minority classes 45.
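To make the two resampling ideas mentioned above concrete, the sketch below implements random under-sampling and a basic form of SMOTE using only the Python standard library. It is a simplified illustration under stated assumptions, not the code used in any of the cited studies: the data are synthetic, the function names are hypothetical, and the cleaning step that SMOTEENN adds after over-sampling is not included.

```python
import math
import random


def random_undersample(majority, minority, rng):
    """Keep a random subset of majority rows equal in size to the minority class."""
    return rng.sample(majority, len(minority)), list(minority)


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority rows by interpolating towards neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base row, excluding the row itself.
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to target
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


rng = random.Random(0)
# Hypothetical two-feature rows, for example two scaled hormone measurements.
majority = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(95)]
minority = [[rng.gauss(3.0, 0.5), rng.gauss(3.0, 0.5)] for _ in range(5)]

kept_majority, kept_minority = random_undersample(majority, minority, rng)
new_rows = smote(minority, n_new=90, k=3, rng=rng)

print(len(kept_majority), len(kept_minority))  # class sizes after under-sampling
print(len(minority) + len(new_rows))           # minority size after over-sampling
```

Under-sampling discards most of the majority rows, which is cheap but loses information, whereas SMOTE keeps every original row and adds interpolated minority rows. In practice either step would be applied to the training split only, so that the evaluation data keep their original class ratio.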
Some researchers adopt DL-based solutions in place of other algorithm-level methods. For instance, Selwal & Raoof used an MLP model to develop a more accurate system for predicting thyroid disease, which they tested on random samples of hyperthyroid, hypothyroid, and healthy subjects 37. Other studies, after selecting variables for thyroid illness prediction, employ the Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and CNN-LSTM. The authors demonstrate that their proposed model can achieve an AUC of 72%. Unfortunately, thyroid disease datasets are notoriously imbalanced, and few publications have investigated methods to address this issue outside of classification and fabricated models 18,23,46. Many methods are available for handling imbalanced classes; however, few studies in this overview examine which of them affect model performance the most.

Results and Discussion
A thorough examination of 41 studies was carried out to understand current practices and techniques for identifying thyroid disorders with imbalanced datasets. The analysis evaluated the following factors: thyroid disease type, applications, ML algorithms, and imbalance solutions. The authors of the reviewed studies employed a variety of datasets, including real-case datasets and UCI datasets. Real-case datasets contained data from medical institutions, such as blood test results and medical records. UCI datasets were obtained from the UC Irvine Machine Learning Repository, which hosts various publicly available datasets. Some authors also used synthetic datasets created with techniques such as data augmentation and oversampling. The goal was to provide insight into current practices in thyroid disease diagnosis with imbalanced data and to discuss the study's limitations and potential future research directions.
Overall, hypothyroidism and hyperthyroidism have received considerable attention in TDDBML, while other investigations looked into euthyroid 16 and thyroid surgery 39. However, other types of thyroid disease, such as thyroiditis, goiter, Graves' disease, and Hashimoto's thyroiditis, a common problem among people suffering from malnutrition, trauma, surgery, or severe acute or chronic disease, received relatively less attention 47. Most ML-based models are designed to detect thyroid disease patients and emphasize classification. Because of their availability and the issues connected with data imbalance, most researchers relied heavily on popular datasets from the UC Irvine Machine Learning Repository. However, a few studies took into account real-world data 17,39,48 and large datasets 37,46. A large amount of data helps the healthcare industry create more effective disease detection and decision-support systems 34. Model performance differed between studies that used public source data and those that used actual data. It cannot be denied that models will be more accurate when the experiment is conducted using actual data 49. To evaluate the effectiveness of ML-based models, it is necessary to use actual data rather than public repository data 50.
Baghdad Science Journal
Model instability is one of the primary factors limiting the capabilities of Clinical Decision Support Systems (CDSS) 51. Since clinical systems cannot function correctly using only old patient data, the CDSS model must be continuously refined and updated with new information 52. Situations in which data must be collected and an ML model trained in real time, such as in the operating room during an emergency or for a blood test conducted with new devices, are likely to pose significant difficulties 53.
Most studies in the literature review used traditional ML methods such as SVM, RF, DT, KNN, and NB 46,35,36, while fewer explored DL algorithms such as ANN, MLP, CNN, LSTM, and BPNN 39,16,54,55. While traditional ML algorithms have shown promising results in thyroid disease diagnosis, the potential of DL algorithms should not be overlooked, as they have succeeded in various other medical applications. Among the traditional ML algorithms, the RF-based model has received the most attention from researchers 18,36,48,56,57 because of its ability to handle thousands of variables and provide highly accurate classification 58. However, different ML algorithms may perform differently depending on the characteristics of the dataset and the type of thyroid disease being diagnosed 59. It is therefore crucial to explore a range of ML algorithms and choose the most appropriate one for the problem.
The limited focus on handling imbalanced data in previous studies is a significant gap in the literature. Most studies have concentrated on feature selection techniques and neglected imbalanced data, which can significantly affect model performance. This gap can be attributed to a lack of awareness among researchers of the impact of imbalanced data and of the techniques available to handle it effectively. As a result, most of the models developed for thyroid disease diagnosis may not be suitable for real-world applications, since they have not been tested with imbalanced data. Future researchers should make imbalance-handling techniques an integral part of model development and evaluation and consider their impact on model performance. Techniques such as SMOTE 16,43,44, SMOTEENN 33, under-sampling 36, and BRACID 45 have been used in previous studies with varying degrees of success. Future work should consider these techniques and explore other advanced methods to improve model performance.
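A small worked example shows why ignoring imbalance during evaluation is risky. The sketch below compares accuracy with precision, recall, and F1 for two hypothetical classifiers on a 95:5 cohort; the labels and predictions are invented for illustration and do not come from any reviewed study.

```python
def confusion(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def report(y_true, y_pred):
    """Accuracy, precision, recall and F1, using 0.0 where a ratio is undefined."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "f1": 2 * tp / (2 * tp + fp + fn) if tp else 0.0,
    }


# Hypothetical cohort: 95 healthy patients (0) and 5 with thyroid disease (1).
y_true = [0] * 95 + [1] * 5
always_healthy = [0] * 100                       # never predicts the minority class
screening = [0] * 85 + [1] * 10 + [1] * 4 + [0]  # 10 false alarms, finds 4 of 5 cases

for name, y_pred in [("always healthy", always_healthy), ("screening", screening)]:
    print(name, report(y_true, y_pred))
```

The classifier that never predicts disease has the higher accuracy of the two yet detects no patients at all, which is why recall, F1, or AUC are usually reported alongside accuracy for imbalanced data.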
One effective technique for handling imbalanced data is cost-sensitive learning 60, which assigns different misclassification costs to different classes. This can balance the misclassification costs between the minority and majority classes and thus improve model performance. Data augmentation is another technique for addressing the imbalance problem 61. It generates synthetic samples from the minority class to balance the class distribution, which can improve model performance by increasing the diversity of the dataset. Threshold adjustment is also effective in handling imbalanced data 62: the model's decision threshold is adjusted to improve performance on the minority class.
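Two of these ideas are simple enough to sketch directly. The example below derives class weights that are inversely proportional to class frequency, one common way to express unequal misclassification costs, and then selects a decision threshold by maximizing F1 on validation scores. Both the weighting heuristic and the F1 criterion are assumptions chosen for illustration, and the labels and scores are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def f1_at(threshold, y_true, scores):
    """F1 of the positive class when scores >= threshold are called positive."""
    y_pred = [int(score >= threshold) for score in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_threshold(y_true, scores):
    """Return the observed score that gives the highest F1 when used as threshold."""
    return max(sorted(set(scores)), key=lambda th: f1_at(th, y_true, scores))


# Hypothetical validation labels and predicted probabilities of disease.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.10, 0.20, 0.20, 0.30, 0.35, 0.45, 0.40, 0.60]

print(balanced_class_weights(y_true))
print(f1_at(0.5, y_true, scores), best_threshold(y_true, scores))
```

In this sample the minority class receives four times the weight of the majority class, and lowering the threshold from the default 0.5 recovers a diseased patient that the default would miss, at the price of one false alarm. The threshold should be tuned on validation data, not on the test set.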
Machine learning models often perform better intra-patient than inter-patient, which can be caused by differences in data or patient characteristics 63. For example, a model trained and evaluated on a dataset of individuals with one type of thyroid disease may not perform as well on a dataset of individuals with a different type. This could be due to patient-specific data, such as symptoms or blood test findings 64. To improve performance on inter-patient data, a larger and more diverse patient dataset may be needed so that the model generalizes to a broader range of patient populations. Choosing a different ML method may also be necessary 65.
Most traditional classification methods seek a classifier that maximizes accuracy while treating the misclassification cost as constant. This can be problematic for imbalanced classes 66, because the cost of misclassification may vary with the probability distribution of the samples. In addition, most of the reported research involved computationally costly steps, including denoising, thyroid segmentation, feature extraction, and classification 67. Deploying such models in a real-world scenario could be challenging and is an interesting avenue for future research. Unequal misclassification costs matter even more when misclassification can lead to severe outcomes, as in medical diagnostics.
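The role of unequal misclassification costs can be made concrete with a standard decision-theoretic sketch. The symbols are assumptions introduced here and are not notation from the reviewed studies: p = P(y = 1 | x) is the predicted probability of disease, C_FN is the cost of missing a diseased patient, C_FP is the cost of a false alarm, and correct decisions are assumed to cost nothing. Comparing the expected cost R of the two possible decisions gives the threshold at which predicting disease becomes the cheaper choice.

```latex
\begin{align}
R(\hat{y} = 1 \mid x) &= (1 - p)\, C_{FP} \\
R(\hat{y} = 0 \mid x) &= p\, C_{FN} \\
\hat{y} = 1 \;&\Longleftrightarrow\; (1 - p)\, C_{FP} < p\, C_{FN}
\;\Longleftrightarrow\; p > \frac{C_{FP}}{C_{FP} + C_{FN}}
\end{align}
```

With equal costs the threshold is 0.5, the usual rule when accuracy is the objective. If a missed diagnosis is assumed to be nine times as costly as a false alarm, the threshold drops to 0.1, so more patients are flagged for follow-up.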
There are also challenges related to the accuracy and reliability of ML models. Models are only as good as the data they are trained on: if the data are of poor quality or biased, the predictions may not be accurate 68. It is therefore essential to evaluate model performance carefully using appropriate metrics and to consider the models' limitations when making predictions. Clinical diagnosis systems based on machine learning also raise security concerns 69. The findings of this study have significant implications for developing and evaluating ML models to diagnose and treat thyroid illness. The study emphasizes the need for further research on ML models that handle imbalanced datasets effectively, the importance of real-world datasets, the potential of DL algorithms, and the challenges of deploying ML-based CDSS models. By addressing these issues, researchers and doctors can use ML techniques to improve the accuracy and effectiveness of thyroid illness diagnosis and therapy. Table 4 collects the results of the studies cited in the literature to shed light on ML-based prediction of thyroid disease.

Conclusion
This study reviews the most recent ML-based and data-driven developments and strategies for diagnosing thyroid disease with imbalanced data. When developing ML-based systems for predicting thyroid disease in the real world, it is essential to include real patient data and to use interpretable ML methods that explain the final predictions accurately. A comprehensive review of 41 papers suggests that more research is needed to demonstrate reliable performance in healthcare settings. Although deep learning has come to dominate the area, SMOTE is still widely used by academics and practitioners as an over-sampling technique for handling imbalanced data. Many researchers have turned to RF-based models for predicting thyroid disease because they are easier to train and can handle many features. Another attraction is that they resist overfitting, which makes them useful in various ML applications. The limitations of ML discussed in the discussion section may guide future research. ML-based thyroid disease detection using imbalanced data and innovative techniques is expected to uncover numerous opportunities in the future.

Authors' Declaration
- Authors sign on ethical consideration's approval.
- Ethical Clearance: The project was approved by the local ethical committee in Universiti Teknologi Malaysia, Johor, Malaysia.

Figure 1. PRISMA approach applied in this research.

Figure 6.Trending topics extracted from the topic of thyroid disease prediction

Figure 7. Top ten institutions based on the number of publications

- Conflicts of Interest: None.
- We hereby confirm that all the Figures and Tables in the manuscript are ours. Furthermore, any Figures and images that are not ours have been included with the necessary permission for republication, which is attached to the manuscript.