
Exploring the Challenges of Diagnosing Thyroid Disease with Imbalanced Data and Machine Learning: A Systematic Literature Review




Classification, Deep learning, Imbalanced data, Machine learning, Thyroid disease


Thyroid disease is a common condition affecting millions of people worldwide. Early diagnosis and treatment can help prevent more serious complications and improve long-term health outcomes. However, diagnosis can be challenging because symptoms vary and diagnostic tests are limited. By processing large amounts of data and detecting trends that may not be immediately evident to clinicians, machine learning (ML) algorithms may be able to increase the accuracy of thyroid disease diagnosis. This study seeks to identify the most recent ML-based and data-driven developments and strategies for diagnosing thyroid disease, with particular attention to the challenges that imbalanced data pose for thyroid disease prediction. A systematic literature review (SLR) is used to give a comprehensive overview of the existing literature on ML-based thyroid disease diagnosis (TDD). The meta-analysis processed 168 articles published between 2013 and 2022 and gathered from high-quality journals; of these, 41 articles on TDD were included for observational analysis. The elements considered when reviewing these 41 articles included the TDD category, techniques, applications, and solutions. According to our SLR, the practical application and efficacy of current techniques are constrained by several outstanding issues associated with class imbalance. In TDD, ML supports data-driven decision-making. The limitations of ML discussed in this review may guide the direction of future research. This study anticipates that ML-based thyroid disease detection with imbalanced data, together with other novel approaches, may reveal numerous unrealised possibilities in the future.



