Exploring the Challenges of Diagnosing Thyroid Disease with Imbalanced Data and Machine Learning: A Systematic Literature Review

Dhekre Saber Saleh
https://orcid.org/0000-0002-4915-441X
Mohd Shahizan Othman
https://orcid.org/0000-0003-4261-1873

Abstract

Thyroid disease is a common condition affecting millions of people worldwide. Early diagnosis and treatment can help prevent more serious complications and improve long-term health outcomes. However, diagnosis can be challenging because symptoms vary and diagnostic tests are limited. By processing large amounts of data and identifying trends that may not be immediately evident to clinicians, Machine Learning (ML) algorithms may improve the accuracy with which thyroid disease is diagnosed. This study seeks to identify the most recent ML-based and data-driven developments and strategies for diagnosing thyroid disease, while considering the challenges that imbalanced data pose for thyroid disease prediction. A systematic literature review (SLR) is used to give a comprehensive overview of the existing literature on ML-based prediction of thyroid disease diagnosis. The meta-analysis processed 168 articles published between 2013 and 2022 and gathered from high-quality journals; of these, 41 articles on thyroid disease diagnosis (TDD) were included for observational analysis. The TDD category, techniques, applications, and solutions were among the elements considered when reviewing these 41 articles. According to our SLR, the practical application and efficacy of current techniques are constrained by several outstanding issues associated with data imbalance. In TDD, ML techniques strengthen data-driven decision-making. The limitations of ML raised in the discussion sections may guide the direction of future research. This study anticipates that ML-based thyroid disease detection with imbalanced data, together with other novel approaches, may reveal numerous unrealised possibilities in the future.

Article Details

How to Cite
1. Exploring the Challenges of Diagnosing Thyroid Disease with Imbalanced Data and Machine Learning: A Systematic Literature Review. Baghdad Sci.J [Internet]. 2024 Mar. 4 [cited 2025 Jan. 19];21(3):1119. Available from: https://bsj.uobaghdad.edu.iq/index.php/BSJ/article/view/8544
Section
article
