Exploring Important Factors in Predicting Heart Disease Based on Ensemble-Extra Feature Selection Approach

Howida Abubaker (ORCID: https://orcid.org/0000-0001-6737-323X)
Farkhana Muchtar (ORCID: https://orcid.org/0000-0002-5636-5741)
Alif Ridzuan Khairuddin (ORCID: https://orcid.org/0000-0001-5560-4539)
Ahmad Najmi Amerhaider Nuar (ORCID: https://orcid.org/0000-0003-3103-664X)
Zuriahati Mohd Yunos
Carolyn Salimun (ORCID: https://orcid.org/0000-0001-6419-6039)

Abstract

Heart disease is a major health condition and the leading cause of death in many countries. Clinical datasets are available to help physicians diagnose cardiovascular disease. However, with the rise of big data and ever larger medical datasets, accurately predicting heart disease has become more difficult, because unrelated and redundant features increase computational complexity and reduce accuracy. This study therefore aims to identify the most discriminative features in high-dimensional datasets, minimizing complexity while improving accuracy, through a feature selection technique based on Extra Trees. The study evaluates several classification algorithms on four reputable datasets, using both the full feature set and the reduced feature subset selected by the proposed method. The results show that the feature selection technique achieves high classification accuracy, precision, and recall, reaching 97% accuracy with the Extra Tree classifier. These findings indicate that the feature selection method can improve classifier accuracy by focusing on the most informative features while also reducing computational burden.
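The abstract describes ranking features with an Extra Trees ensemble and then comparing classifiers trained on the full feature set against the selected subset. The sketch below illustrates that general idea with scikit-learn; it is not the authors' code. The synthetic data from `make_classification`, the "keep features at or above mean importance" rule, `n_estimators=100`, and 5-fold cross-validation are all illustrative assumptions, not settings reported by the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline


def rank_features(X, y, n_estimators=100, random_state=0):
    """Fit an Extra Trees ensemble and rank features by impurity-based importance."""
    forest = ExtraTreesClassifier(n_estimators=n_estimators, random_state=random_state)
    forest.fit(X, y)
    importances = forest.feature_importances_  # mean impurity decrease, normalised to sum to 1
    order = np.argsort(importances)[::-1]      # feature indices, most important first
    keep = importances >= importances.mean()   # assumed rule: keep features at or above the mean
    return order, importances, keep


def full_vs_reduced_accuracy(X, y, cv=5, random_state=0):
    """Cross-validated accuracy using all features vs. an Extra Trees-selected subset."""
    full_model = ExtraTreesClassifier(n_estimators=100, random_state=random_state)
    # Selection lives inside the pipeline, so it is re-fitted on each training fold only.
    reduced_model = make_pipeline(
        SelectFromModel(
            ExtraTreesClassifier(n_estimators=100, random_state=random_state),
            threshold="mean",
        ),
        ExtraTreesClassifier(n_estimators=100, random_state=random_state),
    )
    full_acc = cross_val_score(full_model, X, y, cv=cv, scoring="accuracy").mean()
    reduced_acc = cross_val_score(reduced_model, X, y, cv=cv, scoring="accuracy").mean()
    return full_acc, reduced_acc


# Synthetic stand-in data (hypothetical, NOT one of the study's datasets):
# 20 features, of which 5 are informative and 3 are linear combinations of those.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, n_redundant=3, random_state=0
)
order, importances, keep = rank_features(X, y)
full_acc, reduced_acc = full_vs_reduced_accuracy(X, y)

print("top-5 features:", order[:5].tolist())
print(f"kept {int(keep.sum())} of {X.shape[1]} features")
print(f"accuracy: full={full_acc:.3f}, reduced={reduced_acc:.3f}")
```

Placing `SelectFromModel` inside the pipeline means the held-out fold never influences which features are kept; selecting features on the whole dataset first and then cross-validating would tend to overstate the reduced model's accuracy.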

How to Cite
Exploring Important Factors in Predicting Heart Disease Based on Ensemble-Extra Feature Selection Approach. Baghdad Sci.J [Internet]. 2024 Feb. 25 [cited 2025 Jan. 20];21(2(SI)):0812. Available from: https://bsj.uobaghdad.edu.iq/index.php/BSJ/article/view/9711