Exploring Important Factors in Predicting Heart Disease Based on Ensemble-Extra Feature Selection Approach
Abstract
Heart disease is a serious health condition and the leading cause of death in many countries. Clinical datasets are available to help physicians diagnose cardiovascular diseases. However, with the rise of big data and large medical datasets, it has become increasingly difficult for medical practitioners to predict heart disease accurately, because an abundance of irrelevant and redundant features increases computational complexity and reduces accuracy. This study therefore aims to identify the most discriminative features in high-dimensional datasets, reducing complexity and improving accuracy through an Extra Tree-based feature selection technique. The study assesses the efficacy of several classification algorithms on four reputable datasets, using both the full feature set and the reduced feature subset selected by the proposed method. The results show that the feature selection technique achieves high classification accuracy, precision, and recall, reaching 97% accuracy when used with the Extra Tree classifier. The research indicates that the feature selection method can improve classifier accuracy by focusing on the most informative features while also decreasing the computational burden.
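As a rough illustration of the workflow the abstract describes, the following minimal sketch ranks features with an Extra Trees ensemble, keeps the higher-ranked ones, and compares a classifier trained on the full feature set against one trained on the reduced subset. It is not the authors' implementation: the synthetic data from `make_classification`, the mean-importance threshold, the number of trees, and the helper name `evaluate` are all assumptions made for this example, and the four datasets used in the study are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 20 features, of which 5 are
# informative and 5 are redundant combinations of the informative ones.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5,
    n_redundant=5, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Rank features by impurity-based importance from an Extra Trees ensemble.
ranker = ExtraTreesClassifier(n_estimators=200, random_state=0)
ranker.fit(X_train, y_train)

# Keep features whose importance is at least the mean importance.
# The threshold is an assumption; the abstract does not state one.
selector = SelectFromModel(ranker, threshold="mean", prefit=True)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)
n_selected = X_train_sel.shape[1]


def evaluate(train_features, test_features):
    """Train an Extra Trees classifier and return its test accuracy."""
    model = ExtraTreesClassifier(n_estimators=200, random_state=0)
    model.fit(train_features, y_train)
    return accuracy_score(y_test, model.predict(test_features))


acc_full = evaluate(X_train, X_test)
acc_reduced = evaluate(X_train_sel, X_test_sel)

print(f"features kept: {n_selected} of {X.shape[1]}")
print(f"accuracy, full feature set:    {acc_full:.3f}")
print(f"accuracy, reduced feature set: {acc_reduced:.3f}")
```

The selector is fitted on the training split only, so the held-out samples do not influence which features are kept. Accuracy figures from this synthetic example say nothing about the results reported in the abstract.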
Received 27/09/2023
Revised 10/02/2024
Accepted 12/02/2024
Published 25/02/2024
This work is licensed under a Creative Commons Attribution 4.0 International License.