Data Mining Techniques for Iraqi Biochemical Dataset Analysis


Sarah Sameer
Suhad Faisal Behadili


This research analyzes and simulates real biochemical test data to uncover the relationships among the tests and how each test affects the others. The data were acquired from a private Iraqi biochemical laboratory. They have many dimensions, a high rate of null values, and a large number of patients. Several experiments were applied to these data, beginning with unsupervised techniques such as hierarchical clustering and k-means, but the results were not clear. A preprocessing step was then performed to make the dataset analyzable by supervised techniques: Linear Discriminant Analysis (LDA), Classification and Regression Tree (CART), Logistic Regression (LR), K-Nearest Neighbor (K-NN), Naïve Bayes (NB), and Support Vector Machine (SVM). Among the six supervised algorithms, CART gave clear results with high accuracy. It is worth noting that preprocessing takes considerable effort for this type of data: the raw dataset has a null-value ratio of 94.8%, which drops to 0% after preprocessing. To apply the CART algorithm, several selected tests were treated as classes; the tests were selected according to the accuracy obtained with them. Consequently, physicians are able to trace and connect test results with each other, which extends the impact on patients’ health.
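The null-value ratio quoted in the abstract can be read as the share of empty cells in the patient-by-test table. The abstract does not say how the nulls were removed, so the following is only a minimal sketch under assumed rules: drop columns (tests) that are too sparse, then drop rows (patients) that still have a missing value. The names `null_ratio`, `drop_sparse`, the `min_filled` threshold, and the toy table are hypothetical and are not taken from the paper.

```python
from typing import Optional

# One row per patient, one column per test; None marks a missing result.
Row = list[Optional[float]]


def null_ratio(rows: list[Row]) -> float:
    """Fraction of cells that are missing (None) across the whole table."""
    total = sum(len(row) for row in rows)
    if total == 0:
        return 0.0
    missing = sum(1 for row in rows for value in row if value is None)
    return missing / total


def drop_sparse(rows: list[Row], min_filled: float) -> list[Row]:
    """Assumed preprocessing rule, not the authors' documented method:
    keep columns whose filled fraction is at least min_filled, then drop
    any row that still contains a missing value."""
    if not rows:
        return []
    n_cols = len(rows[0])
    keep = [
        c for c in range(n_cols)
        if sum(row[c] is not None for row in rows) / len(rows) >= min_filled
    ]
    reduced = [[row[c] for c in keep] for row in rows]
    return [row for row in reduced if all(v is not None for v in row)]


# Hypothetical toy table, invented for illustration only.
raw = [
    [5.1, None, None, 0.9],
    [4.8, None, 7.2, 1.1],
    [None, None, None, 1.0],
    [5.5, 3.3, None, 0.8],
]
clean = drop_sparse(raw, min_filled=0.75)

print(f"null ratio before: {null_ratio(raw):.1%}")
print(f"null ratio after:  {null_ratio(clean):.1%}")
```

Dropping sparse columns before incomplete rows is a design choice in this sketch: with a table as sparse as the one described, removing rows first would likely discard almost every patient.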




How to Cite
Sameer S, Behadili SF. Data Mining Techniques for Iraqi Biochemical Dataset Analysis. Baghdad Sci.J [Internet]. 2022 Apr 1 [cited 2022 Jun 26];19(2):0385. Available from:

