# Enhancing Fuzzy C-Means Clustering with a Novel Standard Deviation Weighted Distance Measure

## Authors

• Ahmed Husham Mohammed, Department of Statistics, College of Administration and Economics, University of Basrah, Basrah, Iraq. https://orcid.org/0000-0003-4384-6455
• Marwan Abdul Hameed Ashour, Department of Statistics, College of Administration and Economics, University of Baghdad, Baghdad, Iraq. https://orcid.org/0000-0001-8329-2894

## Keywords

Cluster, Distance measures, FCM, Fuzzy logic, Hybrid algorithm

## Abstract

This paper proposes a modification of the Fuzzy C-Means (FCM) algorithm, one of the best-known algorithms for handling uncertainty in cluster formation through degrees of overlap. A major limitation of FCM is its reliance on the Euclidean distance, which forces the resulting clusters toward a spherical shape and makes them ill-suited to complex or overlapping structures. To address this, we derive a formula for the variance of a fuzzy cluster and use it as a weight in the Euclidean distance, giving a Weighted Euclidean Distance (WED). In addition, the partition matrix is computed with the K-Means algorithm, which yields a hybrid of the fuzzy and crisp algorithms (HFCM). The proposal was evaluated by simulation and then applied to real environmental data from the physical and chemical examination of water at testing stations in Basra Governorate. The experimental results show that WED improved the performance of the HFCM algorithm on the criteria (Obj_Fun, Iteration, Min_optimization, goodness of clustering fit, and overlap) for c = 2 and c = 3. Based on the simulation results, c = 2 was chosen to group the real data, where WED gave the best objective function values (23.93, 22.44, 18.83) at fuzziness degrees m = (1.2, 2, 2.8). At m = 3.6 the objective function for the Euclidean Distance (ED) was the lowest, but the other reported criteria (Iter. = 2, Min_optimization = 0) still indicate that WED is the better measure.
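The idea summarized above can be illustrated with a minimal sketch. The abstract does not give the derived variance formula, so the code below makes its own assumptions: the fuzzy variance of each feature in each cluster is taken as the membership-weighted mean squared deviation from the cluster center, and each squared difference is divided by that variance (equivalently, each difference is divided by the fuzzy standard deviation). The K-Means seeding of the partition matrix, the softening values 0.9 and 0.1, and all function and parameter names (`hybrid_fcm`, `kmeans_labels`, `sq_dist`, `eps`) are hypothetical choices for illustration, not the authors' implementation.

```python
import random


def sq_dist(x, v, w):
    """Squared Euclidean distance between x and v with per-feature weights w."""
    return sum(wk * (a - b) ** 2 for a, b, wk in zip(x, v, w))


def kmeans_labels(X, c, iters=20, seed=0):
    """Plain (crisp) k-means, used here only to seed the partition matrix."""
    rng = random.Random(seed)
    centers = rng.sample(X, c)
    ones = [1.0] * len(X[0])
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(c), key=lambda j: sq_dist(x, centers[j], ones)) for x in X]
        for j in range(c):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def hybrid_fcm(X, c=2, m=2.0, weighted=True, max_iter=100, tol=1e-6, eps=1e-9):
    """FCM seeded by k-means; weighted=True uses the assumed variance weighting."""
    n, d = len(X), len(X[0])
    # Assumption: crisp k-means labels are softened into an initial fuzzy
    # partition matrix U (n rows, c columns, each row sums to 1). Needs c >= 2.
    labels = kmeans_labels(X, c)
    U = [[0.9 if labels[i] == j else 0.1 / (c - 1) for j in range(c)] for i in range(n)]
    prev_obj = float("inf")
    for it in range(1, max_iter + 1):
        Um = [[u ** m for u in row] for row in U]
        centers, weights = [], []
        for j in range(c):
            s = sum(Um[i][j] for i in range(n))
            v = [sum(Um[i][j] * X[i][k] for i in range(n)) / s for k in range(d)]
            # Assumed fuzzy variance of feature k in cluster j.
            var = [sum(Um[i][j] * (X[i][k] - v[k]) ** 2 for i in range(n)) / s
                   for k in range(d)]
            centers.append(v)
            weights.append([1.0 / (vk + eps) for vk in var] if weighted else [1.0] * d)
        # eps keeps distances strictly positive for the negative power below.
        D = [[sq_dist(X[i], centers[j], weights[j]) + eps for j in range(c)]
             for i in range(n)]
        # Standard FCM membership update on squared distances.
        for i in range(n):
            inv = [D[i][j] ** (-1.0 / (m - 1.0)) for j in range(c)]
            total = sum(inv)
            U[i] = [val / total for val in inv]
        obj = sum(U[i][j] ** m * D[i][j] for i in range(n) for j in range(c))
        if abs(prev_obj - obj) < tol:
            break
        prev_obj = obj
    return U, centers, obj, it
```

Calling `hybrid_fcm(X, c=2, m=2.0)` on a list of numeric rows returns the membership matrix, the cluster centers, the final objective value, and the number of iterations used; passing `weighted=False` gives the plain Euclidean variant for comparison. Because the weights differ between the two variants, their objective values are on different scales in this sketch, and unnormalized per-cluster weights can favor a cluster with very large variance, so it should not be read as reproducing the reported results.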
