Turbidity of Water Using Fuzzy C-Means and Hard K-Means

In this research, two algorithms are applied: the Fuzzy C Means (FCM) algorithm and the Hard K Means (HKM) algorithm. Both are applied to a set of data, collected from the Ministry of Planning, on the water turbidity of five areas in Baghdad. The aim is to find out which of the two algorithms performs better, which of these areas have the least turbid clear water, and in which months of the year the clear water in the specified area is least turbid.


Introduction:
Cluster analysis is a branch of multivariate statistical analysis and of unsupervised pattern recognition. Clustering is the process of classifying observations into groups by partitioning the dataset so that the elements within a group (cluster) are highly similar to one another and differ from the elements of other groups (clusters). [1] Cluster analysis is divided into two types: (1) fuzzy clustering and (2) hard clustering. In fuzzy clustering techniques, an element (observation) of the dataset may belong to two or more clusters with different degrees of membership; that is, each membership degree takes a value in [0,1]. In hard clustering techniques, each element (observation) of the dataset belongs to exactly one cluster and cannot be included in another; that is, each membership degree takes a value in {0,1}. [1,2,3] In this study we take the FCM algorithm as a representative of fuzzy clustering and the HKM algorithm as a representative of hard clustering. The study is organized as follows: Section two presents the FCM and HKM algorithms. Section three describes the experiment. Section four contains the results and discussion. Section five contains the conclusion.

Fuzzy C Means (FCM) and Hard K Means (HKM) Algorithms

Fuzzy C Means (FCM) Algorithm
The FCM algorithm was introduced by Dunn in 1973 and developed further by Bezdek in 1981. It belongs to the family of algorithms that build a fuzzy partition and is one of the most widely used fuzzy clustering methods. In this algorithm, an observation can belong to several clusters at the same time with different membership degrees. [1] FCM is an iterative clustering algorithm that produces an optimal partition into k clusters by minimizing the weighted within-group sum of squared errors objective function

$$J_m(U, C; X) = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{m} \, d^2(x_i, c_j),$$

where $X = \{x_1, x_2, \ldots, x_n\}$ is the dataset (observations) matrix of dimension $n \times p$, $k$ is the number of clusters with $2 \le k < n$, $u_{ij}$ is the degree of membership of $x_i$ in cluster $j$, $m$ is the weighting exponent on each fuzzy membership with $1 < m < \infty$, $c_j$ is the center of cluster $j$, and $d^2(x_i, c_j)$ is the distance measure between the element $x_i$ and the cluster center $c_j$. [2]

The steps of the FCM algorithm are as follows:
1- Generate the partition matrix $U = [u_{ij}]$ randomly, subject to three conditions: $u_{ij} \in [0,1]$ for all $i, j$; $\sum_{j=1}^{k} u_{ij} = 1$ for all $i$; and $0 < \sum_{i=1}^{n} u_{ij} < n$ for all $j$.
2- Set up the dataset $X = \{x_1, x_2, \ldots, x_n\}$, where $X$ is the dataset (observations) matrix of dimension $n \times p$.
3- Determine the number of clusters $k$, with $2 \le k < n$, and the fuzziness exponent $m$, with $1 < m < \infty$.
4- Compute the centroid of each cluster by the formula

$$c_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} \, x_i}{\sum_{i=1}^{n} u_{ij}^{m}}, \qquad j = 1, \ldots, k.$$

Written element by element, the formula is

$$c_{jl} = \frac{\sum_{i=1}^{n} u_{ij}^{m} \, x_{il}}{\sum_{i=1}^{n} u_{ij}^{m}}, \qquad l = 1, \ldots, p,$$

where the matrix of centers $C$ has dimension $k \times p$.
5- Calculate the distance between the data (observations) matrix $X$ and the cluster centers $C$ using the squared Euclidean norm, by the formula

$$d^2(x_i, c_j) = \lVert x_i - c_j \rVert^2 = \sum_{l=1}^{p} (x_{il} - c_{jl})^2.$$
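The FCM steps above can be illustrated with a minimal sketch in plain Python. It is not the implementation used in this study. The function name `fcm`, the parameters `max_iter`, `tol` and `seed`, and the sample readings are all hypothetical. The membership update and the stopping rule are not given in the steps listed above; the sketch assumes the standard Bezdek update $u_{ij} = 1 / \sum_{l=1}^{k} (d_{ij}/d_{il})^{2/(m-1)}$ and stops when the objective function changes by less than `tol`.

```python
import random

def fcm(points, k, m=2.0, max_iter=100, tol=1e-9, seed=0):
    """Sketch of Fuzzy C-Means for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    n, p = len(points), len(points[0])
    # Step 1: random partition matrix; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        u.append([v / total for v in row])
    centers, objective = [], float("inf")
    for _ in range(max_iter):
        # Step 4: each center is the mean of the data weighted by u_ij^m.
        centers = []
        for j in range(k):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[i] * points[i][l] for i in range(n)) / w_sum
                            for l in range(p)])
        # Step 5: squared Euclidean distance from every point to every center.
        d2 = [[sum((x[l] - c[l]) ** 2 for l in range(p)) for c in centers]
              for x in points]
        # Assumed step (not listed in the text): standard membership update.
        for i in range(n):
            hits = [j for j in range(k) if d2[i][j] < 1e-12]
            if hits:
                # The point sits on a center: share full membership there.
                u[i] = [1.0 / len(hits) if j in hits else 0.0 for j in range(k)]
            else:
                inv = [d2[i][j] ** (-1.0 / (m - 1.0)) for j in range(k)]
                inv_sum = sum(inv)
                u[i] = [v / inv_sum for v in inv]
        # Objective J_m for the updated memberships and the current centers.
        new_objective = sum(u[i][j] ** m * d2[i][j]
                            for i in range(n) for j in range(k))
        converged = abs(objective - new_objective) < tol
        objective = new_objective
        if converged:  # assumed stopping rule
            break
    return u, centers, objective

# Hypothetical readings (not data from the study): three low, three high.
readings = [(1.0,), (1.2,), (0.9,), (8.0,), (8.3,), (7.9,)]
memberships, centers, objective = fcm(readings, k=2)
```

Each row of `memberships` sums to 1, so a row can be read as the degrees to which one observation belongs to each cluster. A smaller returned `objective` indicates tighter clusters for the same data and the same `k`.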

Hard K Means Algorithm
The HKM clustering algorithm (also called the Lloyd-Forgy algorithm) was developed by James MacQueen in 1967. It is known to be fast, which is why it is applied to large datasets, but it is sensitive to the choice of starting points and can be inefficient on clustering problems with very many observations. [5] HKM clustering groups the data points based on their nearness to each other according to the Euclidean distance. The aim of the algorithm is to partition a collection of data points into clusters so that similar data points fall in the same cluster and the difference from the other clusters is maximized. Computationally, the algorithm resembles analysis of variance in reverse. It begins with k clusters, places the initial cluster centers randomly, and then assigns each observation to the nearest cluster center. HKM is an iterative algorithm. It depends on minimizing the sum of distances from each data point (observation) to its cluster center. Data points are moved between clusters until the sum can no longer be decreased. [6] The objective function minimized by HKM is

$$J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2.$$

The steps of the HKM algorithm are as follows:
1) The centroids of the k clusters are chosen randomly from $X$, where $X$ represents the dataset matrix.
2) Calculate the distances between each cluster center and each data point.
3) Assign each data point (observation) to the cluster with the closest centroid.
4) Update the matrix of cluster centers by the formula

$$c_j = \frac{1}{n_j} \sum_{x_i \in C_j} x_i,$$

where $C_j$ represents the elements in the jth cluster and $n_j$ is the number of elements in $C_j$.
5) Recalculate the distances to the updated cluster centers.
6) The algorithm stops when no data point is assigned to a new cluster; otherwise steps (3) to (5) are repeated for any possible movements of data points between the clusters. [2]

Experiment
In this research, the turbidity of clear water is studied using two sets of readings for five areas in Baghdad city. The readings cover one year. The study was carried out in the Iraqi Ministry of Planning / Central Organization for Standardization and Quality Control / Nutrition Laboratory. The idea of the study is to determine which areas have the least turbid clear water by using the FCM algorithm, through the objective function values and the error term. After that, the HKM algorithm is employed to determine in which months of the year the clear water was least turbid.
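As an illustration of the HKM procedure used in this experiment, the following minimal sketch in plain Python follows the six HKM steps described earlier. It is not the implementation used in this study. The function name `hkm`, the parameters `max_iter` and `seed`, and the sample readings are hypothetical. The sketch assumes that the distance is the squared Euclidean distance and that the algorithm stops when no data point changes cluster.

```python
import random

def hkm(points, k, max_iter=100, seed=0):
    """Sketch of Hard K-Means for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    p = len(points[0])
    # Step 1: choose k distinct observations at random as initial centroids.
    centers = [list(c) for c in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Steps 2-3 (and 5): compute the distances to the current centers
        # and assign each point to the cluster with the closest centroid.
        new_labels = []
        for x in points:
            d2 = [sum((x[l] - c[l]) ** 2 for l in range(p)) for c in centers]
            new_labels.append(d2.index(min(d2)))
        # Step 6: stop when no data point moved to a different cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: each center becomes the mean of the points assigned to it.
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(x[l] for x in members) / len(members)
                              for l in range(p)]
    # Objective: sum of squared distances of the points to their own center.
    sse = sum(sum((x[l] - centers[lab][l]) ** 2 for l in range(p))
              for x, lab in zip(points, labels))
    return labels, centers, sse

# Hypothetical readings (not data from the study): three low, three high.
readings = [(1.0,), (1.2,), (0.9,), (8.0,), (8.3,), (7.9,)]
labels, centers, sse = hkm(readings, k=2)
```

Unlike FCM, every observation receives exactly one label. Because the result depends on the random starting centroids, running the sketch with several values of `seed` and keeping the run with the smallest `sse` is a common precaution.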

Results and discussion:
1- The numerical results for all the cities studied in this research are computed using the FCM algorithm:

Table (1). The objective function and error term of the FCM algorithm for three clusters
As shown in Table (1), Al Amerya city has the smallest objective function value compared with the other cities, but it has a large error term compared with theirs. Therefore New Baghdad city is chosen as the best city in terms of low water turbidity, because it has the minimum error term and a good objective function value.
2- Next, the HKM algorithm is employed to choose the months with the least turbidity for all cities. First, diagrams are drawn to show the extent of water turbidity in all months of the year, which is given in the first column. The second column shows the minimum turbidity in any month of the year, the third column shows the maximum turbidity in any month of the year, the fourth column shows the arithmetic mean of the turbidity, and the fifth column shows its standard deviation.
3- Next, we determine the number of months (elements) in each cluster in order to identify the months in which the water is less turbid than in the others. We suppose that cluster (a) contains the turbid months, cluster (b) contains the less turbid months, and cluster (c) contains the months that are least turbid compared with the others.
As shown in Table (3) and Figure (7), for New Baghdad city the water in cluster (a) was more turbid, especially in months (6,7,8,9), and the water in cluster (b) was less turbid, especially in months (1,2,3,4,5). This agrees with reality: when the Ministry of Water Resources is asked which months have the most turbid water in Iraq, it names the summer season, and when it is asked which months have the least turbid water, it names the winter season.