Assessment of surface water quality using statistical analysis methods: Orontes River (Case study)

Abstract: The study investigates the water quality of the Orontes River, one of the most important water resources in Syria, which is used for drinking, irrigation, swimming and industrial needs. A database of 660 measurements of the concentrations of 13 parameters was used. The measurements were taken at 11 monitoring points distributed along the river over the five years 2015-2019. To study the correlations between the parameters and their impact on water quality, statistical analysis was applied using the SPSS program. Cluster analysis was applied to classify the polluted areas along the river and gave two groups (low pollution and high pollution), with the areas classified according to the pollution sources they are exposed to. This shows the value of cluster analysis for studying the movement of pollutants and for reducing the number of sampling points. Factor analysis gave 5 main factors that explain 92.46% of the total variance, with a measurement quality (KMO) of 78.2%, and that involve 7 basic parameters: (EC, TUR, NO3, Na, pH, NH4, COD). This study showed the ability of factor analysis to identify the parameters that most affect water quality, which helps reduce the number of parameters needed for sampling.


Introduction:
Modern statistical methods are considered among the most important tools for monitoring water quality 1, since statistical analysis makes it possible to study large data sets that are difficult to analyze manually and to obtain accurate and logical practical results 2. Interest has recently grown in studies that adopt modern scientific techniques for managing water resources 3 and for identifying the sources of their pollution, in order to determine appropriate treatment methods for different uses 4. Securing fresh water resources is one of the most pressing issues today 5. Rivers are the most important of these resources, but they are exposed to large amounts of pollution, such as industrial, sanitation and agricultural wastes 6, and these pollutants affect the physical and chemical properties of the water and its suitability for various uses 7. Hence the need to monitor river water quality and to determine the indicators affecting the pollution of these resources 8. The importance of this research lies in determining the minimum number of parameters 9 that indicate the quality of the Orontes River water, by applying statistical methods to identify the sources of pollution 10 and find appropriate treatment methods for each use, rather than relying on traditional methods such as "polls and expert opinion" 11. In a study by Singh et al. 12 assessing the quality of the Gomti River in India, 24 parameters were selected to estimate temporal and spatial changes using factor analysis; 6 factors were identified as responsible for 70% of the variation in the river, and cluster analysis gave 3 groups. Muangthong 13 applied multivariate statistical techniques to assess the spatial and temporal differences of the Nampong River by monitoring 16 parameters. In a study by Ghosh 14, factor analysis was used to study the temporal changes of the Fuji River; it gave 4 basic factors responsible for explaining the changes, and cluster analysis gave 3 groups.
The study aims to determine the most important pollution indicators by statistical analysis (cluster analysis and factor analysis), in order to assess the water quality of the Orontes River and identify its most important pollution sources.
Study area: The Orontes River is one of the most important water resources in Syria. It has a total length of 370 km; it originates in Lebanon and enters Syria at "Omiri". It is the main source of drinking water for the cities of Homs and Hama, and it supplies the water needed for industrial activities and for irrigating large agricultural areas 15. Its most important projects are: Qattinah Lake (200 million m³ per year), Rastan Dam (230 million m³), Mharda Dam (67 million m³), and natural lakes 16.
Pollution sources on the Orontes River:
1) Sewage: untreated sewage water is discharged randomly into the river from all the villages and towns near it 17.
2) Industrial drainage: it comes from a group of industrial facilities near the river (Fig. 1):
1- Fertilizers Company: pollutants resulting from the manufacture of fertilizers.
2- The industrial city: it has more than 1000 industrial facilities.
4- Unit (623) for machinery: its discharges contain high levels of pollutants such as oils, grease, copper and cadmium.
5- Sugar factory: industrial wastewater that has not been properly treated.
6- Dairy Company: its discharges contain high levels of pollutants.

Multivariate statistical analysis techniques are used to analyze data, explain the behavior of phenomena 18, and describe the relationships between variables. Statistical methods are used to analyze large data sets in place of expert opinion 19, as they are considered more specific and accurate than traditional methods 20. The most important statistical methods used to determine the parameters are cluster analysis (CA) and factor analysis (FA). The statistical study of the data set was performed using SPSS v25, which is used to analyze complex data 21. The stages of the statistical process are: data collection, classification, analysis, explanation, and information formulation 22.

Cluster Analysis (CA):
The cluster analysis method is based on studying the behavior of the data and grouping it into clusters with similar characteristics: each cluster contains items (in this study, monitoring sites) that are similar to one another and different from those in the other clusters 23.
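The grouping described above can be sketched as agglomerative hierarchical clustering. This is a minimal, dependency-free Python sketch and not the SPSS procedure itself: the paper does not state which distance measure or linkage rule was used, so Euclidean distance and average linkage are assumptions here, and the function names (`agglomerate`, `average_linkage`, `euclidean`) are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters (assumed linkage rule)."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid after the deletion
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical standardized site means (two parameters per site), for illustration only.
    sites = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [3.0, 2.8], [2.9, 3.1]]
    print(agglomerate(sites, 2))  # [[0, 1, 2], [3, 4]]
```

Each site starts as its own cluster and the two closest clusters are merged repeatedly; stopping at two clusters mirrors the low/high pollution split reported in the results. For real data, SciPy's `scipy.cluster.hierarchy.linkage` followed by `fcluster` would be the usual choice.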

Factor Analysis (FA):
It is used to clarify the correlations between variables, simplifying their interrelationships and reducing their number 24. It gives a set of basic factors that explain the studied phenomenon, ranks the most influential factors, and produces a group of factors responsible for generating the variance between different groups 25.
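Factor extraction by the principal component method, which the study reports using, can be sketched as an eigen-decomposition of the parameters' correlation matrix. This sketch assumes the Kaiser criterion (retain factors with eigenvalue > 1) as the retention rule, which is consistent with the scree-plot discussion but not stated explicitly, and it applies no factor rotation since none is mentioned. A plain Jacobi eigenvalue solver keeps it dependency-free; all function names are hypothetical.

```python
import math


def correlation_matrix(data):
    """Pearson correlation matrix of the columns of data (rows = samples)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in data] for j in range(p)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    return [[sum(x * y for x, y in zip(cols[i], cols[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def matmul(x, y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def jacobi_eigen(sym, tol=1e-12, max_sweeps=100):
    """Eigenvalues and eigenvectors (as columns) of a symmetric matrix via Jacobi rotations."""
    p = len(sym)
    a = [list(map(float, row)) for row in sym]
    vecs = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Angle that zeroes a[i][j]: tan(2*theta) = 2*a_ij / (a_ii - a_jj).
                theta = 0.5 * math.atan2(2 * a[i][j], a[i][i] - a[j][j])
                c, s = math.cos(theta), math.sin(theta)
                rot = [[float(r == q) for q in range(p)] for r in range(p)]
                rot[i][i] = rot[j][j] = c
                rot[i][j], rot[j][i] = -s, s
                rot_t = [list(col) for col in zip(*rot)]
                a = matmul(matmul(rot_t, a), rot)
                vecs = matmul(vecs, rot)
    return [a[k][k] for k in range(p)], vecs


def principal_components(data):
    """(eigenvalue, % of total variance, loadings) for each component, largest first."""
    r = correlation_matrix(data)
    values, vecs = jacobi_eigen(r)
    p = len(r)
    result = []
    for k in sorted(range(p), key=values.__getitem__, reverse=True):
        lam = values[k]
        # Loadings: eigenvector scaled by sqrt(eigenvalue); % variance: eigenvalue / p.
        loadings = [vecs[i][k] * math.sqrt(max(lam, 0.0)) for i in range(p)]
        result.append((lam, 100.0 * lam / p, loadings))
    return result


def kaiser_retained(components):
    """Keep components whose eigenvalue exceeds 1 (assumed Kaiser criterion)."""
    return [comp for comp in components if comp[0] > 1.0]
```

In practice `numpy.linalg.eigh` would replace the hand-written solver. The percentage attached to each component is its eigenvalue divided by the number of parameters, which is how per-factor variance ratios relate to the eigenvalues plotted in a scree chart.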

Methods and Materials:
Data collection: Samples were taken monthly from 11 monitoring points, from the beginning of the river at "Omiri" to the end of the monitored course at "Gajar Amir", from 2015 to 2019. The samples were collected in thoroughly washed polythene containers to avoid any possible contamination, and all other precautionary measures were taken. A set of 13 chemical and physical parameters was chosen for the statistical analysis of the changes: hydrogen ion concentration (pH), temperature (T), dissolved oxygen (DO), electrical conductivity (EC), chemical oxygen demand (COD), biochemical oxygen demand (BOD), chloride (Cl), phosphate (PO4), nitrate (NO3), ammonium (NH4), turbidity (TUR), sodium (Na) and total hardness (TH). The samples were analyzed at the environmental and water resources directorates in Homs.
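As a consistency check on the data volume, assuming exactly one sample per site per month over the five years 2015-2019, the record counts work out as follows:

```latex
\begin{aligned}
\text{all sites:} \quad & 11 \times 12 \times 5 = 660 \\
\text{cluster 1 (7 sites):} \quad & 7 \times 12 \times 5 = 420 \\
\text{cluster 2 (4 sites):} \quad & 4 \times 12 \times 5 = 240
\end{aligned}
```

The total of 660 matches the database size given in the abstract, and the 420 records for the seven upstream sites match the first cluster's data set; the 240 records for the remaining four sites follow under the same assumption.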
The GPS coordinates of the observatory points (S1-S11) along the river were identified 26, as shown in Fig. 2 and Table 1.

Open Access
Baghdad Science Journal

Results and Discussion:
Clustering of the Sites by Cluster Analysis:
In order to study the movement of pollutants along the course of the Orontes River and their distance from the pollution sources, cluster analysis was applied to the data set using the hierarchical method, based on the arithmetic mean of each parameter over the period 2015-2019 (Table 2). The hierarchical analysis gave two clusters with a good measurement quality (< 0.5) (Fig. 3).
Cluster 1: This cluster includes the group of observatories located at the beginning of the river: (Omiri, Rabblah, Qanater, Al-nabi Mando, Qattinah entrance, Qattinah exit, Tarek Alsham). This group can be classified as a low pollution cluster because of the few pollutants it is exposed to; its water is considered almost pure and needs only simple conventional treatment for various uses.
Cluster 2: This cluster includes the second group of observatories: (Al-gonto, Gajar Amir, Tarek Trablous, Al-dweer). It is considered highly polluted because it receives large quantities of wastewater from residential communities and discharges from industrial facilities such as the fertilizer factory, the Homs refinery and the industrial city. This group therefore needs advanced treatment for any use and cannot be used for drinking.
These results indicate the importance of cluster analysis in classifying pollution sources and determining the movement of pollutants along the river. It helps group monitoring stations with similar characteristics, which reduces the number of sites required for sampling and the cost of sampling and analysis, in addition to saving the effort and time required to determine river water quality.
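The site-level input to the clustering (Table 2) is one arithmetic mean per parameter per site. A minimal sketch of that aggregation follows, assuming a hypothetical flat record layout of (site, parameter, value). It also z-scores each parameter across sites, a step commonly applied before clustering so that parameters with large numeric ranges such as EC do not dominate the distance; the paper does not say whether standardization was used, so treat that step as an assumption. The function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def site_means(records):
    """Average (site, parameter, value) records into {site: {parameter: mean}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for site, parameter, value in records:
        grouped[site][parameter].append(value)
    return {site: {param: mean(vals) for param, vals in params.items()}
            for site, params in grouped.items()}


def standardize(means, parameters):
    """Z-score each parameter across sites; returns {site: [z for each parameter]}."""
    sites = sorted(means)
    z = {site: [] for site in sites}
    for param in parameters:
        column = [means[site][param] for site in sites]
        mu, sigma = mean(column), pstdev(column)
        for site, value in zip(sites, column):
            # A parameter that is constant across sites carries no information; map it to 0.
            z[site].append((value - mu) / sigma if sigma else 0.0)
    return z


if __name__ == "__main__":
    # Made-up monthly EC readings for two sites, for illustration only.
    records = [("Omiri", "EC", 400), ("Omiri", "EC", 420),
               ("Gajar Amir", "EC", 900), ("Gajar Amir", "EC", 1100)]
    means = site_means(records)
    print(means)                       # {'Omiri': {'EC': 410}, 'Gajar Amir': {'EC': 1000}}
    print(standardize(means, ["EC"]))  # {'Gajar Amir': [1.0], 'Omiri': [-1.0]}
```

The standardized vectors can then be passed to a hierarchical clustering routine to obtain the site groups.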
The cluster analysis results were then used in the factor analysis, which was applied separately to the data of each cluster to determine the parameters affecting each section of the river (low pollution and high pollution), taking into account the differences in the parameters resulting from the pollutants of each region.

Application of factor analysis:
To compare the structural patterns of the two groups (clusters) and determine the parameters affecting each cluster, factor analysis was applied using the principal component method, starting with the first cluster (the first 7 observatories, which contain 420 measurements). For the first cluster, the Kaiser-Meyer-Olkin (KMO) test in Table 3 gave a measurement quality of 61.4% > 50%, which is considered good, with a significance value of 0.000 < 0.05, which means that the analysis is acceptable.

Structural patterns study according to cluster groups:
a. Cluster 1: the first group includes 7 observatory points: (Omiri, Rabblah, Qanater, Al-nabi Mando, Qattinah entrance, Qattinah exit, Tarek Alsham). For this group, the analysis gave 3 basic factors that explain 76.174% of the total variance in water quality. Each factor contains a set of parameters that affect it, with its own variance ratio; the first factor accounts for the largest proportion of the variance, 42.568% (Table 4).
The scree plot: this chart shows the initial eigenvalue of each factor; factors whose values lie above the line are of high importance in determining the pollution parameters. The chart shows 3 factors above the line, which indicates their importance in influencing water quality (Fig. 5).
For the second cluster, the KMO test (Table 6) gave a measurement quality of 78.2% > 50%, which is considered a very good measurement degree, with a significance value of 0.000 < 0.05, which means that the analysis is acceptable.
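The KMO values in Tables 3 and 6 compare the ordinary correlations between parameters with their partial correlations, which are obtained from the inverse of the correlation matrix. A dependency-free sketch of the overall KMO statistic follows; the function names are hypothetical, and the significance value that SPSS prints alongside KMO comes from Bartlett's test of sphericity, which is not reproduced here.

```python
import math


def invert(matrix):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(r):
    """Overall KMO = sum r_ij^2 / (sum r_ij^2 + sum q_ij^2) over i != j.

    q_ij is the partial correlation -s_ij / sqrt(s_ii * s_jj), with S = inverse(R).
    """
    s = invert(r)
    p = len(r)
    r2 = q2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -s[i][j] / math.sqrt(s[i][i] * s[j][j])
                r2 += r[i][j] ** 2
                q2 += partial ** 2
    return r2 / (r2 + q2)


if __name__ == "__main__":
    # Three parameters with equal pairwise correlation 0.5 (illustrative only).
    r = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
    print(round(kmo(r), 3))  # 0.692
```

Values closer to 1 mean the partial correlations are small relative to the ordinary ones, so the parameters share common factors; this is why 78.2% is read as a better result than 61.4%.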
The second group, which was considered polluted, gave 5 basic factors with a high cumulative variance ratio of 92.462%, involving 7 parameters; the first factor has the highest interpretation rate, 39.955% of the total variance (Table 7). Table 8 shows the saturation (loading) of each parameter on each factor; the interpretation ratio of 92.462%, obtained with 7 of the 13 parameters (EC, TUR, NO3, Na, pH, NH4, COD), is considered a very good variance ratio.
The scree plot: this chart shows the initial eigenvalue of each factor; factors whose values lie above the line are of high importance in determining the pollution factors. The chart shows 5 factors above the line, which indicates their importance in influencing water quality (Fig. 7).

I. (For the first cluster): this showed 3 basic factors, which include the main parameters that affect water quality (Table 9. Interpretation of the factor variance ratio for cluster 1), as follows:
The first factor constitutes 42.568% of the total variance and has a strong relationship with 5 parameters (COD, BOD5, DO, PO4, NO3). This reflects a clear effect of seasonal changes on chemical and biochemical oxygen demand, and indicates that the pollution results from human activities (sanitation, agricultural drainage).
The second factor constitutes 21.814% of the total variance and has a strong relationship with 1 parameter (EC), which indicates pollution caused by human activities in the area (sewage).
The third factor constitutes 11.792% of the total variance and has a strong relationship with 3 parameters (TH, NH4, TUR), which indicates bacterial contamination and a high calcium content in the water.
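The "strong relationship" statements above come from reading the factor loadings (the "saturation" values reported by SPSS). The paper does not give its cut-off; this sketch assumes |loading| > 0.75, a threshold often used to label loadings as strong in water-quality studies, and assigns each parameter to the factor on which it loads most heavily. The function name and the loadings in the example are hypothetical.

```python
def strong_parameters(loadings, threshold=0.75):
    """Map each factor (1-based) to the parameters it loads strongly on.

    loadings: {parameter: [loading on factor 1, loading on factor 2, ...]}
    Each parameter goes to the factor with its largest absolute loading and is
    kept only if that loading exceeds the (assumed) threshold.
    """
    n_factors = len(next(iter(loadings.values())))
    groups = {k + 1: [] for k in range(n_factors)}
    for param, row in loadings.items():
        best = max(range(n_factors), key=lambda k: abs(row[k]))
        if abs(row[best]) > threshold:
            groups[best + 1].append(param)
    return groups


if __name__ == "__main__":
    # Hypothetical two-factor loadings, for illustration only.
    example = {"EC": [0.95, 0.10], "Na": [0.90, 0.20], "COD": [0.10, 0.88], "pH": [0.40, 0.30]}
    print(strong_parameters(example))  # {1: ['EC', 'Na'], 2: ['COD']}
```

Using the absolute value means a strongly negative loading (for example DO falling as organic load rises) still counts as a strong relationship with that factor.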

II. (For the second cluster): this showed 5 basic factors, which include the main parameters that affect water quality (Table 10), as follows:
The first factor constitutes 39.955% of the variance and has a strong relationship with 2 parameters (EC, Na), which indicates pollution resulting from industrial wastewater and an increase in the proportion of salts.
The second factor constitutes 15.359% of the variance and has a strong relationship with 1 parameter (COD), which indicates increased pollution from industrial activities in the region.
The third factor constitutes 14.053% of the variance and has a strong relationship with 2 parameters (NH4, NO3), which indicates bacterial pollution (sewage), in addition to pollution by nitrogenous fertilizers from agricultural work in the area and from the fertilizer factory.
The fourth factor constitutes 13.975% of the variance and has a strong relationship with 1 parameter (TUR), which indicates high levels of sediment entering the water, due to rainstorms causing soil erosion or to human activities such as building and construction and industries such as mining.
The fifth factor constitutes 9.121% of the variance and has a strong relationship with pH. Since temperature changes significantly affect pH measurement, this factor may be due to the direct discharge of industrial water into the river.
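The cumulative variance figures reported for the two clusters can be checked by summing the per-factor ratios listed above:

```latex
\begin{aligned}
\text{Cluster 1:} \quad & 42.568 + 21.814 + 11.792 = 76.174\% \\
\text{Cluster 2:} \quad & 39.955 + 15.359 + 14.053 + 13.975 + 9.121 = 92.463\% \approx 92.462\%
\end{aligned}
```

The small difference for cluster 2 comes from rounding of the individual percentages. Assuming the factors were extracted from the correlation matrix of all 13 standardized parameters, each percentage also corresponds to an eigenvalue of 13 × (percentage / 100); for the fifth factor of cluster 2 this gives 13 × 0.09121 ≈ 1.19, which is still above 1.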

Conclusion:
By comparing the results of the factor analysis for the two cluster groups, the best results were obtained for the second group, which gives 7 parameters with a very good measurement quality (KMO 78.2%) that explain 92.462% of the total variance in water quality. In order of importance they are: (EC, TUR, NO3, Na, pH, NH4, COD). The areas of the second group are affected by industrial and agricultural pollutants, while the areas of the first group are only weakly affected by pollution, limited to small amounts of sanitation and agricultural activity. This shows the importance of statistical analysis for determining pollution sources from the parameters and for selecting the most influential parameters, which also helps reduce the number of parameters needed to determine water quality, and thus the cost of sampling and analysis. Cluster analysis, in turn, helps in studying the movement of pollutants and the changes in parameters along the river, and it groups stations with similar characteristics, which reduces sampling costs and saves the effort and time required to determine water quality. The results show that the Orontes River water is not polluted at the beginning of the river: the water quality is very good and the values of the variables are within the permissible limits of the Syrian Standard Specification for drinking water. Finally, determining the quality of surface water, especially rivers, has become one of the main axes of current research, owing to the urgent need for fresh water and the problems arising from its degradation.
Hence the need to establish quality control of water resources, to use the fastest means of determining their quality, and to identify the most important pollution indicators, so that the necessary procedures can be developed for appropriate treatment for various uses (drinking, agriculture, industry).