Modeling Jar Test Results Using Gene Expression to Determine the Optimal Alum Dose in Drinking Water Treatment Plants

Abstract: Coagulation is the most important process in drinking water treatment. Alum coagulant increases aluminum residuals, which many studies have linked to Alzheimer's disease, so it is important to apply it at the optimal dose. In this paper, four sets of experiments were carried out to determine the relationship between raw water characteristics (turbidity, pH, alkalinity, and temperature) and the optimum dose of alum [Al2(SO4)3·14H2O], in order to derive a mathematical equation that could replace jar test experiments. The experiments were performed under different conditions and in different seasons. The optimal dose in every set was determined and used to build a gene expression programming (GEP) model. The models were constructed from the jar test data (turbidity, pH, alkalinity, and temperature) to predict the coagulant dose. The best GEP model gave very good results, with a correlation coefficient of 0.91 and a root mean square error (RMSE) of 1.8. Multi linear regression (MLR) was used for comparison with the GEP results; it could not give good results because of the complex nonlinear relations of the process. Another round of experiments was carried out with high initial turbidity, like the values that reach the plant during floods and heavy rain, to provide an equation for these extreme values and to study the use of starch as a coagulant aid. The best GEP model for these data gave good results, with a correlation coefficient of 0.92 and an RMSE of 5.1.


Introduction:
The scarcity of drinking water resources has been a serious issue for many decades 1. Surface water is one of the main sources of drinking water, and it is usually unsafe to use without treatment 2.
There are different types of drinking water treatment plants, depending on the characteristics of the source water. The main objective of all of them is to produce water that is free of microorganisms and toxic compounds and is biologically and chemically safe for human consumption 3.
Because water is such a good solvent, pure water is not found in nature. It usually contains dissolved impurities such as minerals, organic compounds, and gases that alter its physical, chemical, and biological characteristics 4.
Biological, organic, and inorganic materials may be found in water. The majority of impurities are colloids. Colloids can be classified as hydrophilic, such as soap, soluble starch, and synthetic detergents, or hydrophobic, such as clay particles and metal oxides. A hypothetical molecular structure of humic acid is shown in Fig. 1. The behavior of colloids is controlled by principal phenomena such as electrostatic forces, van der Waals forces, and Brownian motion. Most of these colloids carry a negative charge, and their dispersions are stabilized by electrostatic repulsion, which overcomes the van der Waals forces and prevents particle aggregation 5. These colloids cannot be removed by normal filtration or precipitation processes. To remove them from water, conventional treatment is frequently used. It is a combination of the following steps: coagulation, flocculation, sedimentation, and filtration.
Coagulation and flocculation are the main processes of water purification. Coagulation is the process of increasing particle size by adding chemicals called coagulants, which destabilize the particles and help them grow larger and settle more readily 7.
The history of coagulation is nearly three thousand years old. Of the many different coagulants, alum is the most widely used. It is applied in water treatment plants to remove turbidity and reduce natural organic matter.
Earlier research suggested several mechanisms for particle destabilization by aluminum during coagulation. These mechanisms are double layer compression, charge neutralization, adsorption, and sweep flocculation. Because the coagulation process is complex, these mechanisms may act alone or in combination 8.
When alum is added to water containing alkalinity, it dissociates immediately, yielding an aluminum ion surrounded by six water molecules. The aluminum ion then reacts with the water, forming large complexes.
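The reaction equation itself did not survive in this text. As a hedged reminder only, the stoichiometry usually given in textbooks for alum in water with bicarbonate alkalinity is sketched below; assuming bicarbonate as the alkalinity species and writing the precipitate as Al(OH)3 are assumptions here, not statements taken from this paper.

```latex
\begin{align*}
\mathrm{Al_2(SO_4)_3 \cdot 14\,H_2O} &\longrightarrow \mathrm{2\,Al^{3+} + 3\,SO_4^{2-} + 14\,H_2O} \\
\mathrm{Al^{3+} + 6\,H_2O} &\longrightarrow \mathrm{[Al(H_2O)_6]^{3+}} \\
\mathrm{Al_2(SO_4)_3 \cdot 14\,H_2O + 6\,HCO_3^-} &\longrightarrow \mathrm{2\,Al(OH)_3 \downarrow + 3\,SO_4^{2-} + 6\,CO_2 + 14\,H_2O}
\end{align*}
```

The last line shows why alum consumes alkalinity and tends to lower the pH of the treated water.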
Regardless of the species formed, the complexes are massive precipitates that enmesh numerous colloids, removing them through entanglement 9. Turbidity levels vary during the year and may increase in the wet seasons because of heavy rain and soil erosion 10,11. Higher turbidity calls for higher alum doses, but excessive use of aluminum sulphate (alum) can leave high aluminum residual concentrations in the treated water. Such residuals should be controlled and minimized because they can cause problems in the distribution system. High aluminum concentrations in treated water are associated with many problems, such as the formation of aluminum precipitates that increase turbidity. In addition, aluminum has been suggested as one of the main factors in Alzheimer's disease 1.
Modeling water treatment plants is very important for the design and operation of the treatment units. Models are valuable tools. Some plants are operated fully automatically. Such plants need good predictive models that can provide online data about water quality, chemical dose, and filter operation time 5.
In recent years, modelling and optimization have become increasingly important in most fields. Optimization helps in gaining a better understanding of the system. Earlier, traditional modelling was used to describe biological processes by means of equations for the growth rate of microorganisms, the consumption of substrate, and product formation. Such models have many limitations because the reactions are nonlinear and time dependent 12.
Modelling complex processes such as those in water treatment plants is not easy because the processes involved are nonlinear 13. Recently, many efforts have been made to model real-world problems using the available artificial intelligence (AI) tools. One of the most promising tools is gene expression programming, which is still young in environmental science applications. It can provide a nonlinear equation that is accurate and easy to use for determining the optimal dose of alum. Previous studies 14,15,16,1,12 that used AI techniques in drinking water modelling are listed in Table 1, together with the modelling method and the predicted values (the model outputs).
In this study, gene expression programming was used to model the optimal dose of alum in drinking water treatment plants, by modelling the results of different sets of experiments that examined the effect of turbidity, pH, temperature, and alkalinity on the optimal dose. The raw water turbidity varied between 6.89 and 60.35 NTU during the study, as shown in Fig. 2.

Figure 2. Variation of the turbidity entering the plant during the study period
As shown in Fig. 2, the turbidity rises to about 40 NTU during the rainy seasons, with some extreme values near 60 NTU, and stays between 5 and 20 NTU during the other seasons.

Jar Test Experiments:
A series of chemical experiments was conducted using the jar test. The study focused on four aspects: the effects of raw water turbidity, pH, temperature, and alkalinity on the suitable alum dose. The following steps describe how each experiment was conducted:
1-First, a jar test was performed for each initial turbidity level (from 5 NTU to 60 NTU ± 3, in steps of 5 NTU); 65 experiments were carried out overall. The alum dose was varied between 5 and 30 mg/L. Actual raw water samples were used wherever possible rather than synthetic samples, because turbidity has many different sources and unrepresentative samples would introduce errors. One turbidity level did not occur during the study, so synthetic samples were prepared for it using kaolinite clay, Al2Si2O5(OH)4. In this part of the experiments, the pH was about 7.5 ± 0.4, T = 20 ± 2 °C, and alkalinity = 140 ± 5. Fig. 3 shows the jar test apparatus used in the study.
[...] and the dose varied from 5 to 20 mg/L for the initial turbidity of 10 NTU, and from 10 to 30 mg/L in steps of 10 mg/L for the other turbidity values. The pH was varied between 6 and 8.5 for each combination of turbidity and alum dose, to study the effect of pH on the suitable dose. Then step 2 was repeated.
4-The temperature in the study area differs from one month to another during the year. Therefore, a set of experiments was carried out to determine the optimal dose at each specific temperature. These experiments were done at different times of the year; the raw water temperatures were 9, 14, 18, and 23 ± 1 °C, the initial turbidities were 10, 20, 30, and 40 ± 3 NTU, and the dose was varied between 5 and 30 mg/L in steps of 5 mg/L. Then step 2 was repeated.
5-To evaluate the effect of alkalinity, three alkalinity levels were studied (140, 160, and 180 ± 5) at six turbidity levels (10, 20, 30, 40, 50, and 60 NTU), with constant pH and temperature and different doses of alum. Then step 2 was repeated.
Table 3 summarizes the steps of the experiments. After the jar tests for each run in each set were finished, the coagulation and flocculation parameters (pH, turbidity, and aluminum residual) were measured again in order to determine the optimal dose, that is, the coagulant dosage that gave the highest percentage of turbidity removal.
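The selection of the optimal dose described above can be sketched in a few lines of Python. This is only an illustration: the readings are hypothetical, not data from the study, and breaking ties in favour of the lower dose is an assumption made here to avoid overdosing.

```python
def removal_percent(initial_ntu, residual_ntu):
    """Percentage of turbidity removed in one jar."""
    return 100.0 * (initial_ntu - residual_ntu) / initial_ntu


def optimal_dose(initial_ntu, residual_by_dose):
    """Return (dose, removal %) for the jar with the highest removal.

    residual_by_dose maps alum dose (mg/L) to residual turbidity (NTU).
    Assumption: ties are resolved in favour of the lower dose.
    """
    best_dose = min(
        residual_by_dose,
        key=lambda dose: (residual_by_dose[dose], dose),
    )
    return best_dose, removal_percent(initial_ntu, residual_by_dose[best_dose])


# Hypothetical jar test readings for a 30 NTU sample (not measured data).
jars = {5: 9.0, 10: 4.5, 15: 1.8, 20: 1.8, 25: 3.1, 30: 4.0}
dose, removal = optimal_dose(30.0, jars)
print(f"optimal dose = {dose} mg/L, removal = {removal:.1f} %")
```

Since removal is a decreasing function of residual turbidity for a fixed initial turbidity, picking the lowest residual is equivalent to picking the highest removal percentage.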
Multi Linear Regression:
Multi linear regression (MLR), known simply as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable. MLR models are suitable for simulating the relationship between many variables in different fields, based on least-squares fitting. The model is expressed by the following equation:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$$
where $y$ is the output, $x_1, \dots, x_n$ are the explanatory variables, $\beta_0$ is the constant, and $\beta_i$ is the slope coefficient for each explanatory variable.

Gene Expression:
GEP is one of the AI models. It is a type of genetic algorithm inspired by Darwin's theory of evolution and was proposed by Ferreira in 2001. It mimics evolution: the population discards unfit members and produces genetically modified offspring. At the start of the process, no prior functional relationship is assumed 15. The basic search strategy of genetic programming (GP) is a genetic algorithm (GA), but it differs from a standard GA in that it usually works with parse trees rather than bit strings. A terminal set (the problem's variables) and a function set are used to build a parse tree. Moreover, GEP can solve problems in different fields with high performance. The method has recently been applied to identify the behavior of nonlinear systems. The first step of GEP operation is determining the fitness function, which can be expressed mathematically as follows:
$$f_i = \sum_{j=1}^{C_t} \left( M - \left| C_{(i,j)} - T_j \right| \right)$$
where $M$ is the range of selection, $C_{(i,j)}$ is the value returned by the individual chromosome $i$ for fitness case $j$ (out of $C_t$ fitness cases), and $T_j$ is the target value for fitness case $j$ 10.
Figure 4 shows the expression tree (ET) for the following example expression: (a + a × b) - (a + √ ). Figure 5 shows the steps of the GEP algorithm. First, the initial population is created randomly. Then the chromosomes are expressed as expression trees, which are executed for fitness evaluation.
The individuals are then selected according to their fitness to reproduce with modification; the offspring are subject to the same development process. This loop is repeated until a good solution is found.
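As a minimal sketch of the two ideas described above, the following Python code evaluates an expression tree and scores it with the absolute-error fitness built from M, the value returned by the individual, and the target value. The tuple-based tree encoding, the function set, the protected square root, and the fitness cases are all illustrative assumptions; they are not the configuration used in the study.

```python
import math

# Illustrative function set: name -> (arity, implementation).
FUNCTIONS = {
    "+": (2, lambda x, y: x + y),
    "-": (2, lambda x, y: x - y),
    "*": (2, lambda x, y: x * y),
    "sqrt": (1, lambda x: math.sqrt(abs(x))),  # protected square root (assumption)
}


def evaluate(tree, variables):
    """Recursively evaluate an expression tree.

    A tree is a variable name, a numeric constant, or a tuple
    (function_name, child, ...).
    """
    if isinstance(tree, tuple):
        arity, func = FUNCTIONS[tree[0]]
        args = [evaluate(child, variables) for child in tree[1:]]
        if len(args) != arity:
            raise ValueError(f"{tree[0]} expects {arity} argument(s)")
        return func(*args)
    if isinstance(tree, str):
        return variables[tree]
    return float(tree)


def absolute_error_fitness(tree, cases, selection_range):
    """Sum over fitness cases of M - |C_ij - T_j|."""
    return sum(
        selection_range - abs(evaluate(tree, inputs) - target)
        for inputs, target in cases
    )


# Hypothetical individual: (a + a * b) - sqrt(b)
tree = ("-", ("+", "a", ("*", "a", "b")), ("sqrt", "b"))
# Hypothetical fitness cases: (inputs, target value).
cases = [({"a": 1.0, "b": 4.0}, 3.0), ({"a": 2.0, "b": 9.0}, 16.0)]
print(evaluate(tree, cases[0][0]))
print(absolute_error_fitness(tree, cases, selection_range=100.0))
```

A full GEP run would wrap this evaluation in the loop of selection, reproduction, and modification; that part is omitted here.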

Results and Discussions:
I-1-In the first set of experiments, the turbidity varied between 5 and 60 ± 3 NTU, the alum dose was between 5 and 30 mg/L, the pH was set at about 7.5 ± 0.2, the temperature at 20 ± 2 °C, and the alkalinity at 140 ± 3.

Figure 5. Flowchart of Gene Expression Programming.
After the rapid mix, slow mix, and settling, at an initial turbidity of 30 NTU, doses of both 15 and 20 mg/L gave an acceptable final turbidity, but when the dose was increased to 25 mg/L, the residual turbidity increased. The results show that the percentage of turbidity removal increased with the alum dose up to a certain value. Beyond that value, increasing the dose brought no benefit, and the removal percentage decreased, because overdosing causes charge reversal and restabilization of the colloids. At low turbidity, an excess of coagulant must be added to form a precipitate that can capture the suspended particles, whereas at higher turbidity, coagulation occurs at a lower chemical dose because the suspended particles provide nuclei that aid sedimentation. The results are illustrated in Fig. 6.

Residual turbidity for different pH levels at different initial turbidities
In set 1, the optimal dose for an initial turbidity of 30 NTU was 15 mg/L, whereas when the pH dropped to 6.5 in set 2, the dose required to reach an acceptable residual dropped to 10 mg/L. Likewise, for an initial turbidity of 10 NTU, when the pH changed from 6 to 8 at the same dose, the removal efficiency changed from 66% to 32%.
When the pH of the water is reduced, coagulation becomes more efficient and the residual turbidity is lower, which in many cases lowers the required dose. At higher pH, the optimum alum dosage increases because of the decreased positive charge of the adsorbed species.
3-The third set of experiments studied the effect of temperature, which is an important parameter in water treatment. In this set, pH and alkalinity were kept as constant as possible, at 7.5 ± 0.4 and 140 ± 5. The experiments were done at different times of the year, at temperatures of 9, 14, 18, and 23 ± 1 °C, with doses of 5, 10, 15, 20, 25, and 30 mg/L. The results are presented in Table 5 and Fig. 8. The optimal dose increased as the temperature decreased, which means that more alum is consumed during the cold seasons. When the temperature dropped from 23 to 9 °C, the optimal dose rose from 10 to 20 mg/L for a raw water turbidity of 10 NTU and from 15 to 25 mg/L for a raw water turbidity of 40 NTU; that is, the alum dose rose by about 0.7 mg/L for each 1 °C drop in temperature. Low temperature affects the coagulation and flocculation processes by altering coagulant solubility, increasing water viscosity, and retarding the kinetics of the hydrolysis reactions. Higher coagulant dosages, the addition of flocculation aids, and longer flocculation and sedimentation times are required at lower temperatures.
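The figure of about 0.7 mg/L per degree follows directly from the reported doses and temperatures. The short check below reproduces that arithmetic; the function name is illustrative.

```python
def dose_change_per_degree(dose_warm, dose_cold, temp_warm, temp_cold):
    """Increase in optimal dose (mg/L) per 1 degree C drop in temperature."""
    return (dose_cold - dose_warm) / (temp_warm - temp_cold)


# Values reported in the text: temperature falls from 23 to 9 degrees C.
low_turbidity = dose_change_per_degree(10, 20, 23, 9)   # raw water at 10 NTU
high_turbidity = dose_change_per_degree(15, 25, 23, 9)  # raw water at 40 NTU
print(round(low_turbidity, 2), round(high_turbidity, 2))
```

Both cases give 10 mg/L over 14 °C, which rounds to 0.71 mg/L per degree.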

The percentage of turbidity removal dropped from 75% to 40% when the alkalinity increased from 140 to 180 for the initial turbidity of 10 NTU. The same happened for the other initial turbidity levels, with different percentages. Low-turbidity water was more affected by the change in alkalinity than high-turbidity water. The optimal dose from each run in every set was used to build a model; two methods were used, MLR and GEP. The data used for the model are shown in Fig. 10. The error histogram is shown in Fig. 11.
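For the MLR baseline, a least-squares fit can be sketched with the standard library alone. The code below solves the normal equations by Gaussian elimination. The two predictors, the sample rows, and the generating coefficients are hypothetical and serve only to show that the fit recovers a known linear relation; the study's own model uses turbidity, pH, alkalinity, and temperature, and its fitting tool is not specified here.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_mlr(rows, targets):
    """Least-squares fit of y = b0 + b1*x1 + ... + bn*xn via normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_mlr(coefficients, row):
    """Evaluate the fitted linear model for one row of predictors."""
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], row))


# Hypothetical (turbidity NTU, temperature C) rows; doses are generated from
# an assumed linear relation, not taken from the study.
rows = [(10, 23), (20, 18), (30, 14), (40, 9), (50, 20), (60, 12)]
targets = [5.0 + 0.3 * turb - 0.2 * temp for turb, temp in rows]
coefficients = fit_mlr(rows, targets)
print([round(c, 3) for c in coefficients])
```

Because the model is linear in every predictor, it cannot represent effects such as the overdose reversal seen in the jar tests, which is consistent with the weak MLR results reported in this paper.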

Gene Expression Models:
First, 70% of the data were used to train the model, and the remaining data were used for validation. The fitness function used for evaluation was RMSE. Several parameters govern gene expression models; the most important are the number of chromosomes, the mutation rate, and the function set. In this study, two different numbers of chromosomes and two mutation rates were used. The values of the GEP parameters are shown in Table 6. GEP 2-2 gave the best results: increasing the number of chromosomes usually improves the model, whereas the mutation rate was set to 0.00138, and increasing it did not give a better solution in this case.
The same functions, fitness function, and linking function were used in the four models. The results for the best model are shown in Table 7.
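The 70/30 partition and the RMSE criterion can be illustrated as follows. Whether the original split was random or ordered is not stated, so the shuffling with a fixed seed is an assumption, and the records and predictions are placeholders.

```python
import math
import random


def train_validation_split(records, train_fraction=0.7, seed=0):
    """Shuffle the records and split them into training and validation sets.

    Assumption: a seeded random shuffle; the source does not say how the
    70% training portion was chosen.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


records = list(range(20))  # placeholder for 20 jar test records
train, validation = train_validation_split(records)
print(len(train), len(validation))

# Hypothetical predicted and observed doses in mg/L.
print(rmse([10.0, 15.0, 20.0], [12.0, 15.0, 19.0]))
```

Keeping the validation records out of training is what allows the testing-phase errors to indicate how the equation behaves on unseen water conditions.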

Table 7. The accuracy table of gene-expression models.
The RMSE, RAE, MAE, and RRSE values were all greater in training than in testing. The larger the error, the poorer the model performance.
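For reference, the four error measures named above can be computed as in the sketch below, using their usual definitions (RAE and RRSE are taken relative to a predictor that always returns the mean of the observations). The sample values are hypothetical, and the exact formulas used by the modelling software in the study are assumed to match these standard ones.

```python
import math


def error_metrics(predicted, observed):
    """Return RMSE, MAE, RAE, and RRSE for paired predictions and observations."""
    n = len(observed)
    mean_obs = sum(observed) / n
    abs_err = [abs(p - o) for p, o in zip(predicted, observed)]
    sq_err = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    abs_dev = sum(abs(o - mean_obs) for o in observed)
    sq_dev = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "RMSE": math.sqrt(sum(sq_err) / n),
        "MAE": sum(abs_err) / n,
        "RAE": sum(abs_err) / abs_dev,
        "RRSE": math.sqrt(sum(sq_err) / sq_dev),
    }


# Hypothetical observed and predicted alum doses in mg/L.
observed = [10.0, 15.0, 20.0, 25.0]
predicted = [12.0, 14.0, 21.0, 23.0]
metrics = error_metrics(predicted, observed)
for name, value in metrics.items():
    print(f"{name} = {value:.4f}")
```

RAE and RRSE values below 1 indicate that the model does better than simply predicting the mean observed dose.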

The constants and parameters used in the equation shown in Fig. 12 are listed in Table 8. Figure 13 compares part of the data with the results of the two models, MLR and GEP. As shown, the GEP model clearly outperformed MLR, and it can be used as a trusted modeling method in this case.
Then the pH was adjusted between 6.5 and 8.5. The results are presented in Fig. 14. The pH has more effect on the optimal dose at low concentrations than at high ones. When the pH changed from 6.5 to 8.5, the removal achieved by the optimal dose obtained from set 1 changed from 96 to 89.2% for the initial turbidity of 75 NTU, and from 98.8 to 96.6% for the initial turbidity of 200 NTU. The last factor tested was the temperature, which clearly has a significant effect on the optimal dose. The removal efficiency of the same dose dropped by 5 to 10% when the temperature dropped by about 15 °C. The differences in the removal efficiency of the optimal dose for each initial turbidity at different temperatures are shown in Fig. 15.

Figure 15. Turbidity removal at different temperatures for high turbidity concentrations
An additional set of experiments was carried out to determine the required dose of starch when used as a coagulant aid, so the optimal dose and a much lower dose were tested for initial turbidities (100, [...]). The experiments showed that an appropriate dose of starch can reduce the required alum dose and raise the removal efficiency of the optimal dose. The residual turbidity and the removal efficiency are shown in Fig. 16.
It was observed that the higher the starch dose, the higher the COD level in the final water, so much higher concentrations are not recommended. A nonlinear model was obtained using GEP with the same method used for the low turbidity levels; the resulting model is given by the following equation: Alum Dose =

Figure 16. Residual Turbidity when using different starch doses as coagulant aid
For ease of use by operators, the equation was implemented in a graphical user interface (GUI). The interface is shown in Fig. 17.

Conclusion:
Modelling drinking water treatment plants is a very complicated process, and the most important task is to determine the optimal dose of alum when it is used as a coagulant, because a non-optimal dose causes many problems, such as a reduction in the pH of the water and an increase in aluminum residuals.
Many articles have studied different modelling methods for predicting the coagulant dose using artificial intelligence models such as ANN and fuzzy logic, but they give no specific equation for determining the alum dose, which would be much easier for plant operators to use.
GEP is a promising tool for modeling treatment plants, and it is being used in recent studies in the environmental sciences. GEP gave a good predictive equation that is flexible and able to model such complex relations with an accuracy of R = 0.91. The parameters with the greatest effect on the GEP model were initial turbidity and temperature, with importances of 44.35% and 28.03%.
MLR could not give a good result, whereas GEP was a good and reliable method for determining the alum dose needed in different circumstances.
In the rainy seasons, extreme turbidity levels occur, so they had to be studied separately. In order to build a specific model, another set of experiments was carried out, and a good model was obtained using GEP, with an accuracy of R = 0.9 and RMSE = 5.2. The use of starch was studied as a way to avoid consuming high concentrations of alum. It gave good results in reducing the needed dose, but it should be noted that it affects the COD content of the treated water.