Determination of Optimal Time-Average Wind Speed Data in the Southern Part of Malaysia

Abstract:
Mersing is one of the places in Malaysia with potential for wind power development, and researchers often suggest it as an ideal site for generating electricity from wind. However, several factors need to be considered before a location is chosen. Analyzing the location in advance avoids wasted resources and maximizes the benefit to the parties involved. This study aims to identify the wind speed distribution of Mersing and to determine the optimal time average of wind speed. The study is critical because the wind speed data for any region has its own distribution, which changes daily and by season. Moreover, no standard has been set for selecting the time average of wind speed used in wind studies. The wind speed data were averaged over 1, 10, 30 and 60 minutes and used to find the optimal time average. Kolmogorov-Smirnov and Chi-Square were used as goodness-of-fit tests. The findings show that the wind speed distribution in Mersing varies with the time average used and that the best-fit distribution is Gen. Gamma. The optimal time average, other than 1 minute, is 10 minutes, because it gives results most similar to those from 1-minute data. The choice of time average affects the reliability of the findings, the accuracy of the estimates and the decisions made. This study is therefore significant for describing the wind distribution of a particular area more accurately.


Introduction:
Renewable energy is familiar to us today; it is produced by nature and does not run out. It also produces only minimal secondary pollution. Key factors behind the popularity of renewable energy are the world oil crisis and carbon dioxide emissions, which contribute to global warming. To limit the impacts of climate change, many countries are keen to develop renewable energy, and Malaysia does not want to be left behind. According to [1,2], Malaysia took its first step in searching for and using renewable energy in the Eighth Malaysia Plan (RMK8). Renewable energy has clear advantages: it is a clean, virtually pollution-free and cost-effective energy source.
Various types of renewable energy have been identified and implemented, including solar and wind energy. According to [2,3], this can be observed in the rapid development of technology for both types of energy. However, wind power is a preferable choice for renewable energy [1,4,5]. Wind has been used since ancient times (2800 BC), generally for sailing ships on the oceans and in the agricultural sector [6]. Nowadays, it is also used to generate electricity.
In Asia, India and China are the pioneers in the wind energy field [6]. Given the success of India, which is also close to the equator, Malaysia could also be a good location for wind power [7]. One advantage is that Malaysia has a large amount of wind every year. Wind power is one of the renewable energies available in Malaysia, but its implementation is still low compared to other Asian countries such as Indonesia and Thailand, as shown in a report issued by [8]. Around the world, the use of wind power for electricity production is very high. According to a 2017 report by the International Renewable Energy Agency (IRENA), the implementation of wind energy increased fourfold from 2007 to 2016.
According to previous findings, wind speed is the most crucial parameter [9] because it determines the success of a wind study [10]. Wind speed changes frequently, so wind is not a constant source of energy [11]. Therefore, more research on wind energy is needed to support the use of wind power for energy production in Malaysia. Previous studies indicate that many factors need to be considered for wind power generation to become a reality in Malaysia. The factors include wind speed at different altitudes, wind direction, and the distribution of the wind itself [1]. Some researchers have studied the selection of a wind distribution for a particular area [12][13][14][15]. Their findings show that each area has a different wind distribution. The distributions reported in previous research include Weibull, Rayleigh, Burr and Gamma.
The choice of time average for wind speed is also essential in determining the wind distribution. In the literature, researchers have used 10-minute [16,17], 30-minute [3], one-hour [18,19] and one-day [2,20,21] averages of wind data in their studies [22]. However, no standard has been set for the time average to use in wind studies. A previous study indicates that analyzing the data with a smaller time average gives much better results [23], because the characteristic features of the data are less affected. A longer time average, by contrast, makes the results less accurate, because less information is retained when the data are averaged over a longer span.
In any study involving data, a critical step is to determine the distribution of the data [24], because each data set has its own distribution. In this study, determining the wind speed distribution is fundamental: with an appropriate distribution, wind speed prediction becomes more accurate. However, determining such a distribution is extremely difficult because the wind changes constantly, which also makes forecasting more complex. Whether the distribution can be determined smoothly and accurately depends entirely on the form of the data. To date, however, there is no standard time average that ensures the distribution can be determined accurately. Therefore, the primary purpose of this study is to identify the wind speed distribution of Mersing and to determine the optimal time average of wind speed.

Materials:
The selection of Mersing, located in the southern part of Peninsular Malaysia, as a suitable area for wind power generation began as early as 1995 [25]. The selection of the study area is based on several factors. One factor is the elevation of the location, which is higher than other places in Malaysia: it lies 43.6 m above mean sea level [2]. In addition, the area faces the South China Sea, which allows it to experience a large amount of wind throughout the year. Thus, Mersing encounters both sea breezes and land breezes and is affected by the monsoon seasons.
Furthermore, previous studies have suggested Mersing as one of the ideal places for generating electricity from wind power [1,2,25,26]. Research on wind power capacity using data from 2007 to 2013 found that the wind power density in Mersing is approximately 14-25 W/m² [27]. The data for this study were obtained from a weather station installed at the Universiti Kebangsaan Malaysia (UKM) Mersing Marine Ecosystem Center (EKOMAR) (Fig. 1). Wind speed data measured and recorded at a height of 20 m above the ground were used for the analysis, covering May 2017 to November 2017. The average humidity and temperature for the study area are shown in Table 1, which gives a general idea of the location's characteristics. From this information, it is reasonable to conclude that the study location has a moderate temperature and a high humidity level that remained constant over the length of the research (May 2017 to November 2017). The high humidity is a result of abundant rainfall.

Methods of Work:
The wind data were divided into several averaging groups: 1-, 10-, 30- and 60-minute averages. The data went through a quality control process, and missing data were excluded from the analysis. The data were analyzed using EasyFit 5.6 Professional, which was used to obtain the distribution corresponding to the wind data. In addition, R was used to obtain descriptive statistics for the data. These statistics are required to compare the different averaging groups: the mean wind speed, the standard deviation and the skewness of the data were analyzed.
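The averaging and descriptive steps described above can be sketched as follows. This is a minimal illustration, not the study's actual workflow (which used EasyFit and R): the function names `block_average` and `describe`, the use of `None` to mark a missing reading, and the moment-based skewness formula are all assumptions made for the example.

```python
import math


def block_average(samples, block_size):
    """Average consecutive groups of `block_size` readings.

    `None` marks a missing reading and is skipped (assumed convention).
    A block with no valid readings is dropped, as is a trailing partial block.
    """
    averages = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        valid = [v for v in samples[start:start + block_size] if v is not None]
        if valid:
            averages.append(sum(valid) / len(valid))
    return averages


def describe(values):
    """Return (mean, sample standard deviation, moment skewness m3 / m2**1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    std = math.sqrt(m2 * n / (n - 1))              # n - 1 in the denominator
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return mean, std, skew


if __name__ == "__main__":
    # Hypothetical 1-minute wind speeds in m/s, for illustration only.
    one_minute = [1.2, 1.5, None, 2.0, 1.8, 1.1, 0.9, 2.4, 3.1, 1.7, 1.3, 1.6]
    for minutes in (1, 2, 3):
        print(minutes, describe(block_average(one_minute, minutes)))
```

With real 1-minute data, `block_size` would be 10, 30 or 60 to produce the longer averages; a positive skewness indicates a right-skewed data set.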

Estimation Distribution
For this study, the Maximum Likelihood Method (MLM) was used to estimate the parameter values and determine the distribution. This method has proven to be one of the best methods for estimating parameters at various locations in the world [28][29][30][31]. It was proposed by Stevens and Smulders in 1979, and it requires extensive numerical iteration to compute the parameter values [29]. For this calculation, let $x_1, x_2, \ldots, x_n$ be a random sample of size $n$ drawn from a probability density function $f(x; \theta)$, where $\theta$ is an unknown parameter. The likelihood function is

$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) \tag{1}$$

The Maximum Likelihood Estimator (MLE) of $\theta$ is the value of $\theta$ that maximizes $L(\theta)$ or, equivalently, the logarithm of $L(\theta)$. According to [32], often but not always, the MLE of $\theta$ is a solution of

$$\frac{d \ln L(\theta)}{d\theta} = 0 \tag{2}$$

The parameter values obtained are substituted into the candidate distribution to obtain its probability density function (pdf). The pdf is then plotted together with the histogram of the data in order to observe which distribution best fits the histogram.
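As an illustration of how a likelihood equation is solved iteratively, the sketch below estimates the two parameters of a Weibull distribution. The Weibull is chosen here only because its likelihood equation is compact; it is not the best-fit distribution reported in this study, and the function name `weibull_mle`, the bisection bounds and the synthetic data are assumptions for the example.

```python
import math
import random


def weibull_mle(speeds, lo=0.05, hi=20.0, tol=1e-9):
    """Maximum likelihood estimates (shape k, scale c) of a 2-parameter Weibull.

    Setting the derivative of the log-likelihood with respect to k to zero gives
        sum(x**k * ln x) / sum(x**k) - 1/k - mean(ln x) = 0,
    which is solved for k by bisection; c then follows in closed form.
    All speeds must be strictly positive (zero readings removed beforehand).
    """
    n = len(speeds)
    logs = [math.log(x) for x in speeds]
    mean_log = sum(logs) / n

    def score(k):
        powers = [x ** k for x in speeds]
        weighted = sum(p * lx for p, lx in zip(powers, logs))
        return weighted / sum(powers) - 1.0 / k - mean_log

    # score(k) increases with k, so the root lies where its sign changes.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    c = (sum(x ** k for x in speeds) / n) ** (1.0 / k)
    return k, c


if __name__ == "__main__":
    random.seed(1)
    # Synthetic speeds (scale 2.5, shape 2.0), not measured data.
    synthetic = [random.weibullvariate(2.5, 2.0) for _ in range(2000)]
    print(weibull_mle(synthetic))
```

Distributions with more parameters, such as Gen. Gamma, lead to a system of such equations and are usually handled by a general-purpose numerical optimizer.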

Statistical Tools (Goodness of Fit)
The selection of the best distribution also involves statistical analysis. Statistical tools are used to assess how well each selected distribution fits the data. Based on the literature, at least two statistical tools need to be used in the analysis, because different statistical tools provide different results [33]. In this study, Chi-Square (χ²) and Kolmogorov-Smirnov (KS) are used as goodness-of-fit (GOF) tests. The formula for Chi-Square is

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \tag{3}$$

where $O_i$ is the observed data and $E_i$ is the expected data [33]. The KS statistic is

$$KS = \max_{v \in V} \left| F(v) - G(v) \right| \tag{4}$$

where $V$ identifies the set of velocities to be considered, $F(v)$ is the cumulative probability distribution of the specific distribution and $G(v)$ is the experimental cumulative histogram.
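The two goodness-of-fit statistics can be computed along the following lines. This is a sketch under stated assumptions rather than the procedure implemented in EasyFit: the function names are hypothetical, the Chi-Square inputs are assumed to be binned counts, and the KS statistic is taken as the largest gap between a candidate cumulative distribution and the empirical one. The Weibull cumulative distribution is included only as an example candidate.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over bins; expected counts must be positive."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def ks_statistic(speeds, cdf):
    """Largest absolute gap between `cdf` and the empirical step function."""
    xs = sorted(speeds)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        gap = max(gap, i / n - f, f - (i - 1) / n)
    return gap


def weibull_cdf(k, c):
    """Return the Weibull CDF with shape k and scale c as a function of v."""
    return lambda v: 1.0 - math.exp(-((v / c) ** k))


if __name__ == "__main__":
    # Hypothetical speeds in m/s and hypothetical parameters, for illustration only.
    speeds = [0.8, 1.1, 1.4, 1.6, 1.9, 2.2, 2.7, 3.3]
    print(ks_statistic(speeds, weibull_cdf(2.0, 2.1)))
    print(chi_square_statistic([3, 3, 2], [2.5, 3.5, 2.0]))
```

For both statistics, a smaller value indicates a closer fit, so candidate distributions can be ranked by sorting on the statistic.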

Results:
Descriptive analysis of wind speed is a critical step that researchers must address, because it provides a comprehensive picture of the characteristics of the data. Its importance is reflected in how frequently it is used in studies of all kinds. Table 2 summarises the descriptive statistics for this study, including the mean, standard deviation and skewness. The mean is an essential element in assessing wind energy at a location [34]. The mean is almost identical across the time averages. Across the months, the highest mean wind speed was recorded in July (2.1634 m/s), while the lowest was in October (1.3970 m/s). Based on Table 2, the standard deviation decreases as the time average increases. Apart from that, the standard deviation values in this study are relatively large.
On the other hand, the mean value of the data obtained is not suitable for assessment, because the large standard deviation makes the mean unstable. However, since the mean was not used to achieve the study's primary objective, it was noted only as additional information. The skewness values are positive for all the data, so the data are skewed to the right. Furthermore, the longer the time average used, the lower the skewness.
Based on Table 3, which lists the best-fit distribution by Chi-Square, the initial conclusion is that the selection of the time average influences the wind speed distribution of Mersing, Malaysia. Besides that, no data set is best fitted by the Weibull distribution, even though Weibull is a well-known distribution that has been used for decades, especially in wind speed research [35]. For July, it is important to note that the choice of a 1-, 10-, 30- or 60-minute average does not affect the distribution (Table 3).
For May, June and November, the results were almost the same: the 1-, 10- and 30-minute averages give the same distribution, while the 60-minute average gives a different one. September and October also behave alike: the 1- and 10-minute averages give the same distribution. August, however, differs markedly from the other months, because its best-fit distribution alternates between the time averages used.
Based on Table 3, 28 analyses were conducted. The analysis counts how often the same wind distribution is obtained for 1 minute versus 10 minutes, 1 minute versus 30 minutes, and 1 minute versus 60 minutes. For the 1-minute versus 10-minute averages, the percentage of similarity is 85.71%. In conclusion, these analyses show that, in addition to a 1-minute average, researchers can also use 10-minute average data, because it has the highest similarity compared to the other averages. Apart from that, this study shows that the 60-minute average is less accurate, possibly because a large amount of information is lost when a 60-minute average is used.
Table 4 indicates the best-fit distribution by Kolmogorov-Smirnov. An initial conclusion is that having more parameters does not guarantee the best fit for any data: Table 4 shows that distributions with few parameters can fit better than those with many parameters.

Figure 2. Example of best-fit distribution for wind speed in May by Chi-Square: (a) Inv. Gauss (3), (b) Inv. Gauss (3), (c) Inv. Gauss (3), (d) Gamma (3)

Statistical Tools-Kolmogorov-Smirnov (KS)
For May, the result for the 1-minute average is the same as for the 10-minute average, while the result for the 30-minute average is the same as for the 60-minute average. The results for July through October follow the same pattern. July has a distribution with only two parameters, namely Nakagami. In standard practice [15,36,37], the Weibull distribution is the most commonly used distribution for wind power studies. However, Weibull appears only once in the 28 analyses conducted, and only in August. Table 4 was also used to calculate how often the same distribution is obtained for 1 minute versus 10 minutes, 1 minute versus 30 minutes, and 1 minute versus 60 minutes. For 1 minute versus 10 minutes, the similarity is 57.14%; for 1 minute versus 30 minutes, it is 42.86%; and for 1 minute versus 60 minutes, it is 28.57%.
These results also support the conclusion from Table 3: the 60-minute average is less accurate, because information is lost when a 60-minute average is used.
A comparison of the Chi-Square (χ²) and Kolmogorov-Smirnov (KS) findings shows that the percentage of similarity is lower under KS for the 10- and 30-minute averages but higher for the 60-minute average. A sample histogram with the probability density function (pdf) for a selected month (October) is shown in Fig. 3 (a-d). The panels correspond to the averages used (1, 10, 30 and 60 minutes) and demonstrate the variability of the wind speed distribution. Figure 3 indicates that the distribution for each average comes from the same distribution family as the finding in Fig. 2. The similarity is noticeable in the right-skewed shape. Nonetheless, each distribution differs in the width of its modal class.
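The similarity percentages can be reproduced from a table of best-fit distributions per month with a calculation like the one below. The table contents here are placeholders ("A", "B") rather than the values of Table 3 or Table 4, and the function name `similarity_percent` is hypothetical.

```python
def similarity_percent(best_fit, reference, other):
    """Percent of months whose best-fit distribution under the `other`
    average matches the one under the `reference` average."""
    months = list(best_fit)
    matches = sum(
        1 for month in months
        if best_fit[month][other] == best_fit[month][reference]
    )
    return 100.0 * matches / len(months)


if __name__ == "__main__":
    # Placeholder labels only: keys are averaging intervals in minutes.
    best_fit = {
        "May": {1: "A", 10: "A", 60: "B"},
        "Jun": {1: "A", 10: "A", 60: "A"},
        "Jul": {1: "B", 10: "A", 60: "A"},
    }
    for minutes in (10, 60):
        print(minutes, round(similarity_percent(best_fit, 1, minutes), 2))
```

With seven months of data, each matching month contributes 1/7 of the total, so four matches correspond to 57.14%, three to 42.86% and two to 28.57%.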

Discussion:
There are seven analyses for each goodness-of-fit (GOF) test and each time average, one per month. Since two GOF tests (KS and χ²) are used in this study, there are 14 analyses in total for each average. Fig. 4 shows the analyses conducted on the 1-minute average to find the best distribution. Three distributions give the best fit for the 1-minute data. Gen. Gamma exhibits the best fit since it has the highest percentage, accounting for 64% of the analyses, while Gamma and Inv. Gauss account for 22% and 14%, respectively. Five different distributions were observed for the 10-minute average (Fig. 5). Gen. Gamma again accounts for the largest share at 43%, Gamma and Inv. Gauss account for 29% and 14%, respectively, and Nakagami and Burr share last place with 7% each. All the distributions found for the 1-minute average (Fig. 4) were also observed for the 10-minute average. This finding indicates continuity of the distribution between 1-minute and 10-minute average data. Therefore, a 10-minute average can be used in place of a 1-minute average. A 10-minute average is also suggested by [22,38].

Figure 5. List of distributions that best fit the 10-minute data (Mersing)
In total, 14 analyses were conducted on the 30-minute average to find the best distribution. The results for the 30-minute average (Fig. 6) look similar to those for the 10-minute average (Fig. 5), with five distributions: Gen. Gamma, Gamma, Inv. Gauss, Weibull and Burr. The only difference between the 10-minute and 30-minute results is in the lowest-ranked distributions, where Weibull replaces Nakagami. Gen. Gamma is the best-fit distribution at 43%. Gamma is in second place with 29%, while Inv. Gauss occupies third place with 14%. Burr and Weibull share last place with 7% each. Fig. 6 is thus a continuation of the previous results in Fig. 4 and Fig. 5: the three best distributions for the 1-minute and 10-minute averages (Gen. Gamma, Gamma and Inv. Gauss) are also the three best for the 30-minute average.

Figure 6. List of distributions that best fit the 30-minute data (Mersing)
For the last average (60 minutes), 14 analyses were likewise conducted to find the best distribution. Six different distributions best fit the 60-minute data: Gen. Gamma, Gamma, Inv. Gauss, Normal, Burr and Rayleigh (Fig. 7). The best-fit distribution is Gen. Gamma, with the highest percentage, 36%. Gamma is in second place with 22%, while Rayleigh and Burr share third place with 14% each. Normal and Inv. Gauss occupy last place with 7% each. In conclusion, the two best distributions for the 1-, 10- and 30-minute averages, Gen. Gamma and Gamma, are also the two best for the 60-minute average.
Overall (Fig. 8), the best-fit distribution for every time average is Gen. Gamma, because it best represents the wind speed in Mersing. Gen. Gamma accounts for 46.43% of all analyses. Gamma and Inv. Gauss come second and third, with 25% and 12.5%, respectively. The Burr distribution was selected four times, twice as often as the Rayleigh distribution. Nakagami, Normal and Weibull are in last place, with only 1.8% each.
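The overall percentages can be checked with simple arithmetic on selection counts. The counts below are not taken from the paper's figures: they are back-calculated from the percentages quoted in the text, assuming 56 analyses in total (14 per time average, four averages), and the helper name `shares` is hypothetical.

```python
def shares(counts):
    """Convert selection counts into percentages of the total, to 2 decimals."""
    total = sum(counts.values())
    return {name: round(100.0 * count / total, 2) for name, count in counts.items()}


# Assumed counts, inferred from the reported percentages (56 analyses in total).
overall_counts = {
    "Gen. Gamma": 26,
    "Gamma": 14,
    "Inv. Gauss": 7,
    "Burr": 4,
    "Rayleigh": 2,
    "Nakagami": 1,
    "Normal": 1,
    "Weibull": 1,
}

if __name__ == "__main__":
    for name, percent in shares(overall_counts).items():
        print(f"{name}: {percent}%")
```

Under these assumed counts, a single selection out of 56 corresponds to about 1.8%, which matches the share quoted for the least frequent distributions.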

Conclusion:
This paper discusses the best-fit distribution of wind speed for Mersing and the selection of the optimal time average for a wind speed study. The best distribution for Mersing is Gen. Gamma. This study also found that the choice of time average affects the distribution, and the findings show continuity of the best distribution across the time averages used. Based on the findings, the optimal time average other than 1 minute is 10 minutes, because both GOF tests show the highest similarity for this average compared to the others: the first GOF (χ²) shows a similarity of 85.71%, while the second GOF (KS) shows 57.14%. This study needs to be continued to determine the optimal time average that is suitable for assessment purposes. Selecting the distribution for each data set is a crucial requirement before proceeding to further analysis; if the selection is made incorrectly, the overall findings become inaccurate. An appropriate time average is therefore needed so that the distribution selection reflects the data. Lastly, it helps researchers estimate the parameters and make better predictions.