Building a Sustainable GARCH Model to Forecast Rubber Price: Modified Huber Weighting Function Approach

Abstract
The unstable and uncertain nature of natural rubber prices makes them highly volatile and prone to outliers, which can have a significant impact on both modeling and forecasting. To tackle this issue, the author recommends a hybrid model that combines the autoregressive (AR) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) models. The model uses the Huber weighting function to ensure that the forecast value of rubber prices remains sustainable even in the presence of outliers. The study aims to develop a sustainable model and forecast daily prices for a 12-day period by analyzing 2683 daily prices of Standard Malaysian Rubber Grade 20 (SMR 20) in Malaysia. The analysis incorporates two dispersion measurements (IQR/3 and Sn) and three levels of innovative outlier (IO) contamination: 0%, 10% and 20%. The results indicate that using the Huber weighting function with the IQR/3 measurement to build the AR(1)-GARCH(2,1) model leads to better sustainability. These findings have the potential to enhance the GARCH model by modifying the weighting function of the M-estimator.


Introduction
Forecasting volatility is crucial in the agricultural sector, particularly for rubber. It may assist producers, traders and consumers in predicting future rubber prices. In recent years, there has been increasing interest in natural rubber forecasting studies, covering forecasting performance [1][2][3] and price forecasting [4][5][6][7]. In the Malaysian context, Standard Malaysian Rubber Grade 20 (SMR 20) has been chosen by researchers as empirical data. However, the Malaysian natural rubber price fluctuates as a result of the world economy's decline [8]. These fluctuations, together with volatility clustering, can influence the volatility model as well as forecasting efficiency.
Most researchers have used a time-varying volatility model such as the Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model to obtain reliable forecasts [1,9]. Although the GARCH model is very general, it faces serious challenges, especially when there are outliers. Previous studies have found that outliers can have detrimental effects on parameter estimates [10][11][12], identification and estimation [13,14] and forecasting [13,15]. Therefore, researchers prefer robust methods to reduce the influence of outliers. A well-known robust approach is the M-estimator, which has been one of the most interesting research subjects because it downweights observations when minimizing the residual function. The selection of the weighting function in the M-estimator makes model parameters less biased and improves forecast performance in the presence of outliers [16]. Huber is the monotone weighting function most widely used in many areas [17][18][19][20]. Mathematically, the weighting function depends on standardized residuals, with the median absolute deviation (MAD) as the measure of dispersion. However, [21] stated that MAD has two main flaws: low Gaussian efficiency (37%) and dependence on symmetric distributions. As a result, other robust dispersion measures, such as the interquartile range, Qn and Sn, have been proposed for heavy-tailed distributions.
A careful study of the literature reveals that the weighting function has received little attention. A clear understanding of the dispersion measurement in weighting functions is crucial to obtaining sustainable time series models, especially the GARCH model. The primary goal of this study was to build a sustainable GARCH model by applying two dispersion measurements (IQR/3 and Sn) in the Huber weighting function in the presence of innovative outliers (IO). A further goal was to forecast the 12-day rubber price in 2021.
The remainder of this paper is structured as follows. The AR($p$) model, the GARCH($q$, $h$) model, the IO, the central tendency and dispersion measurements, the Huber weighting function in the M-estimator and the performance evaluation are described briefly in the methodology section. The simulation results are discussed in the simulation study section. The rubber price results and forecasts are presented in the empirical results section. The paper is summarized in the conclusion section.

Materials and Methods
The conditional mean of a stationary time series can be described by the autoregressive (AR) model. The AR model of order $p$, AR($p$), can be expressed as

$$y_t = \phi_0 + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t \tag{1}$$

where $\phi_0$ is a constant parameter with condition $\phi_0 > 0$, $\phi_1, \phi_2, \ldots, \phi_p \ge 0$ are the constrained coefficients, $p$ is a non-negative integer and $\varepsilon_t$ is white noise, $\varepsilon_t \sim \mathrm{WN}(0, \sigma^2)$, that is, independent and identically distributed with a mean of zero. Eq. 1 is applicable only when the variance of the time series is constant. When the variance of the time series is non-constant, the generalized autoregressive conditional heteroscedasticity (GARCH) model established by [22] is appropriate. Suppose that $\varepsilon_t = \sigma_t z_t$, where $\{z_t\}$ is a sequence with $z_t \sim N(0,1)$. The symmetric GARCH($q$, $h$) model can be expressed as

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \beta_i \sigma_{t-i}^2 + \sum_{j=1}^{h} \alpha_j \varepsilon_{t-j}^2 \tag{2}$$

where $\alpha_0$ is the constant parameter with conditions $\alpha_0 > 0$, $\beta_1, \ldots, \beta_q \ge 0$ and $\alpha_1, \ldots, \alpha_h \ge 0$. The GARCH($q$, $h$) model in Eq. 2 specifies that today's conditional variance depends on the first $q$ past conditional variances and $h$ past squared innovations. Problems arise when there are outliers in the data. There are several types of outliers, one of which is the innovative outlier.
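The recursion behind the AR(1)-GARCH(2,1) specification used later in the paper can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the authors' R workflow. The names `phi0`, `phi1`, `alpha0`, `beta1`, `beta2` and `alpha1` are assumptions, and so is the mapping in which `beta1` and `beta2` multiply past conditional variances while `alpha1` multiplies the past squared innovation. The burn-in length is also an arbitrary choice.

```python
import math
import random

def simulate_ar1_garch21(n, phi0, phi1, alpha0, beta1, beta2, alpha1,
                         seed=0, burn_in=500):
    """Simulate y_t = phi0 + phi1*y_{t-1} + eps_t, eps_t = sigma_t * z_t, with
    sigma2_t = alpha0 + beta1*sigma2_{t-1} + beta2*sigma2_{t-2} + alpha1*eps_{t-1}^2.
    Parameter names and their mapping are assumptions made for illustration."""
    persistence = beta1 + beta2 + alpha1
    if not 0.0 <= persistence < 1.0:
        raise ValueError("need 0 <= beta1 + beta2 + alpha1 < 1 for a finite variance")
    rng = random.Random(seed)
    uncond_var = alpha0 / (1.0 - persistence)   # starting value for the variance
    sigma2 = [uncond_var, uncond_var]
    eps = [0.0, 0.0]
    y = [phi0 / (1.0 - phi1)] * 2               # start at the AR(1) mean
    for _ in range(n + burn_in):
        s2 = alpha0 + beta1 * sigma2[-1] + beta2 * sigma2[-2] + alpha1 * eps[-1] ** 2
        e = math.sqrt(s2) * rng.gauss(0.0, 1.0)  # z_t ~ N(0, 1)
        y.append(phi0 + phi1 * y[-1] + e)
        sigma2.append(s2)
        eps.append(e)
    return y[-n:], sigma2[-n:]                   # drop the burn-in period

# Values taken from the simulation procedure in this paper; which value belongs
# to which coefficient family is an assumption.
series, cond_var = simulate_ar1_garch21(
    200, phi0=0.0001622, phi1=0.2225,
    alpha0=0.0000033, beta1=0.3725, beta2=0.4810, alpha1=0.1388, seed=1)
```

The simulated conditional variance stays strictly positive because every term in the recursion is non-negative and `alpha0` is positive, which is what the non-negativity constraints are for.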

Innovative Outlier:
An innovative outlier (IO), also known as an internal change, is a data point that has an impact on subsequent observations [23]. The impact of an IO is more complex than the impact of other forms of outliers [24,25]. For a stationary time series, an IO creates a transient effect, while for a non-stationary time series, an IO produces a permanent level shift [26].
The dynamic pattern of the effect of IO outliers is represented as [...] As mentioned by [27], [...] To overcome the effect of an innovative outlier, researchers such as Huber have suggested using other measures instead of the regular mean and variance as the basis for modelling the time series. These measures are discussed next.

Measure of Central Tendency and Dispersion:
Two measures were considered in the Huber weight function: central tendency and dispersion. The median was selected as the measure of central tendency because of its robustness against outliers. The interquartile range (IQR) is a dispersion measurement expressed as the distance between the 75th percentile ($Q_{0.75}$) and the 25th percentile ($Q_{0.25}$) of the data [28]. The IQR can be defined as

$$\mathrm{IQR} = Q_{0.75} - Q_{0.25} \tag{6}$$

This measure has a breakdown point of 25% [21,29]. The simulation results of [30] suggested that the median together with the median absolute deviation (MAD) or the IQR could be reasonable alternatives to the mean and variance. Nevertheless, this paper used IQR/3, as suggested by Ghani and Rahim [31].
The Sn measure was proposed by [21] as an alternative to MAD. It is convenient for heavy-tailed and skewed distributions. Sn is also explained in [32]. Sn can be defined as

$$S_n = 1.1926\, \mathrm{med}_i \{ \mathrm{med}_j\, |x_i - x_j| \} \tag{7}$$

where $\mathrm{med}_j$ is the inner median, taken as the $(\lfloor n/2 \rfloor + 1)$-th order statistic, and $\mathrm{med}_i$ is the outer median, taken as the $\lfloor (n+1)/2 \rfloor$-th order statistic. Huber [33] suggested a weight function to overcome the effect of heteroscedasticity in modelling time series data.
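The two dispersion measurements can be computed directly from their definitions. The sketch below is a naive illustration in standard-library Python, not the routine used in the paper. The function names are hypothetical, the quartiles use the linear-interpolation ("inclusive") rule of `statistics.quantiles`, which is an assumption about the quantile definition, and the Sn version is the plain O(n²) form without any finite-sample correction factor.

```python
import statistics

def iqr_over_3(values):
    """IQR/3, with quartiles from linear interpolation (an assumed convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 3.0

def low_median(values):
    """Order statistic of rank floor((n + 1) / 2)."""
    s = sorted(values)
    return s[(len(s) + 1) // 2 - 1]

def high_median(values):
    """Order statistic of rank floor(n / 2) + 1."""
    s = sorted(values)
    return s[len(s) // 2]

def sn_scale(values):
    """Naive Sn = 1.1926 * med_i { med_j |x_i - x_j| } (outer low, inner high median)."""
    inner = [high_median([abs(xi - xj) for xj in values]) for xi in values]
    return 1.1926 * low_median(inner)

sample = [1, 2, 3, 4, 5]
print(iqr_over_3(sample))  # (4 - 2) / 3
print(sn_scale(sample))    # 1.1926 * 1
```

For long series the quadratic cost of this Sn version becomes noticeable; dedicated implementations use a faster algorithm, but the naive form is easier to check against Eq. 7.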

Huber Weight Function:
The Huber M-estimation [33] is a common robust estimation approach. The weight function $w(u_t)$ in the M-estimator is used to reduce the effect of heteroscedasticity on the standard errors of the estimated coefficients. Huber is the best-known weight function in the M-estimator. The Huber weight function is defined as

$$w(u_t) = \begin{cases} 1, & |u_t| \le c \\ c / |u_t|, & |u_t| > c \end{cases} \tag{8}$$

with $c = 1.345$ as the default tuning constant for Huber, which produces 95% asymptotic efficiency for the normal distribution, and $u_t$ the standardized residual.
Generally, the standardized residual is formulated as

$$u_t = \frac{\varepsilon_t - \mu}{s} \tag{9}$$

where, in the conventional approach, $\mu$ and $s$ in Eq. 9 represent the mean and the median absolute deviation (MAD), respectively. In this paper, a modification to the Huber weight function is suggested.
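A short sketch may help make the conventional weighting concrete. It standardizes residuals with the mean and the MAD, as the text describes, and then applies the standard Huber weight with the constant 1.345. The function names and the toy residuals are hypothetical, and the MAD is used without any consistency constant, which is an assumption.

```python
import statistics

HUBER_C = 1.345  # default constant quoted in the text

def huber_weight(u, c=HUBER_C):
    """Standard Huber weight: 1 inside [-c, c], c/|u| outside."""
    a = abs(u)
    return 1.0 if a <= c else c / a

def mad(values):
    """Raw median absolute deviation (no consistency constant applied)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def conventional_standardize(residuals):
    """u_t = (e_t - mean) / MAD, the conventional choice described in the text."""
    center = statistics.fmean(residuals)
    scale = mad(residuals)
    if scale == 0:
        raise ValueError("MAD is zero; residuals cannot be standardized")
    return [(r - center) / scale for r in residuals]

# Toy residuals with one large value standing in for an outlier.
residuals = [-0.02, -0.01, 0.0, 0.01, 0.02, 0.5]
weights = [huber_weight(u) for u in conventional_standardize(residuals)]
```

In this toy series the mean is pulled toward the large value, so even the ordinary points end up with weights below 1. That sensitivity of the location measure is one motivation for replacing the mean with the median.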

Modified Huber Weight Function:
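Even if the conventional approach is accurate, it can cause problems with location and dispersion measurements. Therefore, the mean and MAD in Eq. 9 are replaced, giving

$$u_t = \frac{\varepsilon_t - \mathrm{median}(\varepsilon)}{\mathrm{IQR}/3} \tag{10}$$

and

$$u_t = \frac{\varepsilon_t - \mathrm{median}(\varepsilon)}{S_n} \tag{11}$$

where $\varepsilon_t$ is the residual, the median is the central tendency measurement, and IQR/3 and $S_n$ are the two dispersion measurements. In building a sustainable GARCH model, the performance of the dispersion measurement was the main consideration.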
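The modification can be sketched as follows: residuals are centered at the median and scaled by either IQR/3 or Sn before the Huber weight is applied. This is a minimal standard-library illustration under stated assumptions, not the paper's implementation. The function names, the `scale` argument and the toy residuals are hypothetical, the quartile rule is linear interpolation, and Sn is the naive form without finite-sample corrections.

```python
import statistics

def huber_weight(u, c=1.345):
    """Standard Huber weight: 1 inside [-c, c], c/|u| outside."""
    a = abs(u)
    return 1.0 if a <= c else c / a

def iqr_over_3(values):
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 3.0

def sn_scale(values):
    def high_median(v):
        s = sorted(v)
        return s[len(s) // 2]
    def low_median(v):
        s = sorted(v)
        return s[(len(s) + 1) // 2 - 1]
    inner = [high_median([abs(xi - xj) for xj in values]) for xi in values]
    return 1.1926 * low_median(inner)

def modified_huber_weights(residuals, scale="iqr3"):
    """Weights from u_t = (e_t - median) / scale, with scale = IQR/3 or Sn."""
    center = statistics.median(residuals)
    s = iqr_over_3(residuals) if scale == "iqr3" else sn_scale(residuals)
    if s == 0:
        raise ValueError("scale estimate is zero")
    return [huber_weight((r - center) / s) for r in residuals]

residuals = [-0.02, -0.01, 0.0, 0.01, 0.02, 0.5]   # last value mimics an outlier
w_iqr3 = modified_huber_weights(residuals, scale="iqr3")
w_sn = modified_huber_weights(residuals, scale="sn")
```

With either scale the large value receives the smallest weight. Because IQR/3 is a comparatively small scale, it downweights more aggressively than Sn on this toy series.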

Performance Evaluation:
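The efficiency of the various AR($p$)-GARCH($q$, $h$) model specifications was compared using Akaike's Information Criterion (AIC) [34,35], together with the mean absolute error (MAE), mean squared error (MSE) and root mean squared error (RMSE):

$$\mathrm{AIC} = -2 \ln L + 2k \tag{12}$$

$$\mathrm{MAE} = \frac{1}{T - T_1} \sum_{t=T_1}^{T} \left| \sigma_t^2 - \hat{\sigma}_t^2 \right| \tag{13}$$

$$\mathrm{MSE} = \frac{1}{T - T_1} \sum_{t=T_1}^{T} \left( \sigma_t^2 - \hat{\sigma}_t^2 \right)^2 \tag{14}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{T - T_1} \sum_{t=T_1}^{T} \left( \sigma_t^2 - \hat{\sigma}_t^2 \right)^2} \tag{15}$$

where $L$ is the value of the likelihood function evaluated at the parameter estimates, $k$ is the number of parameters to be estimated, $T$ is the total number of observations, $T_1$ is the initial observation, $\sigma_t^2$ is the actual conditional variance at time $t$ and $\hat{\sigma}_t^2$ is the predicted conditional variance at time $t$. The lower the AIC, MAE, MSE and RMSE values, the better the dispersion measurement performs.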
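The four criteria are simple to compute once the log-likelihood and the actual and predicted conditional variances are available. The sketch below is an illustration with hypothetical function names and toy numbers. It averages the errors over the supplied pairs, which is an assumption about the normalising factor, and it uses the unnormalised form of the AIC.

```python
import math

def aic(log_likelihood, n_params):
    """AIC = -2 ln L + 2k (unnormalised form, assumed here)."""
    return -2.0 * log_likelihood + 2.0 * n_params

def mae(actual, predicted):
    """Mean absolute error between actual and predicted conditional variances."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error between actual and predicted conditional variances."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))

# Toy numbers, not values from the paper.
actual_var = [1.0, 2.0, 3.0]
predicted_var = [1.0, 2.0, 5.0]
print(mae(actual_var, predicted_var))
print(rmse(actual_var, predicted_var))
print(aic(log_likelihood=-100.0, n_params=6))
```

Note that some software reports the AIC divided by the number of observations, so values from different tools should be compared only when the same convention is used.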

Simulation Study:
In this section, evidence on the dispersion measurement of the Huber weighting function under different percentages of IO contamination is provided. Three levels of IO contamination are examined: 0%, 10% and 20%. Seven sets of simulations of the AR(1)-GARCH(2,1) model are generated for time series lengths T = 200, 500 and 1000. The seven sets of simulations are summarized in Table 1. In this part, the AR(1)-GARCH(2,1) model was handled using the tseries package [37] and the fGarch package [38] in R version 3.6.3 [39]. The general procedure for modifying the Huber weight under IO contamination was conducted as follows:
1) The AR(1)-GARCH(2,1) model was specified using the garchSpec function with the following true parameter values: $\phi_0 = 0.0001622$, $\phi_1 = 0.2225$, $\alpha_0 = 0.0000033$, $\beta_1 = 0.3725$, $\beta_2 = 0.4810$ and $\alpha_1 = 0.1388$.
2) Initially, the GARCH process was simulated for 200 observations with a mean of 0 and a standard deviation of 1 using garchSim.
3) About 10% of the sample was contaminated with IO. The locations and magnitudes of the IO were identified. The magnitude of each contaminated point was drawn from a normal distribution with a mean of 0 and a standard deviation of 16.
4) The modified Huber weight function with IQR/3 as the dispersion measurement was computed for the series with 10% IO contamination. New data were then obtained from these modified Huber weights.
5) The modified Huber weight function with Sn as the dispersion measurement was computed for the series with 10% IO contamination. New data were then obtained from these modified Huber weights.
6) Steps 3 to 5 were repeated with the IO contamination increased to 20%.
7) The parameters of the AR(1)-GARCH(2,1) model for the three situations were fitted using the garchFit function with a normal error distribution.
8) The performance of the AR(1)-GARCH(2,1) model for the three situations was evaluated.
9) Steps 1 to 8 were repeated for the other time series lengths, T = 500 and T = 1000. Each time series length was run for 1000 trials.
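The contamination steps of the procedure (steps 2, 3 and 6) can be sketched as below. This is a simplified standard-library illustration and not the paper's R code: the helper names are hypothetical, the clean innovations are plain standard normal draws rather than GARCH innovations, and injecting the shock into the innovation of an AR(1) recursion is an assumed way of producing an IO, chosen because the shock then propagates to later observations.

```python
import random

def contaminate_innovations(eps, fraction, shock_sd=16.0, seed=0):
    """Add N(0, shock_sd^2) shocks to a random `fraction` of the innovations.
    Returns the contaminated innovations and the sorted shock locations."""
    rng = random.Random(seed)
    n_out = round(fraction * len(eps))
    locations = sorted(rng.sample(range(len(eps)), n_out))
    new_eps = list(eps)
    for t in locations:
        new_eps[t] += rng.gauss(0.0, shock_sd)
    return new_eps, locations

def ar1_path(eps, phi0, phi1):
    """Propagate innovations through y_t = phi0 + phi1 * y_{t-1} + eps_t."""
    y, prev = [], phi0 / (1.0 - phi1)
    for e in eps:
        prev = phi0 + phi1 * prev + e
        y.append(prev)
    return y

rng = random.Random(1)
clean_eps = [rng.gauss(0.0, 1.0) for _ in range(200)]   # T = 200, mean 0, sd 1

results = {}
for fraction in (0.0, 0.10, 0.20):                       # 0%, 10% and 20% IO
    eps, locs = contaminate_innovations(clean_eps, fraction)
    results[fraction] = (ar1_path(eps, phi0=0.0001622, phi1=0.2225), locs)
```

Each contaminated series would then be passed through the weighting and fitting steps (steps 4, 5 and 7), which are not reproduced here.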

Results and Discussion
Empirical (Real Data) Results: In this section, the daily price data for Standard Malaysian Rubber Grade 20 (SMR 20) are examined. These secondary data were obtained from the official website of the Malaysian Rubber Board and span the period from 4th January 2010 to 30th December 2020. The empirical analysis was carried out using the tseries package [36] and the fGarch package [37] in R version 3.6.3 [38]. Table 3 presents the descriptive statistics of the daily returns for SMR 20. The daily returns range between -0.07929 and 0.07568, with a mean return of -0.000171 per day. The daily returns show excess kurtosis: the reported value of 3.2659 is greater than the normal value of 3. This suggests that the data have heavier tails and are leptokurtic.
The first step with time series data is to test for a unit root. The R output for the Phillips-Perron (PP) test [40], the Augmented Dickey-Fuller (ADF) test [41] and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test [42] is shown in Table 4. As Table 4 shows, the null hypothesis of a unit root is rejected by the PP test and the ADF test at the 1% level of significance. This indicates that the series is stationary with no unit root. Meanwhile, the KPSS test fails to reject its null hypothesis of stationarity at the 10% level of significance, which also indicates that the series is stationary. As all three tests show that the series is stationary and has no unit root, the following step is carried out.
In this paper, the AR(1) model is used as the conditional mean. Table 5 provides the result of the test for the ARCH effect using Engle's Lagrange Multiplier (LM) test [43]. The p-value for the LM test indicates that the null hypothesis of no ARCH effect was rejected at the 5% level of significance. Table 5 therefore shows the presence of an ARCH effect in the daily returns of SMR 20, meaning that the residuals are heteroscedastic. Therefore, the AR(1) model needs to be combined with a conditional variance model, i.e. the GARCH model.
Four models were produced from different conditional variance specifications of the GARCH($q$, $h$) model, where the orders $q$ and $h$ were either 1 or 2. The four specifications were compared using the AIC to determine the best one. Table 6 shows a comparison of the four models. The GARCH(2,1) model exhibits the lowest AIC value and, according to Table 6, is the best for the in-sample part. The AIC values for the combination of AR(1) with the four GARCH($q$, $h$) specifications are shown in Table 7. The results in Table 7 show that the AR(1)-GARCH(2,1) model reported the smallest AIC value compared with the other three AR(1)-GARCH($q$, $h$) specifications.
For the out-of-sample part, the pattern of the 805 daily SMR 20 prices can be seen in Fig. 2 (a). Because the LM test implied an ARCH effect in the daily returns of SMR 20, volatility clustering was evident in the returns presented in Fig. 2 (b). The overall pattern of the 2683 daily SMR 20 prices and returns from 4th January 2010 to 30th December 2020 is illustrated in Fig. 3 (a) [...]. The bracketed values under the coefficients in all equations are the t-statistics, with statistical significance marked at the 0.1% (****), 1% (***), 5% (**) and 10% (*) levels.
To verify whether IQR/3 is more efficient than Sn, the seven models were compared based on the performance evaluation. Table 9 shows the performance of the AR(1)-GARCH(2,1) model by percentage of IO contamination and by dispersion statistic in the Huber weight function. The MAE, MSE and RMSE values increased to 0.304604, 6.342343 and 2.518401, respectively, when the daily returns of SMR 20 were contaminated with 10% IO. As IO contamination reached 20%, the three measures increased as well (MAE of model 5 = 0.634847, MSE of model 5 = 10.88619, RMSE of model 5 = 3.299422). This result showed that all three error measures in model 2 and model 5 were higher than in model 1. Under 20% IO contamination, model 6 gives the minimum value for all three measures. For MAE and RMSE, model 6 reported minimum values of 0.006703 and 0.007401, respectively, which are declines of 98.94 percent and 99.78 percent compared with model 5 in Table 9. Model 7, in turn, declined by 98.16 percent and 99.57 percent for MAE and RMSE, respectively. The MSE of model 6 and model 7 declined by approximately 100 percent, to 0.000055 and 0.000199, respectively. Even though Sn is well known for its reliable dispersion measurement and ease of calculation [21], our findings demonstrated that IQR/3 is more efficient than Sn.
It appears from Table 11 that the difference between the actual and forecast prices of SMR 20 (in model 1) was 0.002, with a percentage change ranging from 0.03115 percent to 0.03265 percent from 5th to 19th January 2021. When contaminated with 10% IO, model 3 and model 4 resulted in forecast price increases ranging from 0.00467 percent to 0.0049 percent and from 0.01716 percent to 0.01959 percent, respectively. As the level of IO contamination rises to 20%, the difference between the model 6 forecast and the actual price was 0.0004. Meanwhile, the forecast price of SMR 20 increased by between 0.02122 percent and 0.02406 percent for model 7. According to the results in Table 11, the IQR/3 dispersion statistic showed more efficiency than Sn during contamination with 10% and 20% IO.
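The percentage declines quoted for model 6 relative to model 5 can be checked with a one-line formula, (before - after) / before × 100. The short sketch below redoes that arithmetic using the MAE, RMSE and MSE values reported in the text. The function name is hypothetical.

```python
def pct_reduction(before, after):
    """Percentage decline from `before` to `after`."""
    return (before - after) / before * 100.0

# Model 5 (20% IO) versus model 6 (20% IO, IQR/3), values from the text.
mae_drop = pct_reduction(0.634847, 0.006703)
rmse_drop = pct_reduction(3.299422, 0.007401)
mse_drop = pct_reduction(10.88619, 0.000055)

print(round(mae_drop, 2))   # 98.94
print(round(rmse_drop, 2))  # 99.78
print(round(mse_drop, 2))   # 100.0
```

The MSE decline is not exactly 100 percent; it rounds to 100 at two decimal places because the remaining error is about five millionths of the model 5 value.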

Conclusion
This paper reported on two dispersion measurements (IQR/3 and Sn) applied in the Huber weighting function for the AR(1)-GARCH(2,1) model. The following conclusions can be made:
a) The AR(1)-GARCH(2,1) model using IQR/3 as the dispersion measurement in the Huber weighting function was found to be the sustainable GARCH model for forecasting the SMR 20 price.
b) The forecast price of SMR 20 is more sustained when the IQR/3 measure is applied in the Huber weight function under 10% and 20% IO contamination.
Therefore, the two measurements examined in the Huber weight function add to our understanding of the role of a suitable dispersion measurement in obtaining a sustainable model. Although the M-estimator admits several weighting functions, this work focuses on Huber weights. Considerable further work is needed to test other types of weighting function that can keep time series modelling and forecasting sustained under contamination by other types of outliers.

Author's Declaration
Baghdad Science Journal 2024, 21(2): 0511-0523, https://doi.org/10.21123/bsj.2023.7489, P-ISSN: 2078-8665, E-ISSN: 2411-7986

As Table 2 shows, the dispersion statistics IQR/3 and Sn affect the values of MAE, MSE and RMSE. The minimum MAE value under 10% IO contamination was recorded by model 3 at 0.4888, a drop of 55.87 percent compared with model 2. For model 4, the MAE likewise decreased, to 0.7614 (-31.26%). For the MSE, model 3 recorded the minimum value of 0.2835, lower than model 4, under 10% IO contamination. Model 3 also had the highest percentage reduction in RMSE at 74.61 percent, compared with 57.19 percent for model 4. Under 20% IO contamination, model 6 gives the minimum value for all three measures. For MAE and RMSE, model 6 reported minimum values of 0.4956 and 0.5385, respectively, which are declines of 56.97 percent and 74.98 percent compared with model 5. Model 7, in turn, declined by 31.84 percent and 57.02 percent for MAE and RMSE, respectively. Under 20% IO contamination, model 6 and model 7 decreased the MSE to 0.29 and 0.8556, respectively, drops of 93.74 percent and 81.53 percent. Although Sn is recognized as a robust dispersion measurement that is simple to compute [21], our findings revealed that IQR/3 showed more efficiency than Sn during contamination with 10% and 20% IO.

Fig. 1 (a) exhibits a clear trend in the 1878 daily price observations of SMR 20 rubber (in RM per kilogram) in Malaysia from 4th January 2010 to 11th September 2017. When daily prices are transformed to log returns, the plot of daily SMR 20 returns clearly shows volatility clustering, as seen in Fig. 1 (b).
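The transformation from prices to log returns mentioned above is the usual r_t = ln(P_t / P_{t-1}). A minimal sketch follows. The function name and the price values are hypothetical and are not taken from the SMR 20 data.

```python
import math

def log_returns(prices):
    """Daily log returns r_t = ln(P_t / P_{t-1}); the result has one fewer element."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

# Hypothetical prices in RM per kilogram.
prices = [5.00, 5.10, 5.05, 5.20]
returns = log_returns(prices)
```

Log returns are additive over time, so the sum of the daily log returns equals the log of the ratio of the last price to the first.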

Figure 2. Plot of daily (a) price and (b) returns on the SMR 20 for T=805.

The forecasting performance of the various AR(1)-GARCH($q$, $h$) specifications is provided in Table 8. Based on the AIC, the AR(1)-GARCH(2,1) model showed the lowest value and was therefore determined to be the best of the models compared.

Table 8. Selection criteria of the AR(1)-GARCH($q$, $h$) models

Table 1. Grouping of different IO contamination levels and dispersion measurements
Model    Contamination IO    Dispersion measurements

Table 9. Evaluation performance for model 1 to model 7 of the SMR 20 price
From the data in Table 10, it is apparent that the four models affect the values of MAE, MSE and RMSE. The minimum MAE value, in model 3, is 0.005702, a drop of 98.13 percent compared with model 2 in Table 9. For model 4, the MAE also dropped, to 0.009675 (-96.82%). For the MSE, model 3 recorded the minimum value of 0.000039, compared with 0.000138 for model 4, under 10% IO contamination. Model 3 also had the highest percentage reduction in RMSE, at 99.75 percent, compared with model 4.

Table 11. Results of daily actual and forecast price with a percentage change for selected models
The actual and forecast prices are expressed in RM per kilogram. The values in brackets are percentage changes.
We hereby confirm that all the Figures and Tables in the manuscript are ours. Furthermore, any Figures and images that are not ours have been included with the necessary permission for republication, which is attached to the manuscript.
Ethical Clearance: The project was approved by the local ethical committee in Universiti Malaysia Terengganu, 21030 Kuala Nerus, Terengganu Darul Iman, Malaysia.