Proposing a Robust LAD-Atan Penalty for Regression Model Estimation in High-Dimensional Data

Abstract: Penalized regression has received considerable attention as a tool for variable selection and plays an essential role in dealing with high-dimensional data. The arctangent (Atan) penalty has recently been used as an efficient method for simultaneous estimation and variable selection. However, the Atan estimator is very sensitive to outliers in the response variable and to heavy-tailed error distributions, whereas least absolute deviation (LAD) regression is a good way to obtain robustness in regression estimation. The objective of this research is to combine these two ideas into a single robust Atan estimator. Simulation experiments and a real data application show that the proposed LAD-Atan estimator has superior performance compared with other estimators.


Introduction:
Variable selection is an issue that has been gaining attention in recent years, as many studies require dealing with high-dimensional data in fields such as sonar, genetics, and others. The presence of many covariates makes the process of model building very difficult, especially with respect to interpretation, and inflates the variance. As Brad Efron stated, one of the most important problems in statistics is simply variable selection associated with regression (1). The problem consists of choosing variables from a large set of candidates, estimating the parameters of those variables, and then making further inferences. The ordinary least squares method and other traditional methods cannot handle these problems. Penalized regression has been widely used for models with many explanatory variables; it improves predictive accuracy as well as the selection of the important variables in the model. The method is based on minimizing an objective function formed of two parts: the first is a loss function, and the second is a penalty function governed by the penalty parameter λ.
Although the penalty function makes a tradeoff between bias and variance, as in ridge regression (2), the latter does not exclude undesirable variables because their coefficients are never set exactly to zero.
Tibshirani (3) proposed the least absolute shrinkage and selection operator (LASSO), which performs estimation and variable selection simultaneously. LASSO has the appealing property of setting the coefficients β of nonsignificant variables exactly to zero. A detailed discussion of LASSO and its consequences can be found in (4), and some generalizations of LASSO in (5). Recent contributions have been made by Kim (6) and Uraibi (7). Fan and Li (8) suggested the smoothly clipped absolute deviation (SCAD) penalty, which keeps the penalty function continuous at the threshold points and smooths it, acting like a quadratic spline with knots at λ and aλ. A useful medical application was introduced by Fang et al. (9), who also listed further properties of SCAD-penalized regression: the target function is a high-dimensional non-concave function, singular at the origin, and without continuous second-order derivatives.
Zhang (10) suggested nearly unbiased variable selection under the minimax concave penalty (MCP).
Following Wang et al. (11), let $x_i = (x_{i1}, \ldots, x_{ip})'$ be a p-dimensional covariate vector, $\beta = (\beta_1, \ldots, \beta_p)'$ the unknown parameter vector, p the number of covariates, n the sample size, $y = (y_1, y_2, \ldots, y_n)'$ the (n × 1) response vector, and $\epsilon_i$ random errors with $E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma^2$. Then the penalized least squares estimator can be obtained as the solution of

$$\min_{\beta}\left\{ \frac{1}{2}\lVert y - X\beta\rVert^{2} + n\sum_{j=1}^{p} p_{\lambda}(|\beta_j|) \right\},$$

where $X = (x_1, \ldots, x_n)'$ is the (n × p) design matrix, ‖·‖ represents the L2-norm, and $p_{\lambda}(\cdot)$ is the penalty function, which depends on the penalty parameter λ > 0.
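As an illustrative sketch (my own Python code, not from the paper), the penalized least-squares objective above can be written generically, with the penalty passed in as a function:

```python
import numpy as np

def pls_objective(beta, X, y, penalty, lam):
    """Penalized least-squares objective:
    0.5 * ||y - X beta||^2 + n * sum_j p_lam(|beta_j|)."""
    n = X.shape[0]
    fit = 0.5 * np.sum((y - X @ beta) ** 2)      # squared-error loss
    pen = n * np.sum(penalty(np.abs(beta), lam))  # penalty term
    return fit + pen
```

For example, `penalty = lambda b, lam: lam * b` recovers the LASSO objective.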
Many penalties are used in penalized least squares. The SCAD penalty has three merits: continuity in the data, unbiasedness of large parameter estimates, and sparsity (small parameter estimates are set exactly to zero). It is defined through its continuously differentiable derivative

$$p'_{\lambda}(\beta) = \lambda\left\{ I(\beta \le \lambda) + \frac{(a\lambda - \beta)_{+}}{(a-1)\lambda} I(\beta > \lambda) \right\}, \qquad \beta > 0,$$

where the value a = 3.7 is assumed. Zhang (10) proposed a concave penalty function, the MCP, defined by

$$p_{\lambda,\gamma}(\beta) = \lambda \int_{0}^{|\beta|} \left(1 - \frac{t}{\gamma\lambda}\right)_{+} dt,$$

with λ ≥ 0, γ > 1. Wang and Zhu (12) proposed an arctangent-type penalty, called the Atan penalty, which is defined as follows:
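As a sketch (function names and defaults are mine), the SCAD and MCP penalties above can be implemented directly in their closed piecewise forms; the defaults a = 3.7 and γ = 3 follow common practice:

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty of Fan and Li, evaluated elementwise on |beta|."""
    b = np.abs(beta)
    p1 = lam * b                                      # |beta| <= lam
    p2 = (2*a*lam*b - b**2 - lam**2) / (2*(a - 1))    # lam < |beta| <= a*lam
    p3 = lam**2 * (a + 1) / 2                         # |beta| > a*lam (flat tail)
    return np.where(b <= lam, p1, np.where(b <= a*lam, p2, p3))

def mcp_penalty(beta, lam, gamma=3.0):
    """Minimax concave penalty (MCP) of Zhang, evaluated elementwise."""
    b = np.abs(beta)
    p1 = lam*b - b**2 / (2*gamma)   # |beta| <= gamma*lam
    p2 = gamma * lam**2 / 2         # |beta| > gamma*lam (flat tail)
    return np.where(b <= gamma*lam, p1, p2)
```

The flat tails are what make both penalties nearly unbiased for large coefficients, in contrast to the linear growth of the LASSO penalty.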

$$p_{\lambda,\gamma}(\beta) = \lambda\left(\gamma + \frac{2}{\pi}\right)\arctan\!\left(\frac{|\beta|}{\gamma}\right) \qquad \text{… (5)}$$

with λ ≥ 0, γ > 0. The first derivative of the Atan penalty is

$$p'_{\lambda,\gamma}(t) = \lambda\left(\gamma + \frac{2}{\pi}\right)\frac{\gamma}{\gamma^{2} + t^{2}},$$

where the optimal value γ = 0.005 is used.
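A small numeric sketch of the Atan penalty and its derivative (function names are mine) makes its limiting behaviour easy to check: as γ → 0⁺ the penalty of any nonzero coefficient approaches the constant λ (L0-like behaviour), while as γ → ∞ it behaves like the L1 penalty λ|β|:

```python
import numpy as np

def atan_penalty(beta, lam, gamma=0.005):
    """Atan penalty: lam * (gamma + 2/pi) * arctan(|beta| / gamma)."""
    return lam * (gamma + 2/np.pi) * np.arctan(np.abs(beta) / gamma)

def atan_penalty_deriv(t, lam, gamma=0.005):
    """First derivative: lam * (gamma + 2/pi) * gamma / (gamma^2 + t^2)."""
    return lam * (gamma + 2/np.pi) * gamma / (gamma**2 + t**2)
```

With the recommended γ = 0.005, the penalty is already close to λ for moderately large coefficients, so large coefficients are shrunk very little.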

Robust LAD-Atan Regression:
The Atan estimator is not robust, which means it is very sensitive to the presence of outlying observations. To deal with this problem, a robust loss function can be used to obtain a robust Atan method. In this research, we propose combining the arctangent (Atan) penalty function with the LAD absolute loss function.
The Atan estimator can be improved by combining it with a LAD part to obtain the LAD-Atan estimator, in a manner similar to that used for the LAD-lasso by Wang et al. (13). The penalty is absorbed into the loss by augmenting the data with p pseudo-observations $(y^{*}_{n+j}, x^{*}_{n+j}) = (0,\, n\lambda_j e_j)$, $j = 1, \ldots, p$, where $e_j$ is the unit vector whose j-th element equals one and all other elements equal zero, and $\lambda_j$ is the Atan-based weight of the j-th coefficient. The LAD-Atan estimator can then be obtained by minimizing

$$\sum_{i=1}^{n+p} \left| y_i^{*} - x_i^{*\prime}\beta \right|.$$

The rq() function in the quantreg package of R (14) can be used to obtain the LAD-Atan estimator easily.
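The paper computes the estimator with rq() from R's quantreg; as a rough self-contained sketch (all function names mine, and only an approximation), the same idea can be imitated in Python by solving each LAD fit with smoothed iteratively reweighted least squares and absorbing the Atan penalty through the augmented pseudo-observations described above, with weights refreshed from the current coefficients:

```python
import numpy as np

def lad_irls(X, y, n_iter=200, eps=1e-6):
    """Approximate LAD fit via iteratively reweighted least squares
    (a stand-in for quantreg's rq(); eps smooths |r| near zero)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)          # weights ~ 1/|r|
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX + 1e-10 * np.eye(X.shape[1]),
                               WX.T @ y)
    return beta

def lad_atan(X, y, lam, gamma=0.005, n_outer=5):
    """LAD-Atan sketch: each weight lam_j = p'_{lam,gamma}(|beta_j|)
    defines pseudo-rows (y* = 0, x* = n*lam_j*e_j) appended to the
    LAD problem; tiny coefficients are thresholded to exact zero."""
    n, p = X.shape
    beta = lad_irls(X, y)                              # unpenalized start
    for _ in range(n_outer):
        lam_j = lam * (gamma + 2/np.pi) * gamma / (gamma**2 + beta**2)
        X_aug = np.vstack([X, n * np.diag(lam_j)])     # pseudo-observations
        y_aug = np.concatenate([y, np.zeros(p)])
        beta = lad_irls(X_aug, y_aug)
    return np.where(np.abs(beta) < 1e-3, 0.0, beta)
```

Because the Atan derivative is large near zero and nearly vanishes for large coefficients, the pseudo-rows are heavy exactly where shrinkage to zero is wanted.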

Theoretical Properties:
To identify the regression model correctly, the LAD-Atan estimator has to satisfy properties combined from its two parts: the LAD part provides the consistency and sparsity properties of Wang et al. (13), while the Atan part satisfies the regularity and oracle properties of Wang and Zhu (12). Here λ_j is some function of n, and sparsity means that the estimates β̂_j of the truly zero coefficients equal zero with probability tending to one.

 Regularity Conditions:
It is important to impose the following conditions on the proposed penalty:
i. a growth condition on p relative to n, where n is the sample size and p is the number of regression parameters;
ii. a condition involving Σ_0 = cov(x_i) and f(t), the probability density of the error ε_i.

Selection of Penalty Parameter:
The selection of the penalty parameter is one of the important steps in penalized regression: it controls the amount of coefficient shrinkage and the choice of the subset of variables included in the final model. Tibshirani (3) and Fan and Li (8) used generalized cross-validation (GCV) to select penalty parameters. In this research, GCV is used to select the penalty parameter by minimizing

$$\mathrm{GCV}(\lambda) = \frac{\lVert y - X\hat{\beta}\rVert^{2}}{n\left(1 - \mathrm{df}(\lambda)/n\right)^{2}},$$

where the degrees of freedom are

$$\mathrm{df}(\lambda) = \mathrm{tr}\!\left[X\left(X'X + n\Sigma_{\lambda}\right)^{-1}X'\right],$$

which represents the effective number of nonzero estimated parameters. The penalty parameter λ is selected as the value that minimizes this GCV.
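A sketch of the GCV computation (my own reconstruction: Σ_λ is taken in the local-quadratic form diag(p'_λ(|β̂_j|)/|β̂_j|) over the active coefficients, in the spirit of Fan and Li):

```python
import numpy as np

def gcv(X, y, beta_hat, lam, gamma=0.005, tol=1e-8):
    """GCV(lam) = RSS / (n * (1 - df/n)^2) with
    df = tr[X (X'X + n*Sigma_lam)^{-1} X'] restricted to the
    active (nonzero) coefficients."""
    n = X.shape[0]
    nz = np.abs(beta_hat) > tol                  # active set
    Xa, ba = X[:, nz], beta_hat[nz]
    # Atan derivative divided by |beta_j| -> diagonal of Sigma_lam
    d = lam * (gamma + 2/np.pi) * gamma / (gamma**2 + ba**2) / np.abs(ba)
    H = Xa @ np.linalg.solve(Xa.T @ Xa + n * np.diag(d), Xa.T)
    df = np.trace(H)
    rss = np.sum((y - X @ beta_hat) ** 2)
    return rss / (n * (1 - df / n) ** 2)
```

In practice one evaluates this score on a grid of λ values and keeps the minimizer.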

Simulation Experiments:
In this section, simulation experiments are used to show the performance of the proposed estimator. The proposed LAD-Atan estimator was compared with the SCAD and MCP estimators based on (15), while the Atan penalty was based on (12). The simulation experiments were conducted in R with the simFrame package (16), using 200 replicates. Data were generated from the linear regression model

$$y_i = x_i'\beta + \epsilon_i,$$

where β = (3, 1.5, 0, 0, 2, 0, …, 0)′. The error distributions considered are the standard normal distribution and the t-distribution with three degrees of freedom; the covariates are drawn as $x_i \sim N(0, \Sigma)$ with covariance matrix $\Sigma_{ij} = \rho^{|i-j|}$, ρ = 0.5.
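The data-generating process above can be sketched as follows (a Python stand-in for the paper's R setup; the function name and seed handling are mine):

```python
import numpy as np

def simulate(n, p, rho=0.5, error="normal", seed=0):
    """Generate y = X beta + eps with beta = (3, 1.5, 0, 0, 2, 0, ..., 0)'
    and covariate covariance Sigma_ij = rho**|i-j|."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(p)
    beta[0], beta[1], beta[4] = 3.0, 1.5, 2.0
    idx = np.arange(p)
    Sigma = rho ** np.abs(np.subtract.outer(idx, idx))   # AR(1)-type matrix
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    eps = rng.standard_normal(n) if error == "normal" else rng.standard_t(3, n)
    return X, X @ beta + eps, beta
```

Setting `error="t3"` (any value other than `"normal"` in this sketch) reproduces the heavy-tailed t₃ scenario.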
Performance was compared via the mean squared error (MSE) criterion for the estimated coefficients, together with the false positive rate (FPR) and false negative rate (FNR) of variable selection.
Table 1 gives the simulation results for standard normal errors. With no contamination, the Atan and LAD-Atan estimators are the best, with the smallest MSE, FPR, and FNR for both sample sizes n = 60 and n = 100. With vertical outliers, the LAD-Atan estimator is superior, again with the smallest MSE, FPR, and FNR for both sample sizes.
Table 2 gives the results for t-distributed errors. With no contamination, the Atan and LAD-Atan estimators give the best results, with the smallest MSE, FPR, and FNR. With vertical outliers, the LAD-Atan estimator is superior on all three criteria.
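The selection criteria can be made precise in a short sketch (my own definitions of FPR and FNR; the paper's exact MSE formula is not reproduced in the text, so the usual coefficient mean squared error is used here):

```python
import numpy as np

def coef_mse(beta_hat, beta_true):
    """Mean squared error of the coefficient estimates (one common choice)."""
    return np.mean((beta_hat - beta_true) ** 2)

def selection_rates(beta_hat, beta_true, tol=1e-8):
    """FPR: share of truly-zero coefficients wrongly selected;
    FNR: share of truly-nonzero coefficients wrongly dropped."""
    sel = np.abs(beta_hat) > tol
    true_nz = np.abs(beta_true) > tol
    fpr = np.mean(sel[~true_nz]) if (~true_nz).any() else 0.0
    fnr = np.mean(~sel[true_nz]) if true_nz.any() else 0.0
    return fpr, fnr
```

Averaging these quantities over the 200 replicates yields the entries reported in Tables 1 and 2.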

Real Data Application:
Real data (n = 30) were collected to detect the chemical properties of the soil and to study their effect on date palm crops in Iraq. The application investigates the effect of 16 soil variables. Table 3 shows that the SCAD method selected four variables (EC, Cl, SO4, P), the MCP method selected four variables (EC, K, Cl, HCO3), the Atan method selected five variables (PH, EC, Cl, HCO3, P), and the proposed robust method (LAD-Atan) selected three variables (PH, HCO3, P).
To compare the estimators, the mean squared error was used. From Table 4 it is easy to notice that LAD-Atan is the best estimator: it combines estimation robustness (indicated by the smallest MSE) with the most parsimonious variable selection (only three variables).

Conclusions:
Although the LAD part yields a robust estimator, it is not suitable for variable selection on its own; this problem is solved by adding the Atan penalty to the objective function. The simulation experiments show that the LAD-Atan estimator gives the best performance for both estimation and variable selection among the methods compared. LAD-Atan maintains its robustness as the contamination proportion increases for the t-distribution, while it is sometimes in close competition with Atan and MCP for the normal distribution. Furthermore, the application results coincide with the simulation results in showing the superiority of the proposed robust LAD-Atan estimator when there are outliers in the real data under consideration. The proposed method achieves a greater reduction in the number of selected variables together with higher accuracy than the other three methods.