Abstract

Feature reduction techniques are fundamental to enhancing machine learning (ML) algorithms by reducing the number of features in a dataset. This study explores the impact of Principal Component Analysis (PCA) on ML algorithms within an imbalanced classification framework, alongside feature selection techniques such as the Clustering Variation Attribute Evaluator (CVAE) and the Correlation Attribute Evaluator (CAE). In addition, the research presents a comparative analysis evaluating the effectiveness of several ML methods, including Multilayer Perceptron (MLP), Decision Tree J48, k-Nearest Neighbor (k-NN), and Sequential Minimal Optimization (SMO). The results indicate that, under 5-fold cross-validation, combining MLP with PCA reduced model build time by about 50% while accuracy remained at the same level. In contrast, the J48 technique showed a weak response to feature reduction techniques, while CVAE had a negative impact on the performance of all models. Furthermore, applying PCA with SMO improved diagnostic accuracy from 95.56% to 95.82%. The k-NN approach achieved an accuracy increase from 91.12% to 92.42% with PCA, and CAE notably improved the model's accuracy. Importantly, this research employed the Weighted Average of Precision, Recall, and F-Measure to deliver a comprehensive assessment of model performance on an imbalanced dataset. The nominal-type thyroid dataset was used as the case study.
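The workflow the abstract describes, projecting the data onto its leading principal components and then classifying with k-NN, can be illustrated with a minimal sketch. This is an assumption-laden toy example, not the study's actual pipeline (the paper works on the nominal thyroid dataset, presumably with a tool such as WEKA): it implements PCA for two numeric features via the closed-form eigendecomposition of the 2x2 covariance matrix, then a simple k-NN vote on the projected component.

```python
import math

def pca_project_2d(rows):
    """Center 2-feature rows and project them onto the first
    principal component (leading eigenvector of the covariance matrix)."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    centered = [(x - mx, y - my) for x, y in rows]
    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form
    trace, det = sxx + syy, sxx * syy - sxy * sxy
    lam = trace / 2 + math.sqrt(max(trace * trace / 4 - det, 0.0))
    # Corresponding eigenvector (lam - syy, sxy); fall back to an axis
    # when the features are already uncorrelated
    if abs(sxy) > 1e-12:
        vx, vy = lam - syy, sxy
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    return [x * vx + y * vy for x, y in centered]

def knn_predict(train_scores, train_labels, score, k=1):
    """Majority vote among the k nearest neighbors along the
    1-D projected component."""
    order = sorted(range(len(train_scores)),
                   key=lambda i: abs(train_scores[i] - score))
    votes = [train_labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)
```

The toy version reduces two features to one; the study's point is the same idea at scale, where dropping low-variance components can cut model build time with little or no loss of accuracy.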

Keywords

Correlation attribute evaluator, Clustering variation attribute evaluator, J48, k-NN, MLP, PCA, SMO, Nominal thyroid dataset, Weighted average

Subject Area

Computer Science

Article Type

Article

First Page

1333

Last Page

1351

Creative Commons License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.
