Advances and Applications in Statistics
Volume 59, Issue 1, Pages 17 - 42
(November 2019) http://dx.doi.org/10.17654/AS059010017
THE EFFECTS OF RESAMPLING METHODS ON LINEAR DISCRIMINANT ANALYSIS FOR DATA SET WITH TWO IMBALANCED GROUPS: AN EMPIRICAL EVIDENCE
Hakiim J. and Nor Idayu Mahat
Abstract: Linear discriminant analysis (LDA), like other conventional classification algorithms, is biased towards the majority group when dealing with group imbalance. Resampling methods such as random oversampling (ROS) and random undersampling (RUS) are used to alleviate the problem. However, previous studies disagree on whether ROS and RUS significantly improve the performance of LDA. This paper addresses the disagreement by analysing the effect of ROS and RUS on LDA through k-fold cross-validation based on true positive rate (TPR) and true negative rate (TNR). These two performance measures assess the classification of each object more precisely than the error rate (ER) or the area under the receiver operating characteristic (ROC) curve (AUC). The study used 100 simulated data sets and four real data sets. It revealed that (i) resampling with ROS or RUS improves the performance of LDA in classifying objects, especially those from the minority group, and (ii) LDA is significantly biased towards objects from the majority group.
Keywords and phrases: linear discriminant analysis, resampling methods, group imbalance.
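
The procedure summarised in the abstract (resample, fit LDA, then score TPR and TNR by k-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation: the simulated two-feature Gaussian data, the group sizes, the choice of the minority group as the "positive" class, the decision to resample only the training folds, and all function names (`simulate`, `resample`, `fit_lda`, `predict_lda`, `cv_tpr_tnr`) are assumptions made for illustration.

```python
import numpy as np

def simulate(n_major=950, n_minor=50, seed=0):
    """Hypothetical imbalanced data: two Gaussian groups with a shared covariance."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.normal(0.0, 1.0, (n_major, 2)),
                   rng.normal(1.5, 1.0, (n_minor, 2))])
    y = np.r_[np.zeros(n_major, int), np.ones(n_minor, int)]  # 1 = minority group
    return X, y

def resample(X, y, method, rng):
    """Balance two groups by random oversampling ("ros") or undersampling ("rus")."""
    idx0, idx1 = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
    small, large = (idx0, idx1) if len(idx0) <= len(idx1) else (idx1, idx0)
    if method == "ros":    # redraw minority objects with replacement up to majority size
        extra = rng.choice(small, size=len(large) - len(small), replace=True)
        keep = np.concatenate([large, small, extra])
    elif method == "rus":  # keep a random majority subset of minority size
        keep = np.concatenate([small, rng.choice(large, size=len(small), replace=False)])
    else:                  # any other value: no resampling
        keep = np.arange(len(y))
    return X[keep], y[keep]

def fit_lda(X, y):
    """Two-group LDA: pooled covariance, priors estimated from training group sizes."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(y) - 2)
    w = np.linalg.solve(S, m1 - m0)
    b = -0.5 * w @ (m0 + m1) + np.log(len(X1) / len(X0))
    return w, b

def predict_lda(model, X):
    """Assign group 1 when the linear discriminant score is positive."""
    w, b = model
    return (X @ w + b > 0).astype(int)

def cv_tpr_tnr(X, y, method="none", k=5, seed=0):
    """k-fold cross-validated TPR and TNR; only the training folds are resampled."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    tp = fn = tn = fp = 0
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        Xtr, ytr = resample(X[train], y[train], method, rng)
        pred = predict_lda(fit_lda(Xtr, ytr), X[test])
        tp += np.sum((pred == 1) & (y[test] == 1))
        fn += np.sum((pred == 0) & (y[test] == 1))
        tn += np.sum((pred == 0) & (y[test] == 0))
        fp += np.sum((pred == 1) & (y[test] == 0))
    return float(tp / (tp + fn)), float(tn / (tn + fp))

X, y = simulate()
for method in ("none", "ros", "rus"):
    tpr, tnr = cv_tpr_tnr(X, y, method)
    print(f"{method:>4}: TPR={tpr:.3f}  TNR={tnr:.3f}")
```

Reporting TPR and TNR separately, rather than a single error rate, shows how each group is classified: on data this imbalanced, a classifier can reach a low overall error rate while misclassifying most minority objects. Resampling inside each training fold keeps the held-out fold at its original group proportions, which is one reasonable design choice among several.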