Advances and Applications in Statistics
Volume 62, Issue 2, Pages 185 - 201
(June 2020) http://dx.doi.org/10.17654/AS062020185
COMPARISON OF DIFFERENT SUPERVISED MACHINE LEARNING ALGORITHMS FOR THE PREDICTION OF TUBERCULOSIS MORTALITY
Apiradee Lim, Muhamad Rifki Taufik, Phattrawan Tongkumchum and Nurin Dureh
Abstract: Supervised machine learning (ML) algorithms are widely used in several areas for classification and prediction, especially with large datasets. However, it is important to investigate the application of ML algorithms to rare events. This paper compares the performance of four ML models, logistic regression (LR), recursive partitioning (RP), random forest (RF) and neural network (NN), in predicting tuberculosis (TB) mortality, using LR as a benchmark. The models were applied to TB mortality data based on verbal autopsies, covering 9,495 deaths in Thailand in 2005 among individuals aged five years and above, using both the original data and a double-sized dataset produced by the bootstrap sampling technique. The results revealed that LR performed best in predicting rare events (TB deaths) in the original data, whereas RF performed best with a larger sample size. This paper concludes that increasing the size of the dataset available for learning greatly improves the performance of all the models except LR. Furthermore, the RP, RF and NN algorithms require large training datasets for the learning process. Moreover, the use of predictive accuracy alone is of limited value in distinguishing the performance of different models, particularly when the occurrence of the event is rare.
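The abstract's last point, that predictive accuracy alone is of limited value when the event is rare, can be illustrated with a minimal sketch. The counts below (30 events among 1,000 records) and the two toy classifiers are hypothetical, chosen only for illustration; they are not taken from the paper's data or results.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Proportion of all records classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(y_true, y_pred):
    """Proportion of true events (label 1) that were detected."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


# Hypothetical rare-event data: 30 events followed by 970 non-events.
y_true = [1] * 30 + [0] * 970

# Toy classifier A: never predicts the event.
always_negative = [0] * 1000

# Toy classifier B: detects 20 of the 30 events, with 40 false alarms.
some_detection = [1] * 20 + [0] * 10 + [1] * 40 + [0] * 930

for name, y_pred in [("A", always_negative), ("B", some_detection)]:
    print(name, accuracy(y_true, y_pred), sensitivity(y_true, y_pred))
```

In this toy setup classifier A has the higher accuracy while detecting no events at all, which is why a measure such as sensitivity is worth reporting alongside accuracy for rare outcomes.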
Keywords and phrases: classification, machine learning, predictive performance, training size.