Advances and Applications in Statistics
Volume 45, Issue 2, Pages 105 - 123
(May 2015) http://dx.doi.org/10.17654/ADASMay2015_105_123
FEATURES SELECTION USING PARAMETRIC AND NON-PARAMETRIC METHODS: TAG SNPs SELECTION USING GA-SVM AND GA-KNN
Amr I. A. Elatraby and Rashad R. T. Wahba
Abstract: The study of genetic variations in the human genome, especially Single Nucleotide Polymorphisms (SNPs), can lead to the discovery of new methods to prevent, diagnose and treat diseases. Examining all the SNPs of the human genome is prohibitively expensive, so a small subset of informative SNPs, called tag SNPs, must be selected. In this study, two methods for selecting tag SNPs are presented. The first method, GA-SVM, integrates the Support Vector Machine (SVM), as a parametric technique, with the Genetic Algorithm (GA). The second method, GA-KNN, integrates the K-Nearest Neighbor (KNN), as a non-parametric technique, with GA. The two methods are tested on a group of genes that are known to be related to the natural clearance of Hepatitis C Virus (HCV). The SNP data for these genes were extracted from the HapMap site (http://hapmap.org). The prediction accuracy of each method was evaluated using 10-Fold Cross Validation (10-FCV). Our results show that, although GA-SVM achieves higher prediction accuracy than GA-KNN when a very small number of tag SNPs is selected, GA-KNN achieves higher prediction accuracy than GA-SVM in all other cases. In addition, our results indicate that GA-KNN requires more computing time than GA-SVM.
Keywords and phrases: Single Nucleotide Polymorphisms (SNPs), tag SNPs, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Genetic Algorithm (GA).
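
To make the GA-KNN idea in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The abstract does not state the fitness function, the GA operators, or the value of K, so everything here is an assumption: haplotypes are rows of 0/1 alleles, fitness is the k-fold cross-validated accuracy of predicting every non-tag SNP from the tag SNPs with KNN under Hamming distance, and the GA uses tournament selection, union-based crossover, single-gene mutation and elitism. The names `knn_predict`, `cv_accuracy` and `ga_select_tags` are hypothetical.

```python
import random

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows nearest to x (Hamming distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum(a != b for a, b in zip(train_X[i], x)))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

def cv_accuracy(data, tags, folds=10, k=3):
    """Assumed fitness: k-fold CV accuracy of predicting non-tag SNPs from tags."""
    n, m = len(data), len(data[0])
    targets = [j for j in range(m) if j not in tags]
    if not tags or not targets:
        return 0.0
    correct = total = 0
    for f in range(folds):
        test = [i for i in range(n) if i % folds == f]
        train = [i for i in range(n) if i % folds != f]
        train_X = [[data[i][t] for t in tags] for i in train]
        for j in targets:
            train_y = [data[i][j] for i in train]
            for i in test:
                x = [data[i][t] for t in tags]
                correct += knn_predict(train_X, train_y, x, k) == data[i][j]
                total += 1
    return correct / total

def ga_select_tags(data, n_tags, pop_size=12, generations=8, seed=0):
    """Search for n_tags SNP column indices with high cv_accuracy (n_tags < #SNPs)."""
    rng = random.Random(seed)
    m = len(data[0])
    # Each individual is a sorted tuple of distinct SNP column indices.
    pop = [tuple(sorted(rng.sample(range(m), n_tags))) for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(cv_accuracy(data, ind), ind) for ind in pop]

        def pick():
            # Tournament selection: the better of two random individuals.
            return max(rng.sample(scored, 2))[1]

        children = [max(scored)[1]]  # elitism: carry the best individual over
        while len(children) < pop_size:
            # Crossover: draw the child's tags from the union of both parents.
            pool = sorted(set(pick()) | set(pick()))
            child = rng.sample(pool, n_tags)
            if rng.random() < 0.2:
                # Mutation: swap one tag for a SNP not currently in the child.
                unused = [j for j in range(m) if j not in child]
                child[rng.randrange(n_tags)] = rng.choice(unused)
            children.append(tuple(sorted(child)))
        pop = children
    return max(pop, key=lambda ind: cv_accuracy(data, ind))
```

A call such as `ga_select_tags(data, n_tags=2)` returns a tuple of column indices. Under the same assumptions, a GA-SVM variant would keep the GA loop unchanged and replace `knn_predict` with an SVM classifier trained on the tag SNP columns.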