Pushpa Publishing House

JP Journal of Biostatistics

Volume 16, Issue 2, Pages 79 - 90 (December 2019)
http://dx.doi.org/10.17654/BS016020079

VALIDATION OF CLASSIFICATION MODELS AND DATA REDUCTION METHODS BASED ON GENE EXPRESSION DATA

Mohammad Rafiee, Fatemeh Rafiei, Seyyed Mohammad Tabatabaei, Hamid AlaviMajd, Ali Rafiei and Soheila Khodakarim

Abstract:

Background

Microarray technology enables simultaneous monitoring of the expression levels of thousands of genes. Analyzing the resulting datasets remains a challenge in the era of the bioinformatics revolution. Classification methods drawn from data mining, machine learning and regression have been widely applied to distinguish normal from abnormal samples in gene expression datasets.

Method

In this study, the classification accuracy of five models on normal and abnormal samples was calculated: support vector machine (SVM), least squares support vector machine (LSSVM), radial basis function neural network (RBFNN), Bayesian probit kernel regression (BPKR) and Bayesian logistic kernel regression (BLKR). The models were evaluated on two gene expression datasets, using the full data and three reduced-dimension sets obtained by multivariate median gene set analysis (MMGSA), PCA with Karhunen-Loeve transform (PCA-KL) and auto-encoder networks.
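The comparison described above, the same classifier scored on full and on reduced data, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: it uses synthetic data rather than the study's datasets, plain PCA by singular value decomposition as a stand-in for PCA-KL, a linear-kernel LSSVM in the standard linear-system formulation, and a single hold-out split. All function names, the regularization value `gamma`, the sample sizes and the number of components are assumptions chosen for illustration.

```python
import numpy as np


def pca_fit(X, k):
    """Return the training mean and the top-k principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T  # shape: (n_features, k)


def lssvm_fit(X, y, gamma=1.0):
    """Fit a linear-kernel LSSVM by solving its KKT linear system.

    Labels y must be +1 / -1. Returns the bias b and the multipliers alpha.
    """
    n = len(y)
    omega = np.outer(y, y) * (X @ X.T)  # y_i * y_j * K(x_i, x_j)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]


def lssvm_predict(X_train, y_train, b, alpha, X_new):
    """Classify new samples by the sign of the LSSVM decision function."""
    return np.sign(X_new @ X_train.T @ (alpha * y_train) + b)


def holdout_accuracy(X_tr, y_tr, X_te, y_te):
    """Train on one split and return the accuracy on the other."""
    b, alpha = lssvm_fit(X_tr, y_tr)
    return float(np.mean(lssvm_predict(X_tr, y_tr, b, alpha, X_te) == y_te))


# Synthetic stand-in for an expression matrix (hypothetical sizes):
# 80 samples, 200 "genes", the first 10 shifted in the +1 class.
rng = np.random.default_rng(0)
n, p = 80, 200
y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
X = rng.normal(size=(n, p))
X[y > 0, :10] += 2.0

idx = rng.permutation(n)
train, test = idx[:60], idx[60:]

# Fit the reduction on the training samples only, then project both splits.
mean, W = pca_fit(X[train], k=5)
Z_train = (X[train] - mean) @ W
Z_test = (X[test] - mean) @ W

accuracies = {
    "full": holdout_accuracy(X[train], y[train], X[test], y[test]),
    "pca": holdout_accuracy(Z_train, y[train], Z_test, y[test]),
}
print(accuracies)
```

Fitting the reduction on the training samples alone keeps the test samples out of the projection, so the reduced-data accuracy is not optimistically biased. A fuller comparison would presumably repeat this over several splits, kernels and models.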

Results

The BPKR method achieves high accuracy (up to 94%) on the full and PCA-KL data with both Gaussian and linear kernels, 83% accuracy on the auto-encoder data with a Gaussian kernel, and 92% accuracy on the MMGSA data with a linear kernel. The SVM method reaches an accuracy of up to 94% on the full, PCA-KL and MMGSA data. The LSSVM method performs acceptably on the full and MMGSA data. In MMGSA data, the highest accuracy is 85%, obtained by the SVM method and by the BPKR method with a Gaussian kernel.

Conclusion

MMGSA or other gene set analysis approaches are recommended for data reduction (if needed) because they improve the interpretability of the results, and the BPKR and SVM methods are recommended for classification.

Keywords and phrases:

data mining, data reduction, gene expression.
