Abstract: Using genomic data, we compared methods for identifying differentially expressed genes, that is, genes whose expression levels differ between two groups of cells, such as normal and cancer cells. One widely used approach tests each gene separately for a difference in expression level between the two groups. We compared five such tests: Student’s t test, the Wilcoxon rank sum test, the moderated t test, the Kolmogorov-Smirnov test, and the Brunner-Munzel test. Simulations were conducted under four conditions: (1) both homoscedasticity of the expression levels and independence of the genes are assumed; (2) homoscedasticity is assumed but independence is not; (3) independence is assumed but homoscedasticity is not; (4) neither homoscedasticity nor independence is assumed. We used the area under the curve (AUC) as the evaluation index. The AUC was largest for the moderated t test, followed in order by Student’s t test, the Wilcoxon rank sum test, the Brunner-Munzel test, and the Kolmogorov-Smirnov test.
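The gene-by-gene testing and AUC evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation design: the number of genes, group size, effect size, and all function names are hypothetical choices made for illustration, and it covers only condition (1) (homoscedastic, independent genes) and only two of the five tests. Genes are ranked by the absolute test statistic; with equal group sizes and continuous data this gives the same ordering as ranking by two-sided p-value.

```python
import random
import statistics


def student_t(x, y):
    """Pooled-variance (Student's) t statistic for two samples."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * statistics.variance(x)
              + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / (pooled * (1 / nx + 1 / ny)) ** 0.5


def rank_sum_u(x, y):
    """Mann-Whitney U for x versus y, centred at its null mean n_x * n_y / 2."""
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    return u - len(x) * len(y) / 2


def auc(scores, labels):
    """Probability that a truly DE gene outscores a non-DE gene (ties count 1/2)."""
    pos = [s for s, is_de in zip(scores, labels) if is_de]
    neg = [s for s, is_de in zip(scores, labels) if not is_de]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def simulate(n_genes=400, n_de=40, n_per_group=8, shift=1.5, seed=0):
    """Hypothetical setup: independent genes, unit variance in both groups."""
    rng = random.Random(seed)
    labels, t_scores, w_scores = [], [], []
    for g in range(n_genes):
        is_de = g < n_de  # the first n_de genes are truly differentially expressed
        x = [rng.gauss(0, 1) for _ in range(n_per_group)]
        y = [rng.gauss(shift if is_de else 0, 1) for _ in range(n_per_group)]
        labels.append(is_de)
        t_scores.append(abs(student_t(x, y)))
        w_scores.append(abs(rank_sum_u(x, y)))
    return auc(t_scores, labels), auc(w_scores, labels)


auc_t, auc_w = simulate()
print(f"AUC (Student's t test): {auc_t:.3f}")
print(f"AUC (Wilcoxon rank sum test): {auc_w:.3f}")
```

An AUC near 1 means the test statistic ranks nearly all truly differentially expressed genes above the others, while 0.5 corresponds to a random ranking. Extending the sketch to the other conditions would require unequal variances between groups or correlated genes, and the moderated t test would need an empirical Bayes variance estimate that is not implemented here.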
Keywords and phrases: differentially expressed genes, moderated t test, FDR, Benjamini-Hochberg method.
Received: April 10, 2023; Accepted: May 10, 2023; Published: June 19, 2023
How to cite this article: Yuki Ando, Asanao Shimokawa and Etsuo Miyaoka, Comparison of two group comparative methods for genomic data, Advances and Applications in Statistics 38(2) (2023), 139-158. http://dx.doi.org/10.17654/0972361723043
This Open Access Article is Licensed under Creative Commons Attribution 4.0 International License