Abstract: This study concerns the practical implementation of a validity test for parameters of finite mixture distributions (including the number of components). The test is based on the stochastic EM (SEM) algorithm, a stochastic version of the well-known EM algorithm. Although the test statistic has the same algebraic form as a Hotelling’s statistic, earlier literature has shown that the stochastic process generated by SEM is autoregressive of order 1, so the usual asymptotic theory for independent random vectors cannot be used for inference. The same reference showed that the true asymptotic null distribution of the test statistic is a linear combination of chi-square(1) distributions. In practice, however, the coefficients of this linear combination cannot be estimated when more than one parameter is being estimated. To address this problem, a heuristic result from earlier theory was used. It was assessed by comparing the test statistic, computed when the true model is fitted to the data, with an algebraic representation based on an independent process, and the adequacy of the usual chi-square test was investigated. The test statistic was also applied and assessed in cases where a wrong model is fitted to the data.
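The point that independence-based asymptotics fail for an autoregressive process can be illustrated with a small simulation. The sketch below is a toy example and not the SEM procedure of the article: it assumes a scalar, zero-mean AR(1) process with a hypothetical coefficient `rho`, computes a one-dimensional Hotelling-type statistic (n times the squared sample mean over the sample variance), and records how often it exceeds the 95% chi-square(1) critical value. The function names, sample size, and number of replications are all illustrative assumptions.

```python
import random

CHI2_1_CRIT_95 = 3.8415  # 95% quantile of the chi-square(1) distribution


def ar1_path(rho, n, rng, burn_in=100):
    """Simulate a zero-mean AR(1) path x_t = rho * x_{t-1} + e_t, e_t ~ N(0, 1)."""
    x = 0.0
    path = []
    for t in range(burn_in + n):
        x = rho * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:  # discard the burn-in draws
            path.append(x)
    return path


def hotelling_type_statistic(path):
    """Scalar Hotelling-type statistic: n * mean^2 / (unbiased sample variance)."""
    n = len(path)
    mean = sum(path) / n
    var = sum((v - mean) ** 2 for v in path) / (n - 1)
    return n * mean * mean / var


def rejection_rate(rho, n=300, reps=1000, seed=0):
    """Share of replications in which the statistic exceeds the chi-square(1) cutoff."""
    rng = random.Random(seed)
    rejections = sum(
        hotelling_type_statistic(ar1_path(rho, n, rng)) > CHI2_1_CRIT_95
        for _ in range(reps)
    )
    return rejections / reps


if __name__ == "__main__":
    # rho = 0 is the independent case; rho = 0.5 is a hypothetical AR(1) case.
    print("independent (rho = 0):", rejection_rate(0.0))
    print("AR(1) with rho = 0.5: ", rejection_rate(0.5))
```

In this scalar setting the statistic behaves asymptotically like (1 + rho) / (1 - rho) times a chi-square(1) variable, so for positive `rho` the nominal 5% test rejects a true null hypothesis too often. This is a one-dimensional analogue of the linear combination of chi-square(1) distributions mentioned in the abstract, under the stated assumptions only.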
Keywords and phrases: mixture distribution, stochastic EM, autoregressive process, Hotelling’s statistic.
Received: January 4, 2022; Accepted: February 16, 2022; Published: February 26, 2022
How to cite this article: Athanase Polymenis, Implementation of a validity test associated with the stochastic EM algorithm in mixture analysis, Advances and Applications in Statistics 74 (2022), 129-144. DOI: 10.17654/0972361722022
This Open Access Article is Licensed under Creative Commons Attribution 4.0 International License
References:
[1] T. W. Anderson, On asymptotic distributions of estimates of parameters of stochastic difference equations, Ann. Math. Statist. 30(3) (1959), 676-687.
[2] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, 2nd ed., Wiley, New York, 1984.
[3] G. Celeux, Validity tests in cluster analysis using a probabilistic teacher algorithm, Compstat 86, Physica-Verlag, Heidelberg, 1986.
[4] G. Celeux, Reconnaissance de Mélanges de Densités de Probabilité et Applications à la Validation des Résultats en Classification, Ph.D. Thesis, University Paris 9-Dauphine, 1987.
[5] G. Celeux and J. Diebolt, The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem, Computational Statistics Quarterly 2 (1985), 73-82.
[6] G. Celeux and J. Diebolt, A random imputation principle: the stochastic EM algorithm, Research Report RR-0901, INRIA (inria-00075655), 1988. https://hal.inria.fr/inria-00075655.
[7] A. P. Dempster, N. M. Laird and D. B. Rubin, Maximum likelihood estimation from incomplete data via the EM algorithm (with discussion), Journal of the Royal Statistical Society B 39 (1977), 1-38.
[8] J. Diebolt and G. Celeux, Asymptotic properties of a stochastic EM algorithm for estimating mixing proportions, Communications in Statistics - Stochastic Models 9(4) (1993), 599-613.
[9] G. McLachlan and D. Peel, Finite Mixture Models, Wiley, New York, 2000.
[10] A. Polymenis, A note on a validity test using the stochastic EM algorithm in order to assess the number of components in a finite mixture model, Statistics 42(3) (2008), 261-274.
[11] A. Polymenis, Performance of the stochastic EM algorithm for estimating mixture parameters, International Journal of Computational and Theoretical Statistics 5(2) (2018), 89-94.
[12] D. M. Titterington, A. F. M. Smith and U. E. Makov, Statistical Analysis of Finite Mixture Distributions, Wiley, New York, 1985.
[13] M. P. Windham and A. Cutler, Information ratios for validating mixture analyses, J. Amer. Statist. Assoc. 87(420) (1992), 1188-1192.