Archive for latent variable models
mostly MCMC’s back
Posted in Statistics, University life with tags interacting particle systems, Langevin MCMC algorithm, latent variable models, MCMC, Mostly MCMC seminar, Paris, PariSanté campus, Porte de Versailles, proximal interacting particle Langevin algorithms, PSL, SDEs, seminar, Université Paris Dauphine on September 13, 2024 by xi'an
ultimate Pólya
Posted in Books, pictures, Statistics, Travel with tags auxiliary variables, Data augmentation, Gibbs sampling, latent variable models, Polya, Schönbrunn palace, TU Wien, Vienna, Wien on July 18, 2023 by xi'an
Last week, Gregor Zens, Sylvia Frühwirth-Schnatter, and Helga Wagner arXived a revision of their paper on latent Pólya-Gamma random variables for logistic regression models (which I had not read before). The central idea follows from Albert and Chib's 1993 paper on a Gibbs sampler for binary and polychotomous data, namely a data augmentation (a.k.a. Gibbs sampling) that is natural in that it allows for direct and uncalibrated sampling, but is not necessarily the best choice, since the completion by latent variables is prone to increase computing time and to slow down exploration. In addition, since the posterior is close to Normal, a Metropolis scheme based on the MLE asymptotic distribution could perform well without the completion step. As for other latent variable models such as mixtures, I keep wondering how efficiency could be improved by not updating some latent variates at every iteration, given their almost Dirac (conditional) distributions, especially in imbalanced cases. The paper proposes several novel mixture representations that lead to known distributions on the mixing parameter, constructed as in Jun Liu's and Xiao-Li Meng's auxiliary scale (or location-scale) completions, but these require an artificial parameter that needs to be calibrated.
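For context, here is a minimal sketch of the original Albert & Chib (1993) data augmentation for a probit model, which the Pólya-Gamma construction extends to the logistic link; this is my own illustrative code, not the paper's sampler, and the prior hyperparameters b0 and B0 are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

def albert_chib_probit(y, X, n_iter=2000, b0=None, B0=None, rng=None):
    """Gibbs sampler for Bayesian probit regression via Albert & Chib (1993)
    data augmentation: z_i ~ N(x_i'beta, 1) with y_i = 1{z_i > 0}."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    b0 = np.zeros(p) if b0 is None else b0                 # prior mean of beta
    B0inv = np.linalg.inv(100.0 * np.eye(p) if B0 is None else B0)
    V = np.linalg.inv(X.T @ X + B0inv)                     # conditional covariance of beta
    L = np.linalg.cholesky(V)
    beta = np.zeros(p)
    draws = np.empty((n_iter, p))
    for t in range(n_iter):
        mu = X @ beta
        # completion step: z_i | beta, y_i is a truncated normal,
        # positive when y_i = 1 and negative when y_i = 0
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, random_state=rng)
        # update beta | z from its Gaussian full conditional
        beta = V @ (X.T @ z + B0inv @ b0) + L @ rng.standard_normal(p)
        draws[t] = beta
    return draws
```

The Pólya-Gamma version replaces the truncated normal completion with one Pólya-Gamma draw per observation, while keeping a Gaussian full conditional for β.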
observed vs. complete in EM algorithm
Posted in Statistics with tags cross validated, EM algorithm, expectation maximisation, latent variable models, missing values, numerical maximisation on November 17, 2022 by xi'an
While answering a question related to the EM algorithm on X validated, I realised a global (or generic) feature of the (objective) E function, namely that

E(θ’|θ) = 𝔼[log Lᶜ(θ’|x,Z) | x, θ]

can always be written as

E(θ’|θ) = log L(θ’|x) + 𝔼[log k(Z|x,θ’) | x, θ]

where k(z|x,θ) denotes the conditional density of the latent variable Z given the observations, and therefore always includes the (log-) observed likelihood, at least in this formal representation. The proof that EM is monotone in the values of the observed likelihood uses this decomposition as well, in that

log L(θ’|x) − log L(θ|x) = E(θ’|θ) − E(θ|θ) − {𝔼[log k(Z|x,θ’) | x, θ] − 𝔼[log k(Z|x,θ) | x, θ]}

where the last difference is non-positive by Jensen's inequality. I wonder if the appearance of the actual target log L(θ’|x) in the temporary target E(θ’|θ) can be exploited any further.
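As a quick sanity check, here is a small numerical verification of this decomposition on a toy two-component Gaussian mixture (the data, parameter values, and function names are mine, purely for illustration): the E function computed directly coincides with the observed log-likelihood plus the conditional expectation of log k.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 60), rng.normal(3, 1, 40)])

def components(theta, x):
    """(n, 2) matrix of log w_k + log N(x_i | mu_k, 1) for a 2-component mixture."""
    w, mu = theta
    return np.log([w, 1 - w]) + norm.logpdf(x[:, None], loc=mu, scale=1.0)

def decompose(theta_new, theta_old, x):
    old, new = components(theta_old, x), components(theta_new, x)
    gamma_old = np.exp(old - logsumexp(old, axis=1, keepdims=True))   # E-step weights under theta
    log_gamma_new = new - logsumexp(new, axis=1, keepdims=True)       # log k(z|x, theta')
    E_fun   = (gamma_old * new).sum()                                 # E(theta'|theta)
    obs_lik = logsumexp(new, axis=1).sum()                            # log L(theta'|x)
    entropy = (gamma_old * log_gamma_new).sum()                       # E[log k(Z|x,theta') | x, theta]
    return E_fun, obs_lik + entropy                                   # both terms agree

theta, theta_new = (0.5, np.array([-1.0, 1.0])), (0.6, np.array([-2.0, 3.0]))
print(decompose(theta_new, theta, x))
```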
efficiency of normalising over discrete parameters
Posted in Statistics with tags arXiv, Gibbs sampler, Hamiltonian Monte Carlo, JAGS, latent variable models, marginalisation, MCMC, mixtures of distributions, Monte Carlo experiment, STAN on May 1, 2022 by xi'an
Yesterday, I noticed a new arXival entitled Investigating the efficiency of marginalising over discrete parameters in Bayesian computations, written by Wen Wang and coauthors. The paper actually compares the simulation of a Gibbs sampler with a Hamiltonian Monte Carlo approach on Gaussian mixtures, respectively including and excluding the latent allocation variables. The authors missed the opposite marginalisation, when the parameters are integrated out instead.
While marginalisation requires substantial mathematical effort, folk wisdom in the Stan community suggests that fitting models with marginalisation is more efficient than using Gibbs sampling.
The comparison is purely experimental, though, which means it depends on the simulated data, the sample size, the prior selection, and of course the chosen algorithms. It also involves the [mostly] automated [off-the-shelf] choices made in the adopted software, JAGS and Stan. The outcome is only evaluated through ESS and the (old) R̂ statistic, which both depend on the parameterisation. The comparison also evacuates the label switching problem by imposing an ordering on the Gaussian means, which may have a different impact on marginalised and unmarginalised models. All in all, there is not much one can conclude from this experiment, since the parameter values behind the simulated data seem to impact the performances much more than the type of algorithm one implements.
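For concreteness, marginalising out the discrete allocations amounts to evaluating the observed-data mixture likelihood directly via a log-sum-exp over components, as in this minimal sketch (the function name and toy values are mine), which is essentially what a Stan implementation of a mixture model has to do, since Stan cannot sample discrete parameters:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def marginal_mixture_loglik(y, w, mu, sigma):
    """Observed-data log-likelihood of a K-component Gaussian mixture,
    with the discrete allocation variables summed out:
    log p(y) = sum_i log sum_k w_k N(y_i | mu_k, sigma_k)."""
    # (n, K) matrix of log w_k + log N(y_i | mu_k, sigma_k)
    logs = np.log(w) + norm.logpdf(y[:, None], loc=mu, scale=sigma)
    return logsumexp(logs, axis=1).sum()

# toy three-component evaluation
y = np.random.default_rng(1).normal(size=100)
print(marginal_mixture_loglik(y, np.array([.3, .3, .4]),
                              np.array([-1., 0., 1.]), np.ones(3)))
```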
ordered allocation sampler
Posted in Books, Statistics with tags Data augmentation, Galaxy, Gibbs sampling, hidden Markov models, JASA, label switching, latent variable models, MCMC, partition function, random partition trees, SMC, statistical methodology on November 29, 2021 by xi'an
Recently, Pierpaolo De Blasi and María Gil-Leyva arXived a proposal for a novel Gibbs sampler for mixture models, covering both finite and infinite mixtures, in connection with Pitman's (1996) theory of species sampling, and with the interesting feature of removing the vexing label switching issue.
“The key idea is to work with the mixture components in the random order of appearance in an exchangeable sequence from the mixing distribution (…) In accordance with the order of appearance, we derive a new Gibbs sampling algorithm that we name the ordered allocation sampler.”
This central idea is thus a reinterpretation of the mixture model as the marginal of the component model when its parameter is distributed as a species sampling variate. An ensuing marginal algorithm is to integrate out the weights and the allocation variables, to only consider the non-empty component parameters and the partition function, which are label invariant. This reminded me of the proposal we made in our 2000 JASA paper with Gilles Celeux and Merrilee Hurn (one of my favourite papers!), and of the 2004 partitioned importance sampling version with George Casella and Marty Wells [the first paper in Statistical Methodology]. As in the latter, the solution seems to require the prior on the component parameters to be conjugate (as I do not see a way to produce an unbiased estimator of the partition allocation probabilities).
The ordered allocation sampler considers the posterior distribution of a different object, made of the parameters and of the sequence of allocations to the components for the sample written in a given order, i.e. y¹, y², &tc. Hence y¹ always gets associated with component 1, y² with either component 1 or component 2, and so on. For this distribution, the full conditionals are available, including the full posterior on the number m of components, which only depends on the data through the partition sizes and the number m⁺ of non-empty components. (Which relates to the debate as to whether or not m is estimable…) This sequential allocation reminded me as well of an earlier 2007 JRSS paper by Nicolas Chopin, albeit one using particles rather than Gibbs steps and applied to a hidden Markov model. Funnily enough, their synthetic dataset univ4 almost resembles the Galaxy dataset (as in the above picture of mine)!
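To make the order-of-appearance constraint concrete, here is a tiny hypothetical helper (not from the paper) that relabels an arbitrary allocation vector so that the first observation falls in component 1, the second in component 1 or 2, and so on, which is the representation the ordered allocation sampler operates on:

```python
import numpy as np

def order_of_appearance(alloc):
    """Relabel allocations by order of first appearance: the first observation
    gets label 1, the next new component encountered gets label 2, and so on."""
    mapping = {}
    out = np.empty(len(alloc), dtype=int)
    for i, c in enumerate(alloc):
        if c not in mapping:
            mapping[c] = len(mapping) + 1
        out[i] = mapping[c]
    return out

print(order_of_appearance([7, 3, 7, 1, 3]))   # -> [1 2 1 3 2]
```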
