Archive for self-normalised importance sampling

BayesComp 2025.2

Posted in Kids, pictures, Statistics, Travel, University life on June 19, 2025 by xi'an


The main BayesComp²⁵ conference started with Pierre Jacob's plenary talk on his recent advances on coupling for unbiased MCMC, a topic on which he currently holds an ERC grant. This raised lazy questions, like using a different target or transition kernel for the second chain of the coupling, connecting the Poisson equation with control variates, or handling the sign issue in the unbiased approximations. Interestingly, they obtain an unbiased estimator of the asymptotic variance of the unbiased estimator, as well as a correction for self-normalised importance sampling, which has some connections with our 1996 (?) pinball sampler. Also an evaluation of the median of means, rather than the average of means, something I had been (lazily) contemplating for a while. (On the greedy side, as I was writing the resit exam for my Monte Carlo course, I realised the results Pierre presented could be somewhat recycled into exam problems!)
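As a minimal illustration of that median-of-means idea (my own toy sketch, not anything from the talk), one splits the Monte Carlo evaluations into blocks, averages within blocks, and takes the median of the block averages, which is more robust to heavy tails than the overall mean:

```python
import numpy as np

def median_of_means(values, n_blocks=10):
    """Median-of-means estimate of E[h(X)] from evaluations h(X_i):
    split into blocks, average within each block, and return the
    median of the block averages."""
    blocks = np.array_split(np.asarray(values), n_blocks)
    return np.median([b.mean() for b in blocks])

# toy check with a heavier-tailed integrand: X ~ t(3), E[X²] = 3
rng = np.random.default_rng(0)
h = rng.standard_t(df=3, size=10_000) ** 2
print(h.mean(), median_of_means(h, n_blocks=20))
```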

My first parallel session was on gradient-based methods, with a talk by Francesca Crucinio on proximal particle Langevin algorithms (similar to the one she gave at PariSanté last year) and a talk by Zhihao Wang on stereographic multiple-try Metropolis(-Rosenbluth-Teller), which unsurprisingly recovers ergodicity thanks to the compactness of the sphere. About which I wonder whether a Normal proposal makes complete sense, since one could instead consider a move after the projection, and why iid rather than repelling multiple proposals are used… The last speaker, Siddharth Vishwanath, was just off the plane from California and spoke about repelling-attracting HMC, with very nice animations of HMC, even if he only reached the main point of using both negative and positive frictions a few minutes before the session ended. The method preserves volume and potential, if not energy.
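For readers unfamiliar with the stereographic samplers, here is a minimal sketch (mine, not from the talk) of the inverse stereographic projection mapping ℝᵈ onto the unit sphere minus its north pole, which is the compactification these algorithms exploit:

```python
import numpy as np

def to_sphere(x):
    """Inverse stereographic projection R^d -> unit sphere in R^{d+1},
    sending infinity to the north pole (0, ..., 0, 1)."""
    x = np.asarray(x, dtype=float)
    s = np.sum(x ** 2)
    return np.append(2.0 * x, s - 1.0) / (s + 1.0)

def to_plane(z):
    """Stereographic projection back from the sphere (minus north pole) to R^d."""
    z = np.asarray(z, dtype=float)
    return z[:-1] / (1.0 - z[-1])

x = np.array([0.3, -1.2, 2.0])
z = to_sphere(x)
print(np.linalg.norm(z), to_plane(z))   # ≈ 1.0 and the original x
```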

Speaking of which (energy), I found myself struggling, with less than six hours of sleep since arrival, through the first afternoon session, despite a fiery hotpot lunch, which means in plainer terms that I alas dozed in and out of the talks. The second session saw Jack Jewson presenting in deeper detail the exact PDMP algorithm for Gibbs measures that Jeremias Knoblauch had mentioned the day before, and Jonathan Huggins as well, using Gaussian processes as proxies for expected likelihoods, with weaker guarantees than pseudo-marginal versions. In a mildly connected way, Robin Ryder went through the resolution of the ecological inference challenge he addressed with Nicolas Chopin and Théo Valdoire (all authors with whom I am connected, Théo being a brilliant student of our MASH Master last year and now at Harvard, hopefully till the end of his PhD!)

On the extra-academic side, I had a yummy dinner at the Maxwell hawker (street food) centre, including Xiao Long Bao that cooled down fast enough to avoid the usual scalding effect, plus rojak, a mix of fruit and vegetables fried in a peanut sauce that I had never tasted before, popiah (ditto), chili noodles, and an appam with deep-fried durian balls as a fabulous and unexpected dessert.

exceptional OWABI web/sem’inar [19 June, BayesComp²⁵]

Posted in pictures, Statistics, Travel, Uncategorized, University life on June 10, 2025 by xi'an


Exceptionally, the next One World Approximate Bayesian Inference (OWABI) Seminar will be hybrid, as it is scheduled to take place during BayesComp 2025 in Singapore, on Thursday 19 June at 8pm Singapore time (1pm in Tórshavn), with two talks, one by Filippo Pagani on

Approximate Bayesian Fusion
Bayesian Fusion is a powerful approach that enables distributed inference while maintaining exactness. However, the approach is computationally expensive. In this work, we propose a novel method that incorporates numerical approximations to alleviate the most computationally expensive steps, thereby achieving substantial reductions in runtime. Our approach retains the flexibility to approximate the target posterior distribution to an arbitrary degree of accuracy, and is scalable with respect to both the size of the dataset and the number of computational cores. Our method offers a practical and efficient alternative for large-scale Bayesian inference in distributed environments.
and one by Maurizio Filippone on
GANs Secretly Perform Approximate Bayesian Model Selection
Generative Adversarial Networks (GANs) are popular models achieving impressive performance in various generative modeling tasks. In this work, we aim at explaining the undeniable success of GANs by interpreting them as probabilistic generative models. In this view, GANs transform a distribution over latent variables Z into a distribution over inputs X through a function parameterized by a neural network, which is usually referred to as the generator. This probabilistic interpretation enables us to cast the GAN adversarial-style optimization as a proxy for marginal likelihood optimization. More specifically, it is possible to show that marginal likelihood maximization with respect to model parameters is equivalent to the minimization of the Kullback-Leibler (KL) divergence between the true data generating distribution and the one modeled by the GAN. By replacing the KL divergence with other divergences and integral probability metrics we obtain popular variants of GANs such as f-GANs, Wasserstein-GANs, and Maximum Mean Discrepancy (MMD)-GANs. This connection has profound implications because of the desirable properties associated with marginal likelihood optimization, such as (i) lack of overfitting, which explains the success of GANs, and (ii) allowing for model selection, which opens to the possibility of obtaining parsimonious generators through architecture search.
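In symbols, the identity underlying this interpretation (my notation, where the generator induces the marginal p_θ) is that maximising the marginal likelihood in θ is equivalent to minimising the KL divergence to the data-generating distribution, since the entropy of the data distribution does not depend on θ:

```latex
p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, \mathrm{d}z,
\qquad
\arg\max_\theta \, \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\big[\log p_\theta(x)\big]
  = \arg\min_\theta \, \mathrm{KL}\big(p_{\mathrm{data}} \,\|\, p_\theta\big).
```

Swapping the KL term for another divergence or integral probability metric then yields the f-GAN, Wasserstein-GAN, and MMD-GAN objectives listed in the abstract.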

These talks will be delivered on-site and on-line, as a Zoom video-conference.

next OWABI webinar [24 April]

Posted in pictures, Statistics, Uncategorized, University life on April 16, 2025 by xi'an


The next One World Approximate Bayesian Inference (OWABI) Seminar is scheduled on Thursday the 24th of April at 11am UK time (12pm CET), with the speaker being Ayush Bharti (Aalto University), who will talk about

“Cost-aware simulation-based inference”

Abstract: Simulation-based inference (SBI) is the preferred framework for estimating parameters of intractable models in science and engineering. A significant challenge in this context is the large computational cost of simulating data from complex models, and the fact that this cost often depends on parameter values. We therefore propose cost-aware SBI methods which can significantly reduce the cost of existing sampling-based SBI methods, such as neural SBI and approximate Bayesian computation. This is achieved through a combination of rejection and self-normalised importance sampling, which significantly reduces the number of expensive simulations needed. Our approach is studied extensively on models from epidemiology to telecommunications engineering, where we obtain significant reductions in the overall cost of inference.
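Since self-normalised importance sampling is the recurring ingredient in these announcements, here is a generic minimal sketch of the estimator (not the authors' cost-aware scheme), computing a posterior expectation from draws of an arbitrary proposal; the toy target, proposal and test function are mine:

```python
import numpy as np

def snis(h, log_target, log_proposal, sampler, n=10_000, seed=0):
    """Self-normalised importance sampling estimate of E_target[h(X)],
    using draws from `sampler` and (possibly unnormalised) log densities."""
    rng = np.random.default_rng(seed)
    x = sampler(n, rng)
    logw = log_target(x) - log_proposal(x)
    w = np.exp(logw - logw.max())   # numerically stabilised weights
    w /= w.sum()                    # self-normalisation
    return np.sum(w * h(x))

# toy check: target N(0,1), proposal N(0,4), h(x) = x², truth = 1
est = snis(h=lambda x: x ** 2,
           log_target=lambda x: -0.5 * x ** 2,
           log_proposal=lambda x: -0.5 * (x / 2.0) ** 2,
           sampler=lambda n, rng: 2.0 * rng.standard_normal(n),
           n=50_000)
print(est)   # close to 1
```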

next OWABI webinar [27 Feb]

Posted in pictures, Statistics, Uncategorized on February 21, 2025 by xi'an


The next One World Approximate Bayesian Inference (OWABI) Seminar is scheduled on Thursday, 27 February at 11am UK time, with the speaker being Ayush Bharti (Aalto University), who will talk about

Cost-aware simulation-based inference

Abstract: Simulation-based inference (SBI) is the preferred framework for estimating parameters of intractable models in science and engineering. A significant challenge in this context is the large computational cost of simulating data from complex models, and the fact that this cost often depends on parameter values. We therefore propose cost-aware SBI methods which can significantly reduce the cost of existing sampling-based SBI methods, such as neural SBI and approximate Bayesian computation. This is achieved through a combination of rejection and self-normalised importance sampling, which significantly reduces the number of expensive simulations needed. Our approach is studied extensively on models from epidemiology to telecommunications engineering, where we obtain significant reductions in the overall cost of inference.

Keywords: simulation-based inference, approximate Bayesian computation, neural posterior estimation, neural likelihood estimation, importance sampling

importance sampling and independent Metropolis–Hastings with unbounded weights

Posted in Books, Statistics on December 12, 2024 by xi'an

George Deligiannidis, Pierre E. Jacob, El Mahdi Khribch, and Guanyang Wang just arXived a paper on the respective behaviours of importance sampling and independent Metropolis–Hastings (IMH) under the same proposal, when the importance weight is unbounded but enjoys a p-th moment with p≥2. The two algorithms share a lot, with importance sampling appearing as a rough Rao-Blackwellisation of Metropolis–Hastings, and its asymptotic variance being smaller than that of Metropolis–Hastings. I was unable to check whether or not their conditions encompass the highly interesting case when the integrand f is integrable under the target π but not in L²(π). (Theorem 2.3 does not seem to include this case.)

They also consider a particular (!) version of IMH in which N iid values are proposed at once and accepted or rejected (again at once), with an acceptance ratio based on the average of the importance weights. Although this version already appears in a 2010 paper by Christophe Andrieu and co-authors, and stems from an unbiased importance sampler, I was not aware of it. My initial feeling was (predictably) pessimistic but, thinking about it, using the average weight brings into the sample simulations with small weights that would otherwise be discarded. Of course, a rejection proves N times more costly. But this is truly a form of Rao-Blackwellisation, in the sense that it removes the weight variability to some extent (see p.5) and it turns the outcome into an unbiased estimator, despite the self-normalising behaviour! They also conclude that the rejection probability is at least c/√N on average (Remark 4.1).
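For concreteness, here is how I read this block version, in the spirit of the grouped-independence sampler mentioned above: propose N iid values from the proposal, accept or reject the whole block with a ratio of averaged weights, and (as one option) exploit the whole block through within-block weighted averages. This is only a hedged sketch of my understanding, with a toy Gaussian target and proposal, not the authors' algorithm or code:

```python
import numpy as np

def block_imh(log_target, log_proposal, sampler, h, N=10, T=5_000, seed=0):
    """Independent MH variant proposing N iid values at once and accepting
    or rejecting the whole block with a ratio of averaged importance weights
    (my reading of the scheme); returns a within-block weighted estimate."""
    rng = np.random.default_rng(seed)

    def draw_block():
        x = sampler(N, rng)
        w = np.exp(log_target(x) - log_proposal(x))
        return x, w

    x, w = draw_block()
    estimates = []
    for _ in range(T):
        y, v = draw_block()
        if rng.uniform() < v.mean() / w.mean():   # block acceptance ratio
            x, w = y, v
        # Rao-Blackwellised use of the whole block: weighted average of h
        estimates.append(np.sum(w * h(x)) / np.sum(w))
    return np.mean(estimates)

# toy check: target N(0,1), proposal N(0,4), h(x) = x², truth = 1
est = block_imh(log_target=lambda x: -0.5 * x ** 2,
                log_proposal=lambda x: -0.5 * (x / 2.0) ** 2,
                sampler=lambda n, rng: 2.0 * rng.standard_normal(n),
                h=lambda x: x ** 2)
print(est)   # close to 1
```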

“We show that the bias of self-normalized importance sampling is of order N⁻¹, and we obtain new bounds on the moments of the error in importance sampling. We then consider IMH, and show that the common random numbers coupling is optimal. Using this coupling, we show that the total variation distance between IMH at iteration t and π decays as t⁻⁽ᵖ⁻¹⁾.”

They also compare the biases of sampling importance resampling and independent Metropolis–Hastings, with the latter getting the upper hand, but I do not see the justification for resampling when computing an integral, since it does not produce a sample from the target, especially when the weights are unbounded, and it adds to the variability of the estimator. They further propose a (telescopic) unbiased modification of the self-normalised importance sampling estimator, with an inefficiency twice as high. But a neat Rao-Blackwellisation trick brings it back to the same level!