Archive for Approximate Bayesian computation

next OWABI webinar [24 April]

Posted in pictures, Statistics, Uncategorized, University life on April 16, 2025 by xi'an


The next One World Approximate Bayesian Inference (OWABI) Seminar is scheduled on Thursday the 24th of April at 11am UK time (noon CET) with the speaker being Ayush Bharti (Aalto University), who will talk about

“Cost-aware simulation-based inference”

Abstract: Simulation-based inference (SBI) is the preferred framework for estimating parameters of intractable models in science and engineering. A significant challenge in this context is the large computational cost of simulating data from complex models, and the fact that this cost often depends on parameter values. We therefore propose cost-aware SBI methods which can significantly reduce the cost of existing sampling-based SBI methods, such as neural SBI and approximate Bayesian computation. This is achieved through a combination of rejection and self-normalised importance sampling, which significantly reduces the number of expensive simulations needed. Our approach is studied extensively on models from epidemiology to telecommunications engineering, where we obtain significant reductions in the overall cost of inference.
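For intuition only, here is a minimal numerical sketch (in no way the authors' algorithm) of how cost-based rejection and self-normalised importance weights can combine within plain rejection ABC: the toy model, its cost function, and the thinning rule below are all made up, but the weighting exactly undoes the cost-aware thinning of the prior draws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model (not from the paper), chosen so that the cost of one
# simulation depends on the parameter value, as in the abstract.
def simulate(theta):
    return theta + rng.normal(0.0, 1.0, size=theta.shape)

def cost(theta):                       # assumed per-simulation cost
    return 1.0 + np.abs(theta)

y_obs, prior_sd, eps = 1.5, 3.0, 0.3   # observation, prior scale, ABC tolerance

# 1. Draw candidate parameters from the prior, as in plain rejection ABC.
theta = rng.normal(0.0, prior_sd, size=50_000)

# 2. Cost-aware thinning: simulate each candidate with probability inversely
#    proportional to its (assumed known) cost, so expensive parameter regions
#    are visited less often.
p_sim = np.minimum(1.0, 1.0 / cost(theta))
simulated = rng.uniform(size=theta.size) < p_sim
x = simulate(theta[simulated])

# 3. Standard ABC acceptance on the simulated subset.
accepted = np.abs(x - y_obs) < eps
theta_acc = theta[simulated][accepted]

# 4. Self-normalised importance weights undo the cost-based thinning
#    (weight = 1 / inclusion probability, proportional to the cost).
w = 1.0 / p_sim[simulated][accepted]
post_mean = np.sum(w * theta_acc) / np.sum(w)
print(f"cost-aware ABC posterior mean ≈ {post_mean:.3f}, "
      f"simulations used: {simulated.sum()} of {theta.size} candidates")
```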

next OWABI webinar [27 March]

Posted in pictures, Statistics, Uncategorized, University life on March 25, 2025 by xi'an


The next One World Approximate Bayesian Inference (OWABI) Seminar is scheduled on Thursday the 27th of March at 11am UK time (noon CET) with the speaker being Meïli Baragatti (Université de Montpellier), who will talk about

“Approximate Bayesian Computation with Deep Learning and Conformal Prediction”

Abstract: Approximate Bayesian Computation (ABC) methods are commonly used to approximate posterior distributions in models with unknown or computationally intractable likelihoods. Classical ABC methods are based on nearest-neighbour-type algorithms and rely on the choice of so-called summary statistics, distances between datasets, and a tolerance threshold. Recently, methods combining ABC with more complex machine learning algorithms have been proposed to mitigate the impact of these “user choices”. In this talk, I will present the first, to our knowledge, ABC method completely free of summary statistics, distance, and tolerance threshold. Moreover, in contrast with usual generalisations of the ABC method, it associates a confidence interval (with proper frequentist marginal coverage) with the posterior mean estimation (or other moment-type estimates). This method, named ABCD-Conformal, uses a neural network with Monte Carlo Dropout to provide an estimation of the posterior mean (or other moment-type functionals), and conformal theory to obtain associated confidence sets. I will compare its performance with other ABC methods on several examples, and show that it is efficient for estimating multidimensional parameters while being “amortised”.

Keywords: simulation-based inference, approximate Bayesian computation, neural posterior estimation, convolutional neural networks, dropout, conformal prediction
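As a purely illustrative sketch of the general recipe described in the abstract (and not the ABCD-Conformal implementation), the snippet below regresses the parameter on simulated data with a dropout network, keeps dropout active at prediction time to form the Monte Carlo Dropout point estimate, and adds a plain split-conformal interval based on absolute calibration residuals; the simulator, network, and nonconformity score are all stand-ins.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Hypothetical simulator: theta ~ U(-2, 2), x | theta is a 10-dimensional summary.
def simulate(n):
    theta = rng.uniform(-2, 2, size=(n, 1))
    x = theta + rng.normal(0, 0.5, size=(n, 10))
    return torch.tensor(x, dtype=torch.float32), torch.tensor(theta, dtype=torch.float32)

x_tr, th_tr = simulate(4000)      # training set of prior predictive pairs
x_cal, th_cal = simulate(1000)    # calibration set for conformal prediction
x_obs, th_obs = simulate(1)       # pretend observation

# Small regression network with dropout, trained to predict theta from x.
net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Dropout(0.2),
                    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(0.2),
                    nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x_tr), th_tr)
    loss.backward()
    opt.step()

def mc_dropout_mean(x, n_draws=100):
    net.train()                   # keep dropout active at prediction time
    with torch.no_grad():
        draws = torch.stack([net(x) for _ in range(n_draws)])
    return draws.mean(0)

# Split-conformal step: absolute residuals on the calibration set give a
# radius with (1 - alpha) marginal coverage.
alpha = 0.1
residuals = (mc_dropout_mean(x_cal) - th_cal).abs().squeeze()
n_cal = len(residuals)
q_level = min(1.0, np.ceil((1 - alpha) * (n_cal + 1)) / n_cal)
q = float(torch.quantile(residuals, float(q_level)))

est = mc_dropout_mean(x_obs).item()
print(f"posterior-mean estimate {est:.2f}, 90% conformal interval "
      f"[{est - q:.2f}, {est + q:.2f}], true theta {th_obs.item():.2f}")
```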

statistical accuracy of neural posterior and likelihood estimation

Posted in pictures, Running, Statistics, Travel, University life on March 17, 2025 by xi'an

I have been meaning to mention this news for quite a while: last November, David Frazier, Ryan Kelly, Christopher Drovandi, and David Warne arXived a paper that parallels our paper (with David and Gael) on ABC consistency, as well as some earlier papers of theirs on synthetic likelihood, in the case of neural posterior approximations, under similar conditions (see, e.g., Assumptions 1 and 2), and with a potentially reduced computational cost in some situations.

“NLE requires additional MCMC steps to produce a posterior approximation, whereas NPE produces a posterior approximation directly and does not require any additional sampling”

Convergence is achieved when the neural learning sample size grows fast enough with the sample size, and when the tolerance decreases fast enough with respect to the convergence rate of the summary statistic. Two options are possible: either approximating the likelihood and then exploiting this approximation in an MCMC algorithm, or directly approximating the posterior distribution as a function of the summary statistic Sn (rather than only for the observed S⁰n), with arguments favouring the second option.

“if the intractable posterior Π(· | Sn) is asymptotically Gaussian and calibrated, then so long as νnγN = o(1), the NPE is also asymptotically Gaussian and calibrated”

where γN denotes the rate at which the neural approximation of the posterior converges to the ideal posterior (in Kullback-Leibler divergence) as N, the size of the learning sample, grows, and νn is the rate of convergence of the statistic Sn to its asymptotic mean. The convergence result does not make explicit assumptions on the class of neural posteriors, but it requires that the observed statistic fall within the range of the simulated values (a possibility illustrated in the paper with an MA(2) model that was already used in several of our papers, as I noticed when giving an ABC masterclass in Warwick this very week).
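For concreteness, here is how the prior predictive training pairs (θ, Sn) look for this MA(2) benchmark, with the usual uniform prior over the identifiability triangle and lag-1 and lag-2 autocovariances as summaries (a guess at the setup, which may differ from the paper's): NPE would fit a conditional density q(θ|Sn) over these pairs, whereas NLE would instead model q(Sn|θ) and run an MCMC sampler on top of it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                   # length of each simulated series

def prior_draw():
    # Uniform prior over the MA(2) identifiability triangle:
    # -2 < th1 < 2, th1 + th2 > -1, th1 - th2 < 1.
    while True:
        th1, th2 = rng.uniform(-2, 2), rng.uniform(-1, 1)
        if th1 + th2 > -1 and th1 - th2 < 1:
            return th1, th2

def simulate_ma2(th1, th2):
    eps = rng.normal(size=n + 2)
    return eps[2:] + th1 * eps[1:-1] + th2 * eps[:-2]

def summaries(x):
    # lag-1 and lag-2 empirical autocovariances as the summary statistic Sn
    return np.array([np.mean(x[:-1] * x[1:]), np.mean(x[:-2] * x[2:])])

# Prior predictive training set {(theta_i, S_n(x_i))}, i = 1..N:
# NPE fits q(theta | S_n) over these pairs, NLE fits q(S_n | theta) instead.
N = 10_000
theta = np.array([prior_draw() for _ in range(N)])
S = np.array([summaries(simulate_ma2(*t)) for t in theta])
print(theta.shape, S.shape)               # (10000, 2) (10000, 2)
```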

“While neural methods and normalizing flows are common choices for the approximating class Q, the diversity of such methods, along with their complicated tuning and training regimes, makes establishing theoretical results on the rate of convergence, γN, difficult”

Under stronger and harder-to-check assumptions, namely on the minimaxity of the posterior density estimator within the class of locally β-Hölder functions, they recover a closed-form γN, which unravels how N should be chosen (with a surprising addition of the dimensions of the parameter θ and of the summary Sn), with a resulting explosion in the theoretical minimal value of N one should use. (And decent performances of the method with smaller values of N!) Concerning minimaxity, I have no intuition about how this impacts the sparseness (or lack thereof) of the neural networks that can be used.

I am wondering about strategies to remove superfluous statistics, since their dimension matters so much, and about ways of detecting or evaluating misspecification (or its complement, compatibility, as discussed on page 31). But all in all this paper represents a massive addition to the consistency results for approximate Bayesian inference methods!

amortized Bayesian mixture model

Posted in Books, Statistics, University life on February 7, 2025 by xi'an

A few days before the January OWABI, I read through Simon Kucharsky and Paul Bürkner's paper, arXived on 17 January, which proposes an amortized Bayesian inference (ABI) method, even though this ABI is not the same as in OWABI! The motivation for their work is to start from a (standard) mixture model whose components are not analytically tractable (although still parameterised), but which remains a generative model nonetheless. As in the earlier reviewed paper (which was arXived on the same day) by MEJ Newman, the dual representation of the joint posterior p(θ,z|x) as p(z|x,θ)p(θ|x) and p(θ|z,x)p(z|x) is (over?)emphasized (albeit it is unclear why!). ABI uses neural networks, and more specifically normalising flows, to approximate the posterior p(θ|x) from prior predictive samples (θ,x) (as in ABC), and then directly exploits the invertibility of said flows to generate from this approximate posterior. One interesting aspect of the modelling is the derivation of summary statistics in the design of the network, albeit mixture posteriors do not allow for dimension-reduced (Bayes) sufficient statistics (along with a contradictory sentence that conditioning on the summaries “does not alter the target posterior”, p.7). The resulting approximate posterior generator proves much, much faster than running an MCMC, obviously, and furthermore adapts to handling a sequence of datasets.

A second network is constructed to approximate p(z|x,θ), using the same summaries. The network parameters are estimated through losses, rather than in a Bayesian manner, with a default Kullback-Leibler version (18). I also fail to understand why the networks are trained over unconstrained parameters when all parameters could be made unconstrained by an adequate parameterisation. And I am fairly surprised at the regression towards the ill-fated step of using ordered parameters to avoid label switching… But the main quandary remains the issue of assessing the approximation effect, despite experiments aiming at pacifying such worries. And the similarities with Stan and BayesFlow.
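To fix ideas about the amortisation step (and only that), here is a bare-bones sketch under a made-up two-component Gaussian mixture: a diagonal Gaussian conditional stands in for the normalising flow, fixed quantiles stand in for the learned summary network, the maximum-likelihood (forward Kullback-Leibler) loss plays the rôle of their loss (18), and the second network for p(z|x,θ) is omitted altogether.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(1)
rng = np.random.default_rng(1)
n_obs, d_sum = 100, 9          # points per data set, number of summary quantiles

# Hypothetical generative model: 2-component Gaussian mixture with ordered means.
def simulate(batch):
    mu = np.sort(rng.normal(0.0, 3.0, size=(batch, 2)), axis=1)   # mu1 < mu2
    z = rng.integers(0, 2, size=(batch, n_obs))
    x = rng.normal(np.take_along_axis(mu, z, axis=1), 1.0)
    qs = np.quantile(x, np.linspace(0.1, 0.9, d_sum), axis=1).T   # fixed summaries
    return (torch.tensor(qs, dtype=torch.float32),
            torch.tensor(mu, dtype=torch.float32))

# Amortized posterior network: maps summaries to the mean and log-sd of a
# diagonal Gaussian approximation of p(theta | x).  A conditional normalising
# flow, as in the paper, would replace this Gaussian head.
net = nn.Sequential(nn.Linear(d_sum, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 4))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    s, theta = simulate(256)               # fresh prior predictive batch
    out = net(s)
    mean, log_sd = out[:, :2], out[:, 2:]
    # forward-KL / maximum-likelihood loss: negative log q(theta | s)
    nll = (log_sd + 0.5 * ((theta - mean) / log_sd.exp()) ** 2).sum(1).mean()
    opt.zero_grad(); nll.backward(); opt.step()

# Amortized inference: one forward pass per new data set, no MCMC.
s_obs, theta_true = simulate(1)
with torch.no_grad():
    out = net(s_obs)
post_draws = out[0, :2] + out[0, 2:].exp() * torch.randn(1000, 2)
print("true:", theta_true.numpy(), "posterior mean:", post_draws.mean(0).numpy())
```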

All about that [Bayes] seminar [24 Jan]

Posted in Books, pictures, Statistics, Travel, University life on January 13, 2025 by xi'an

The next All about that (Bayes) seminar will take place on Friday 24 Jan at SCAI, on the Jussieu campus, with the following talks. (Appearances to the contrary, I was not in the least involved in the program!)

13h30 – 14h30 Joshua Bon (OCEAN, Université Paris Dauphine) – Bayesian score calibration for approximate models

 Scientists continue to develop increasingly complex mechanistic models to reflect their knowledge more realistically. Statistical inference using these models can be challenging since the corresponding likelihood function is often intractable and model simulation may be computationally burdensome. Fortunately, in many of these situations, it is possible to adopt a surrogate model or approximate likelihood function. It may be convenient to conduct Bayesian inference directly with the surrogate, but this can result in bias and poor uncertainty quantification. In this paper (https://arxiv.org/abs/2211.05357) we propose a new method for adjusting approximate posterior samples to reduce bias and produce more accurate uncertainty quantification. We do this by optimizing a transform of the approximate posterior that maximizes a scoring rule. Our approach requires only a (fixed) small number of complex model simulations and is numerically stable. We demonstrate beneficial corrections to several approximate posteriors using our method on several examples of increasing complexity.
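A heavily simplified sketch of the flavour of this adjustment (not the method of the paper, which samples and weights more cleverly and allows for richer transforms): draw calibration parameters from the prior, run the complex model once for each, and fit an affine shift and scale of the biased approximate posterior samples by minimising an energy score against the generating parameters. The toy model and the biased surrogate below are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Toy setup: y | theta ~ N(theta, 1), theta ~ N(0, 3^2), and a deliberately
# biased, overconfident "approximate posterior" N(0.9 y + 0.5, 0.3^2)
# standing in for inference with a surrogate model.
def approx_posterior(y, size):
    return rng.normal(0.9 * y + 0.5, 0.3, size=size)

M, K = 200, 50                                    # calibration sims, samples each
theta_cal = rng.normal(0.0, 3.0, size=M)          # generating parameters
y_cal = rng.normal(theta_cal, 1.0)                # one complex-model run per theta
samples = np.stack([approx_posterior(y, K) for y in y_cal])   # shape (M, K)

def energy_score(x, y):
    # negatively oriented energy score of 1-D samples x against the point y
    return np.mean(np.abs(x - y)) - 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))

def objective(ab):
    a, log_b = ab
    centre = samples.mean(axis=1, keepdims=True)
    adjusted = centre + a + np.exp(log_b) * (samples - centre)   # affine transform
    return np.mean([energy_score(adjusted[m], theta_cal[m]) for m in range(M)])

res = minimize(objective, x0=[0.0, 0.0], method="Nelder-Mead")
a_hat, b_hat = res.x[0], np.exp(res.x[1])
print(f"learned shift {a_hat:+.2f} (about -0.5 expected), "
      f"learned scale {b_hat:.2f} (>1 expected, widening the overconfident posterior)")
```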

14h30 – 15h30 Giacomo Zanella (Bocconi University) – Entropy contraction of the Gibbs sampler under log-concavity

In this talk I will present recent work (https://arxiv.org/abs/2410.00858) on the non-asymptotic analysis of the Gibbs sampler, a classical and popular MCMC algorithm for sampling. In particular, under the assumption that the probability measure π of interest is strongly log-concave, we show that the random scan Gibbs sampler contracts in relative entropy, and provide a sharp characterization of the associated contraction rate. The result implies that, under appropriate conditions, the number of full evaluations of π required for the Gibbs sampler to converge is independent of the dimension. If time permits, I will also discuss connections and applications of the above results to the problem of zero-order parallel sampling, as well as extensions to Hit-and-Run and Metropolis-within-Gibbs.

Based on joint work with Filippo Ascolani and Hugo Lavenant.
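For readers less familiar with the algorithm being analysed, a minimal random scan Gibbs sampler on a strongly log-concave (here Gaussian) target looks as follows; this toy example obviously says nothing about the entropy contraction rate itself.

```python
import numpy as np

rng = np.random.default_rng(3)

# Strongly log-concave toy target: a correlated bivariate Gaussian, with both
# full conditionals available in closed form.
rho = 0.9

def gibbs(n_iter, x0=np.zeros(2)):
    x = x0.copy()
    chain = np.empty((n_iter, 2))
    for t in range(n_iter):
        i = rng.integers(2)                     # random scan: pick one coordinate
        j = 1 - i
        cond_mean = rho * x[j]                  # x_i | x_j ~ N(rho x_j, 1 - rho^2)
        x[i] = rng.normal(cond_mean, np.sqrt(1 - rho ** 2))
        chain[t] = x
    return chain

chain = gibbs(20_000)
print("empirical covariance:\n", np.cov(chain[5000:].T))   # ≈ [[1, .9], [.9, 1]]
```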

16h00 – 17h00 Paul Bastide (Université Paris Cité) – Goodness of Fit for Bayesian Generative Models with Applications in Population Genetics

In population genetics, inference about intractable likelihood models is common, and simulation methods, including Approximate Bayesian Computation (ABC) and Simulation-Based Inference (SBI), are essential. ABC/SBI methods work by simulating instrumental data sets of the models under study and comparing them with the observed data set y⁰. Advanced machine learning tools are used for tasks such as model selection and parameter inference. The present work focuses on model criticism. This type of analysis, called goodness of fit (GoF), is important for model validation. It can also be used for model pruning when the number of candidates to be considered is excessive, especially in the context where data simulation is expensive. We introduce two new GoF tests based on the local outlier factor (LOF), an indicator that was initially defined for outlier and novelty detection. We test whether y⁰ is distributed from the prior predictive distribution (pre-inference GoF) and whether there is a parameter value such that y⁰ is distributed from the likelihood with that value (post-inference GoF).  We evaluate the performance of our two GoF tests on simulated datasets from three different model settings of varying complexity, and on a dataset of single nucleotide polymorphism (SNP) markers for the evaluation of complex evolutionary scenarios of modern human populations.

Joint work with Guillaume Le Mailloux, Jean-Michel Marin and Arnaud Estoup.
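As a rough illustration of the pre-inference idea in the abstract above (with a made-up model, summaries, and calibration; only the use of the local outlier factor follows the abstract), one can score the observed summaries against prior predictive simulations with scikit-learn's LocalOutlierFactor and calibrate that score on held-out simulations:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(4)

# Hypothetical model and summaries: data sets of 50 exponential draws with a
# Gamma-distributed rate, summarised by their mean and standard deviation.
def prior_predictive(n_sets):
    rate = rng.gamma(2.0, 1.0, size=n_sets)
    x = rng.exponential(1.0 / rate[:, None], size=(n_sets, 50))
    return np.column_stack([x.mean(axis=1), x.std(axis=1)])

S_ref = prior_predictive(2000)          # reference simulations to fit the LOF
S_hold = prior_predictive(500)          # held-out simulations to calibrate scores
S_obs = np.array([[5.0, 9.0]])          # summaries of a (mis)specified "observation"

lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(S_ref)
score_obs = lof.score_samples(S_obs)            # lower = more outlying
score_hold = lof.score_samples(S_hold)

# Monte Carlo p-value: how often a data set actually drawn from the prior
# predictive looks at least as outlying as the observation.
p_value = (1 + np.sum(score_hold <= score_obs)) / (1 + len(score_hold))
print(f"pre-inference GoF p-value ≈ {p_value:.3f}")
```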