Archive for Bayesian neural networks

OWABI Season VII

Posted in Statistics on October 17, 2025 by xi'an

A new season of the One World Approximate Bayesian Inference (OWABI) Seminar is about to start!
The 1st OWABI talk of the Season will be given by François-Xavier Briol (University College London), who will talk about “Multilevel neural simulation-based inference” on Thursday the 30th October at 11am UK time.
Abstract
Neural simulation-based inference (SBI) is a popular set of methods for Bayesian inference when models are only available in the form of a simulator. These methods are widely used in the sciences and engineering, where writing down a likelihood can be significantly more challenging than constructing a simulator. However, the performance of neural SBI can suffer when simulators are computationally expensive, thereby limiting the number of simulations that can be performed. In this paper, we propose a novel approach to neural SBI which leverages multilevel Monte Carlo techniques for settings where several simulators of varying cost and fidelity are available. We demonstrate through both theoretical analysis and extensive experiments that our method can significantly enhance the accuracy of SBI methods given a fixed computational budget.
Keywords: Multifidelity, neural SBI, multilevel Monte Carlo
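Since the telescoping idea behind multilevel Monte Carlo may be unfamiliar, here is a minimal sketch of the decomposition in Python, with a toy stand-in for a family of simulators of increasing fidelity (the mean of more and more Gaussian draws). This only illustrates the telescoping estimator, not the neural SBI method of the paper; all names are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlmc_estimate(theta, L, n0=1024):
    """Multilevel telescoping estimator of E[P_L] via
    E[P_L] = E[P_0] + sum_{l=1}^L E[P_l - P_{l-1}],
    spending fewer samples on the costlier fine levels.
    Toy 'simulator' P_l: the mean of 4**l draws from N(theta, 1)."""
    est = rng.normal(theta, 1.0, size=n0).mean()        # level 0, cheap and plentiful
    for l in range(1, L + 1):
        n_l = max(n0 // 4 ** l, 2)                      # decaying sample sizes
        draws = rng.normal(theta, 1.0, size=(n_l, 4 ** l))
        fine = draws.mean(axis=1)                       # level-l simulator output
        coarse = draws[:, : 4 ** (l - 1)].mean(axis=1)  # coupled level-(l-1) output
        est += (fine - coarse).mean()                   # telescoping correction
    return est
```

The key point is the coupling: fine and coarse runs share the same random draws, so the corrections have small variance and few expensive fine-level simulations are needed.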

scalable Monte Carlo for Bayesian learning [book review]

Posted in Books, Statistics, University life on September 26, 2025 by xi'an

This book by Paul Fearnhead, Christopher Nemeth, Chris Oates, and Chris Sherlock is part of the IMS Monograph series. And published by Cambridge University Press. It covers the most recent developments in MCMC methods, namely stochastic gradient MCMC (Chap. 3), non-reversible MCMC (Chap. 4), continuous-time MCMC (Chap. 5), and assessing and improving MCMC (Chap. 6). I find the book remarkable in its attention to rigour and clarity, without falling into overly technical derivations. It is perfectly suited for a graduate course to students with a solid mathematical background. In short, had I considered a new edition of our Monte Carlo Statistical Methods book to incorporate these advances, I could not have done such a good job!

The first chapter provides a quick refresher of the background, from Monte Carlo principles, to Markov chains, SDEs, and the kernel “trick” (which requires a dozen pages of exposition). Nonetheless, it contains side remarks of true interest, including some suggestions I had not previously seen, as for instance an unusual introduction of the HMC algorithm as an underdamped Langevin diffusion. Chapter 2 prolongs this recap by covering reversible MCMC algorithms and the attached optimal scalings. This is done in a particularly friendly presentation that I intend to use in my own course. The HMC section is probably the best coverage I have seen on the topic, including most naturally the leapfrog steps.
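For readers curious about those leapfrog steps, here is a minimal generic sketch of the integrator at the heart of HMC (my own illustration, not the book's code), where `grad_U` is the gradient of the potential, i.e., minus the log target density:

```python
import numpy as np

def leapfrog(grad_U, q, p, eps, n_steps):
    """One HMC trajectory via the leapfrog (Stormer-Verlet) scheme:
    half step on momentum, alternating full steps on position and
    momentum, and a final half step on momentum."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * eps * grad_U(q)           # initial momentum half step
    for _ in range(n_steps - 1):
        q += eps * p                     # full position step
        p -= eps * grad_U(q)             # full momentum step
    q += eps * p                         # last position step
    p -= 0.5 * eps * grad_U(q)           # final momentum half step
    return q, -p                         # momentum flip for reversibility
```

The map is volume-preserving and, thanks to the final momentum flip, exactly reversible: applying it twice returns the starting point, which is what makes the Metropolis acceptance correction tractable.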

Chapter 3 gets into stochastic gradient MCMC as an approximate MCMC, with nice arguments and formal convergence bounds. Again quite efficiently, if focussing almost solely on Gaussian settings (but including a neural network example). Similarly, Chapter 4 provides intuitive (if informal) arguments on the worth of non-reversible algorithms that are well-suited to a textbook of this level. This chapter introduces a PDMP sampler like the discrete bouncy particle sampler.
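As an illustration of the stochastic gradient MCMC algorithms covered in Chapter 3, here is a minimal sketch of its simplest instance, stochastic gradient Langevin dynamics (Welling & Teh, 2011), for a one-dimensional parameter and a user-supplied unbiased minibatch gradient of the log posterior; names and signatures are mine, not the book's.

```python
import numpy as np

def sgld(grad_log_post, theta0, data, n_iter, batch_size, step):
    """Stochastic gradient Langevin dynamics: the Langevin update with
    the full-data gradient replaced by an unbiased minibatch estimate,
    and no Metropolis correction (hence an approximate MCMC)."""
    rng = np.random.default_rng(1)
    n = len(data)
    theta = float(theta0)
    chain = np.empty(n_iter)
    for t in range(n_iter):
        idx = rng.choice(n, size=batch_size, replace=False)
        g = grad_log_post(theta, data[idx], n)  # unbiased gradient estimate
        theta += 0.5 * step * g + np.sqrt(step) * rng.standard_normal()
        chain[t] = theta
    return chain
```

For a Gaussian mean with a flat prior, `grad_log_post(theta, batch, n)` would return `(n / len(batch)) * np.sum(batch - theta)`, rescaling the minibatch gradient to estimate the full-data one; with a fixed step size the chain targets only an approximation of the posterior, as the chapter's convergence bounds quantify.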

Chapter 5 is a (nicely) monstrous coverage of continuous time MCMC samplers that reaches very recent advances on PDMPs. The focus is on expressing them as limits, in order to derive mixing rates without extreme mathematical steps. (The chapter even includes a mention of the coordinate sampler that my PhD student Wu Changye derived in 2018!) Again a chapter I plan to use when teaching MCMC methods, if possibly skipping some of the 66 pages.

Chapter 6 completes the monograph with a presentation of convergence assessment tools and diagnostics, exploiting the kernel trick, as well as convergence bounds that reflect very recent research in that domain. The conclusive section on optimal weights and optimal thinning will presumably be new to most readers. (Making me wonder if a link can be found with our importance Markov chain construct.)

[Disclaimer about potential self-plagiarism as usual: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

BayesComp 2025.4

Posted in pictures, Running, Statistics, Travel, University life on June 21, 2025 by xi'an

The third and final day of the (main) conference started with Emtiyaz Khan’s plenary talk on adaptive Bayesian intelligence. Or, imho, [adaptive [Bayesian]] intelligence, with the brackets indicating redundancy since intelligence need include adaptivity and [intelligent] adaptivity need proceed in a Bayesian way! Focussing first on the Bayesian learning rule via variational Bayes (with a stress on Kingma’s 2014 Adam optimisation algorithm, the “most cited paper” [in machine learning]) where learning boils down to gradient steps (due to the exponential family structure), themselves versions of Taylor (or Laplace) approximations. With an interesting vision of Bayesian updating as accounting for prediction mismatch. (I missed the connection with Roberta in IMDb appearing in one slide!)

 The following session offered no dilemma [sorry, Alex, Axel, Chris, Robert, Sumeet, Victor!] since it included the federated learning session I organised, with Louis Aslett, Conor Hassan, and Jean-Michel Marin as speakers. Louis’ talk was on confidential [homomorphic] accept-reject algorithms to learn from other sources, while preserving (differential?) privacy, part of which developed during the Les Houches workshops I organised this Spring and the one before. Exploiting the additive features of log-likelihoods and exponential variates and adopting a testing perspective on privacy. Conor motivated his model with the Australian cancer atlas project Kerrie Mengersen and others have been developing over the years. The federated approach relies on variational approximations that return the same answer as an exact resolution, but more efficiently. (From a privacy perspective, I wonder at the impact of variational approximations on protecting the data, which boils down to a choice of (sufficient) statistics for the exponential families behind those approximations.) For more complicated models, incorporating spatial dependence prohibits full Bayesian inference, unfortunately. Jean-Michel commented on the richness of methods for simulation-based inference, incl. model choice. His focus was on using sequential neural likelihood estimation and sequential importance sampling to approximate evidence. As in the Read Paper of Del Moral et al. (2006). Mentioning a neural version of the harmonic mean estimator by Spurio Mancini et al. (2023)! I wondered at the degree of (Rao-Blackwell) recycling involved in the computation, Jean-Michel’s answer being that AMIS is soon coming [in a theatre near you!].

The afternoon sessions did not offer any reprieve in the choice of topic! I first went to Approximate Methods for Accelerated Sampling, with Rong Tang evaluating the informativeness of summary statistics through a divergence evaluation. Using autoencoders to replace the intractable posterior, with sliced maximum mean discrepancy (MMD) and (pseudo?) score matching loss for divergences (reminding me of indirect inference and synthetic likelihood). Yun Yang discussed a variational proposal to estimate the number of components in a mixture model. Surprising given the multimodal structure of mixture posteriors. And the overall irregularity of (evil!) mixture models. But I could not figure out from the talk the form of the approximation.

On the food scene, I tasted a nice and spicy Peranakan rice vermicelli dish called Mee Siam yesterday in a campus restaurant, which sustained me for the rest of the day, including the ABC seminar/webinar. And another spicy hot pot today at NUS, to catch up on veggies, while missing the chili crab local specialty on that trip.

BayesComp 2025.3

Posted in pictures, Running, Statistics, Travel, University life on June 20, 2025 by xi'an

The second day of the conference started with cooler and less humid weather (although this did not last!), while my brain felt a wee bit foggy from a lack of sleep (and I almost crashed while running on the hotel treadmill, at 14.5km/h!), and the plenary talk of my friend of many years Sylvia Frühwirth-Schnatter on horseshoe priors and time-varying time series (à la West). With a nice closed-form representation involving hypergeometric functions of the second kind (my favourite!), with the addition of a triple-Gamma prior. Sylvia stressed the enormous impact of the prior choice on change-point detection, which was already the point in the original horseshoe paper (as opposed to George’s Lasso prior). Without incorporating any specific modelling of potential change-points, fair enough given that the parameter is moving with time, unhindered. Her MCMC choices involved discrete parameters with Negative Binomial and Poisson parameters, allowing for partially integrated or collapsed solutions. Possibly further improved by Swendsen-Wang steps.

I then attended the (advanced) Langevin session after agonising upon my choice for a wealth of options! Sam Power presented a talk linking simulation with optimisation targets, over measure spaces. With Wasserstein gradient flow algorithms that resemble Langevin algorithms once discretised by a particle system. (A natural resolution producing a somewhat unnatural form of measure estimator since made of Dirac masses, from which very little can be learned.) Then [my Warwick colleague & coauthor] Andi Wang on underdamped Langevin diffusions, when Poincaré‘s inequality fails, but convergence (in total variation) still occurs. Followed by Peter Whalley on splitting methods (where random hypergeometric subsampling dominates Robbins-Monro) and stochastic gradient algorithms, in a connected (to the previous talks) way since involving underdamped aspects. (With a personal discovery of Polyak’s heavy ball method.)

The afternoon session saw me facing a terrible dilemma with three close friends talking at the same time! Eventually opting for PDMPs, over simulation-based inference and recalibration for approximate Bayesian methods. Kengo Kamatani gave a general introduction to PDMPs, before explaining the automated implementation he considered with Charly Andral (during Charly’s visit to ISM, Tokyo, two summers ago). Towards accelerating the generation of the jump time. Then Luke Hardcastle applied PDMPs for survival prediction, using spike & slab priors and sticky PDMPs. And Jere Koskela (formerly Warwick) extended zig-zag sampling to discrete settings (incl. Kingman’s coalescent.)
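For readers unfamiliar with zig-zag sampling, here is a minimal sketch of the classical one-dimensional version for a standard Gaussian target (a textbook illustration, not Jere's discrete-state extension), where the velocity-flip event times can be simulated exactly by inversion:

```python
import numpy as np

def zigzag_gaussian(T, seed=0):
    """1-D zig-zag sampler for N(0,1): the particle moves at velocity
    +1 or -1 and flips it at events of a Poisson process with rate
    max(0, v*x), drawn exactly by inverting the integrated rate."""
    rng = np.random.default_rng(seed)
    x, v, t = 0.0, 1.0, 0.0
    skeleton = [(t, x)]
    while t < T:
        a = v * x                      # rate along the path is max(0, a + s)
        e = rng.exponential()
        if a >= 0:                     # solve a*tau + tau**2/2 = e
            tau = -a + np.sqrt(a * a + 2 * e)
        else:                          # zero intensity until s = -a
            tau = -a + np.sqrt(2 * e)
        x += v * tau
        t += tau
        v = -v                         # flip the velocity at the event
        skeleton.append((t, x))
    return skeleton

def second_moment(skeleton):
    """Time-average of x**2 along the piecewise-linear trajectory."""
    num, den = 0.0, 0.0
    for (t0, x0), (t1, x1) in zip(skeleton, skeleton[1:]):
        dt = t1 - t0
        num += dt * (x0 * x0 + x0 * x1 + x1 * x1) / 3.0
        den += dt
    return num / den
```

Note that estimates are obtained by integrating along the continuous-time skeleton rather than by averaging discrete samples, one of the appeals of PDMP samplers.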

The (rather long) day was not over yet since we had planned an extra on-site OWABI seminar & webinar with two participants in the conference, Filippo Pagani (Warwick and OCEAN postdoc) using fusion for federated learning, with a trapezoidal approximation, and Maurizio Filippone on GANs as hidden perfect ABC model selection, a GAN providing an automatic density estimator… With astounding Gemini-generated cartoons! Videos are soon to be available. A big congrats to the speakers who managed to convey their ideas and results despite the late hour! (On the extra-academic side, I was invited last night to a genuine Szechuan dinner in Chinatown, with a large array of spicy dishes if not that spicy!, and a rare opportunity to taste abalone. And bullfrogs. Quite a treat! And a good reason to skip dinner altogether!)

BayesComp 2025.2

Posted in Kids, pictures, Statistics, Travel, University life on June 19, 2025 by xi'an


The main BayesComp²⁵ conference started with Pierre Jacob’s plenary talk on his recent advances on coupling for unbiased MCMC (Pierre being a current ERC grantee on that topic). Raising lazy questions like using a different target or transition kernel for the second chain in the coupling, connecting the Poisson equation and control variates, handling the sign issue with the unbiased approximations. Interestingly, they obtain an unbiased estimator of the asymptotic variance of the unbiased estimator. And a correction for self-normalised importance sampling, which has some connections with our 1996 (?) pinball sampler. Also an evaluation of the median of means, rather than the average of means, which is a thing I had been (lazily) contemplating for a while. (On the greedy side, as I was writing my recovery exam for my Monte Carlo course, I realised the results Pierre presented could be somewhat recycled into exam problems!)
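Since the median of means may be new to some readers, here is a minimal sketch of the estimator, which trades a little efficiency for robustness to heavy tails (a generic version, unrelated to Pierre's unbiased MCMC setting):

```python
import numpy as np

def median_of_means(x, n_blocks):
    """Median-of-means: split the sample into blocks, average each
    block, and return the median of the block means. A single wild
    observation can ruin at most one block mean."""
    x = np.asarray(x)
    blocks = np.array_split(x, n_blocks)
    return float(np.median([b.mean() for b in blocks]))
```

With, say, a single huge outlier among a hundred otherwise identical observations, the plain average is ruined while the median of means is untouched, since the contaminated block mean sits at the extreme of the ordered block means.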

My first parallel session was on gradient-based methods, with a talk by Francesca Crucinio on proximal particle Langevin algorithms (similar to the one she gave in PariSanté last year), and a talk by Zhihao Wang on stereographic multiple-try Metropolis(-Rosenbluth-Teller) that unsurprisingly recovers ergodicity thanks to the compactness of the ball. I wondered whether a Normal proposal makes complete sense, since one could consider a move after the projection instead, and why iid rather than repelling multiple proposals are used… The last speaker, Siddharth Vishwanath, was just off the plane from California and spoke about repelling-attracting HMC. With very nice animations of HMC, if reaching the main point of using both negative and positive frictions a few minutes before the session finished. The method preserves volume and potential, if not energy.

Speaking of which (energy), I found myself struggling with my less than six hours of sleep since arrival during the first afternoon session, despite a fiery hot pot lunch, which means in plainer terms that I alas dozed in and out of the talks. The second session saw Jack Jewson exposing in deeper detail the exact PDMP algorithm for Gibbs measures that Jeremias Knoblauch mentioned yesterday. And Jonathan Huggins as well, using Gaussian processes as proxies for expected likelihoods, with lower guarantees than pseudo-marginal versions. In a mildly connected way, Robin Ryder went through the resolution of the ecological inference challenge he produced with Nicolas Chopin and Théo Valdoire (all authors with whom I am connected, Théo being a brilliant student of our MASH Master last year and now at Harvard, hopefully till the end of his PhD!)

On the extra-academic curriculum, I had a yummy dinner in the Maxwell Hawker (street) food centre, incl. Xiao Long Bao that cooled down fast enough to avoid the usual scalding effect, plus rojak, a mixed fruit and vegetable dish in a peanut sauce that I had never tasted before, popiah (ditto), chili noodles, and an appam with deep-fried durian balls as a fabulous and unexpected dessert.