Archive for NUS

coupling-based approach to f-divergences diagnostics for MCMC

Posted in Books, Statistics, Travel, University life on October 27, 2025 by xi'an

Adrien Corenflos (University of Warwick) and Hai-Dang Dau (NUS) just arXived their paper on MCMC diagnostics that Adrien told me about last month, while in Warwick.

“This [f-divergence] bound is clearly suboptimal since it does not vary in t and does not take into account the mixing of the Markov chain. We present a scheme where the weights are ‘harmonized’ as the Markov chain progresses, reflecting its mixing through the notion of coupling.”

They start by contrasting the classical ergodic average and the embarrassingly parallel estimates obtained from N parallel chains culled of their B initial values with the couplings used in standard diagnostics. Opting for the parallel perspective, maybe rekindling the diagnostic war of the early 1990s! The evaluation tool in the paper is based on f-divergences, like the χ² divergence, which naturally relates to the effective sample size when considering weighted atomic measures. When consistent, these weighted approximations produce upper bounds on the f-divergence, with exact convergence in case of independence.
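To make the χ²/ESS connection concrete, here is a minimal numerical sketch (purely illustrative, not taken from the paper): for self-normalised weights w₁,…,w_N summing to one, the effective sample size is 1/Σwᵢ² and N Σwᵢ² − 1 = N/ESS − 1 acts as a χ²-type measure of how far the weighted atomic measure stands from its unweighted counterpart.

    import numpy as np

    # Toy illustration (not the authors' code): relation between the effective
    # sample size and a chi-square-type quantity for self-normalised weights.
    rng = np.random.default_rng(0)
    N = 1000
    logw = rng.normal(size=N)        # hypothetical unnormalised log-weights
    w = np.exp(logw - logw.max())
    w /= w.sum()                     # self-normalised weights, summing to one

    ess = 1.0 / np.sum(w**2)         # effective sample size
    chi2 = N * np.sum(w**2) - 1.0    # equals N/ESS - 1
    print(f"ESS = {ess:.1f}, chi-square-type quantity = {chi2:.3f}")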

In my opinion the most exciting part of the paper lies in the ability to modify these weights along MCMC iterations, since the naïve sequential importance sampling argument I also use in class keeps them constant! The trick is to (be able to) couple randomly chosen parallel chains, with the weights being averaged at each coupling event. The resulting algorithm preserves expectation (in the importance sampling sense) and consistency (in the particle sense). Furthermore, the f-divergence bound based on the weights can only decrease between iterations, which reminds me of interleaving. They also establish exponential convergence of the weights to uniform ones (under the strong assumption of a uniformly lower-bounded probability of coupling). The paper concludes with interesting remarks on perfect sampling, Rao-Blackwellisation, control variates, and backward sampling.
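As a crude reading of the harmonization step (my own sketch with hypothetical coupling events, not the authors' algorithm): whenever two chains couple, their weights are replaced by their average, which keeps the weights normalised and, by convexity of x ↦ x², can only decrease the χ²-type quantity above.

    import numpy as np

    def harmonize(weights, coupled_pairs):
        # toy reading of 'weight harmonization': average the weights of
        # chains that have coupled at this iteration (hypothetical events)
        w = weights.copy()
        for i, j in coupled_pairs:
            w[i] = w[j] = 0.5 * (w[i] + w[j])
        return w

    def chi2_bound(w):
        # chi-square-type quantity N * sum(w**2) - 1 for normalised weights
        return len(w) * np.sum(w**2) - 1.0

    rng = np.random.default_rng(1)
    N = 8
    w = rng.dirichlet(np.ones(N))            # hypothetical initial weights
    print("before:", chi2_bound(w))
    pairs = [(0, 3), (2, 5)]                 # hypothetical coupling events
    print("after: ", chi2_bound(harmonize(w, pairs)))   # never larger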

A long-standing gap exists between the theoretical analysis of Markov chain Monte Carlo convergence, which is often based on statistical divergences, and the diagnostics used in practice. We introduce the first general convergence diagnostics for Markov chain Monte Carlo based on any f-divergence, allowing users to directly monitor, among others, the Kullback–Leibler and the χ² divergences as well as the Hellinger and the total variation distances. Our first key contribution is a coupling-based 'weight harmonization' scheme that produces a direct, computable, and consistent weighting of interacting Markov chains with respect to their target distribution. The second key contribution is to show how such consistent weightings of empirical measures can be used to provide upper bounds on f-divergences in general. We prove that these bounds are guaranteed to tighten over time and converge to zero as the chains approach stationarity, providing a concrete diagnostic.

BayesComp 2025.4

Posted in pictures, Running, Statistics, Travel, University life on June 21, 2025 by xi'an

The third and final day of the (main) conference started with Emtiyaz Khan's plenary talk on adaptive Bayesian intelligence. Or, imho, [adaptive [Bayesian]] intelligence, with the brackets indicating redundancy since intelligence should include adaptivity and [intelligent] adaptivity should proceed in a Bayesian way! Focussing first on the Bayesian learning rule via variational Bayes (with a stress on Kingma's 2014 Adam optimisation algorithm, the "most cited paper" [in machine learning]), where learning boils down to gradient steps (due to the exponential family structure), themselves versions of Taylor (or Laplace) approximations. With an interesting vision of Bayesian updating as accounting for prediction mismatch. (I missed the connection with Roberta in IMDb appearing in one slide!)

The following session offered no dilemma [sorry, Alex, Axel, Chris, Robert, Sumeet, Victor!] since it included the federated learning session I organised, with Louis Aslett, Conor Hassan, and Jean-Michel Marin as speakers. Louis' talk was on confidential [homomorphic] accept-reject algorithms to learn from other sources while preserving (differential?) privacy, part of which came out of the Les Houches workshops I organised this Spring and the one before. Exploiting the additive features of log-likelihoods and exponential variates and adopting a testing perspective on privacy. Conor motivated his model with the Australian cancer atlas project Kerrie Mengersen and others have been developing over the years. The federated approach relies on variational approximations that return the same answer as an exact resolution, but more efficiently. (From a privacy perspective, I wonder at the impact of variational approximations on protecting the data, which boils down to a choice of (sufficient) statistics for the exponential families behind those approximations.) For more complicated models, incorporating spatial dependence prohibits full Bayesian inference, unfortunately. Jean-Michel commented on the richness of methods for simulation-based inference, incl. model choice. His focus was on using sequential neural likelihood estimation and sequential importance sampling to approximate evidence. As in the Read Paper of Del Moral et al. (2006). Mentioning a neural version of the harmonic mean estimator by Spurio Mancini et al. (2023)! I wondered at the degree of (Rao-Blackwell) recycling involved in the computation, Jean-Michel's answer being that AMIS is soon coming [in a theatre near you!].

The afternoon sessions did not offer any reprieve in the choice of topic! I first went to Approximate Methods for Accelerated Sampling, with Rong Tang evaluating the informativeness of summary statistics through a divergence evaluation. Using autoencoders to replace the intractable posterior, with sliced maximum mean discrepancy (MMD) and (pseudo?) score matching loss for divergences (reminding me of indirect inference and synthetic likelihood). Yun Yang discussed a variational proposal to estimate the number of components in a mixture model. Surprising given the multimodal structure of mixture posteriors. And the overall irregularity of (evil!) mixture models. But I could not figure out from the talk the form of the approximation.

On the food scene, tasted a nice and spicy Peranakan rice vermicelli dish called Mee Siam yesterday in a campus restaurant, which sustained me for the rest of the day, including the ABC seminar/webinar. And another spicy hot pot today at NUS, to catch up on veggies, while missing the chili crab local specialty on that trip.

BayesComp 2025.3

Posted in pictures, Running, Statistics, Travel, University life on June 20, 2025 by xi'an

The second day of the conference started with cooler and less humid weather (although this did not last!), while my brain felt a wee bit foggy from a lack of sleep (and I almost crashed while running on the hotel treadmill, at 14.5km/h!), and the plenary talk of my friend of many years Sylvia Frühwirth-Schnatter on horseshoe priors and time-varying time series (à la West). With a nice closed-form representation involving hypergeometric functions of the second kind (my favourite!), with the addition of a triple-Gamma prior. Sylvia stressed the enormous impact of the prior choice on change-point detection, which was already the point in the original horseshoe paper (as opposed to George's Lasso prior). Without incorporating any specific modelling of potential change-points, fair enough given that the parameter is moving with time, unhindered. Her MCMC choices involved discrete parameters with Negative Binomial and Poisson parameters, allowing for partially integrated or collapsed solutions. Possibly further improved by Swendsen-Wang steps.

I then attended the (advanced) Langevin session after agonising over my choice among a wealth of options! Sam Power presented a talk linking simulation with optimisation targets, over measure spaces. With Wasserstein gradient flow algorithms that resemble Langevin algorithms once discretised by a particle system. (A natural resolution producing a somewhat unnatural form of measure estimator, since made of Dirac masses, from which very little can be learned.) Then [my Warwick colleague & coauthor] Andi Wang on underdamped Langevin diffusions, when Poincaré's inequality fails but convergence (in total variation) still occurs. Followed by Peter Whalley on splitting methods (where random hypergeometric subsampling dominates Robbins-Monro) and stochastic gradient algorithms, in a connected (to the previous talks) way since involving underdamped aspects. (With a personal discovery of Polyak's heavy ball method.)

The afternoon session saw me facing a terrible dilemma with three close friends talking at the same time! Eventually opting for PDMPs, over simulation-based inference and recalibration for approximate Bayesian methods. Kengo Kamatani gave a general introduction to PDMPs, before explaining the automated implementation he considered with Charly Andral (during Charly's visit to ISM, Tokyo, two summers ago). Towards accelerating the generation of the jump time. Then Luke Hardcastle applied PDMPs to survival prediction, using spike & slab priors and sticky PDMPs. And Jere Koskela (formerly Warwick) extended zig-zag sampling to discrete settings (incl. Kingman's coalescent).

The (rather long) day was not over yet since we had planned an extra on-site OWABI seminar & webinar with two participants in the conference, Filippo Pagani (Warwick and OCEAN postdoc) using fusion for federated learning, with a trapezoidal approximation, and Maurizio Filippone on GANs as hidden perfect ABC model selection, a GAN providing an automatic density estimator… With astounding Gemini-generated cartoons! Videos are soon to be available. A big congrats to the speakers who managed to convey their ideas and results despite the late hour! (On the extra-academic side, I was invited last night to a genuine Szechuan dinner in Chinatown, with a large array of spicy dishes (if not that spicy!), and a rare opportunity to taste abalone. And bullfrogs. Quite a treat! And a good reason to skip dinner altogether!)

BayesComp 2025.2

Posted in Kids, pictures, Statistics, Travel, University life on June 19, 2025 by xi'an


The main BayesComp²⁵ conference started with Pierre Jacob's plenary talk on his recent advances on coupling for unbiased MCMC (he is currently an ERC grantee on that topic). Raising lazy questions like using a different target or transition kernel for the second chain in the coupling, connecting the Poisson equation and control variates, handling the signed issue with the unbiased approximations. Interestingly, they obtain an unbiased estimator of the asymptotic variance of the unbiased estimator. And a correction for self-normalised importance sampling, which has some connections with our 1996 (?) pinball sampler. Also an evaluation of the median of means, rather than the average of means, which is a thing I had been (lazily) contemplating for a while. (On the greedy side, as I was writing my recovery exam for my Monte Carlo course, I realised the results Pierre presented could be somewhat recycled into exam problems!)

My first parallel session was on gradient-based methods, with a talk by Francesca Crucinio on proximal particle Langevin algorithms (similar to the one she gave in PariSanté last year), and a talk by Zhihao Wang on stereographic multiple try Metropolis(-Rosenbluth-Teller) that unsurprisingly recovers ergodicity thanks to the compactness of the sphere. For which I wonder whether a Normal proposal makes complete sense, since one could consider a move after the projection instead, and why iid rather than repelling multiple proposals are used… The last speaker, Siddharth Vishwanath, was just off the plane from California and spoke about repelling-attracting HMC. With very nice animations of HMC, if only reaching the main point of using both negative and positive frictions a few minutes before the session finished. The method preserves volume and potential, if not energy.

Speaking of which (energy), I found myself struggling with my less than 6 hours of sleep since arrival during the first afternoon session, despite a fiery hot pot lunch, which means in plainer terms that I alas dozed in and out of the talks. The second session saw Jack Jewson exposing in deeper detail the exact PDMP algorithm for Gibbs measures that Jeremias Knoblauch mentioned yesterday. And Jonathan Huggins as well, using Gaussian processes as proxies for expected likelihoods, with lower guarantees than pseudo-marginal versions. In a mildly connected way, Robin Ryder went through the resolution of the ecological inference challenge they produced with Nicolas Chopin and Théo Valdoire (all authors with whom I am connected, Théo being a brilliant student of our MASH Master last year and now at Harvard, hopefully till the end of his PhD!)

On the extra-academic curriculum, I had a yummy dinner in the Maxwell Hawker (street) food centre, incl. Xiao Long Bao that cooled down fast enough to avoid the usual scalding effect, plus rojak, a mixed fruit and vegetable dish fried in a peanut sauce that I had never tasted before, popiah (ditto), chili noodles, and an appam with deep-fried durian balls as a fabulous and unexpected dessert.

BayesComp 2025.1

Posted in Running, Statistics, Travel, University life on June 18, 2025 by xi'an

Minus one day at BayesComp 2025! As I am attending the model misspecification satellite workshop (ten minutes late, due to a repeated path-finding protocol!), with an extended presentation by Jeremias Knoblauch on post-Bayesian inference, incl. powered likelihood and Gibbs posteriors. A very smooth and pedagogical presentation, esp. in the hybrid mode. A perspective I associate with the difficulties of making sense of the post-posterior, not truly a posterior, of calibrating the penalty (eg λ), picking the loss (α, β, γ divergences?), and the drift towards learning goals since the new measure is the post-posterior predictive. Sort of paradoxical return to a Gaussian post-posterior on the parameter that does not seem to stay robust. Horrendous computational issues, when the loss itself is an integral. Use of the zig-zag sampler with an estimated unbiased gradient of the loss, much faster than pseudo-marginal, which (naïvely?) makes sense both because PDMPs directly use scores and because of the power of stochastic gradient methods. Worse perspectives for optimisation-centric posteriors that are essentially revamped versions of GANs. For instance, what is the meaning of the coverage probabilities?
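For the record (my own shorthand rather than a slide from the talk), the Gibbs posterior replaces the likelihood with an exponentiated loss,

π_λ(θ | x) ∝ π(θ) exp{−λ ℓ(θ, x)},

so that picking ℓ as the negative log-likelihood recovers the powered (tempered) likelihood posterior, while λ is the penalty or learning rate whose calibration is at stake above.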

The second talk by Jonathan Huggins was on DC (not bagged) posteriors as martingale posteriors with m<∞ (approximating marginal distributions with random kernel MCMC, which persists in simulating the marginalised or integrated variable u from its prior rather than adapting to the current value of the parameter θ, or subsampling MCMC akin to stochastic gradient Langevin), with connections to cut posteriors.

Then I skipped to the second workshop, on Bayesian methods for distributional and semiparametric regression, to listen to my friend David Rossell's talk on local variable selection. Which suffers more under misspecification than in standard models. Another talk involving cut posteriors, the cuts being on the spline bases…

The day and the workshop concluded with great talks by (my friends) Pierre Alquier and David Frazier. David centred his misspecification talk on cut posteriors. Managing to bring in shrinkage estimators (and mention Bill Strawderman!).

A wee stressful trip, since the races in Caen cancelled all buses and delayed the taxi enough to miss the train to Paris by 30s, and catching the next available one left me with less than one hour between the arrival of the train (delayed by construction work on the rail line) and boarding the flight at Charles de Gaulle airport, but fortunately the RER trains in Paris were running okay, there were no queues in the airport, and I thus made it in time with a bit of post-marathon jogging! (Only to be delayed at departure by one hour due to stormy conditions over Germany and Austria.) All this exercise proved helpful to sleep soundly and lengthily in the plane!