Archive for Langevin diffusion

mostly Monte Carlo [13/03]

Posted in Statistics, Travel, University life on March 10, 2026 by xi'an

A new episode of our mostly Monte Carlo seminar, very soon coming near you (if in Paris):

On Friday 13/03/26, from 3 to 5pm at PariSanté Campus

15h00: Pierre Del Moral (INRIA, Bordeaux)

On the Kantorovich contraction of Markov semigroups

We present a novel operator theoretic framework to study the contraction properties of Markov semigroups with respect to a general class of Kantorovich semi-distances, which notably includes Wasserstein distances. This rather simple contraction cost framework combines standard Lyapunov techniques with local contraction conditions. Our results can be applied to both discrete time and continuous time Markov semigroups, and we illustrate their wide applicability in the context of (i) Markov transitions on models with boundary states, including bounded domains with entrance boundaries, (ii) operator products of a Markov kernel and its adjoint, including two-block-type Gibbs samplers, (iii) iterated random functions and (iv) diffusion models, including overdamped Langevin diffusion with convex at infinity potentials.
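In symbols, the contraction property at stake can be sketched as follows (my notation, not necessarily the authors'; Wasserstein distances correspond to taking the cost c as a power of a metric):

```latex
% Contraction of a Markov semigroup (P_t) with respect to a Kantorovich
% semi-distance W built from a cost function c, at some rate \lambda > 0:
W\bigl(\mu P_t,\,\nu P_t\bigr) \;\le\; e^{-\lambda t}\, W(\mu,\nu),
\qquad
W(\mu,\nu) \;=\; \inf_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\,\pi(\mathrm{d}x,\mathrm{d}y)
```

with the combination of Lyapunov and local contraction conditions in the paper guaranteeing the existence of such a rate λ.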

16h00: Bob Carpenter (Flatiron Institute, New York)

GIST, WALNUTS, and Continuous Nutpie: mass-matrix and step-size adaptation for Hamiltonian Monte Carlo

I will introduce Gibbs self-tuning (GIST), our new technique for coupling tuning parameters and conditionally Gibbs-sampling them per iteration in Hamiltonian Monte Carlo. Then I will turn to the within-orbit adaptive NUTS (WALNUTS) sampler, which adapts the step size at every leapfrog step in order to conserve the Hamiltonian. Empirical evaluations on varying multi-scale target distributions, including Neal’s funnel and the Stock-Watson stochastic volatility time-series model, demonstrate that WALNUTS achieves substantial improvements in sampling efficiency and robustness. I will review the Nutpie mass-matrix adaptation scheme, which is designed to minimize Fisher divergence by estimating the mass matrix as the geometric midpoint (aka barycenter) between the inverse covariance of the draws and the covariance of the scores of the draws. Then I will describe a continuously adapting version that adapts per iteration by continuously discounting the past rather than updating in fixed blocks. I will also show how the Adam optimizer outperforms dual averaging for step-size adaptation. I will conclude by considering a lock-free multi-threading implementation that automatically monitors adaptation and sampling for convergence, enabling automatic stopping.
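As a toy illustration of the step-size adaptation point (certainly not Bob's actual implementation, just a generic sketch), here is Adam driving the log step size so that the average acceptance probability approaches a target, in the role traditionally played by dual averaging:

```python
import math

def adam_step_size_adaptation(accept_probs, target=0.8,
                              lr=0.05, b1=0.9, b2=0.999, delta=1e-8):
    """Adapt the log step size with Adam so that the acceptance probability
    approaches `target`: accepting too rarely (a < target) yields a positive
    signed error and hence a smaller step size."""
    log_eps, m, v = 0.0, 0.0, 0.0
    for t, a in enumerate(accept_probs, start=1):
        g = target - a                    # signed error, stand-in for a gradient
        m = b1 * m + (1 - b1) * g         # first-moment estimate
        v = b2 * v + (1 - b2) * g * g     # second-moment estimate
        m_hat = m / (1 - b1 ** t)         # bias corrections
        v_hat = v / (1 - b2 ** t)
        log_eps -= lr * m_hat / (math.sqrt(v_hat) + delta)
    return math.exp(log_eps)

# low acceptance should shrink the step size, high acceptance grow it
step_low = adam_step_size_adaptation([0.2] * 100)
step_high = adam_step_size_adaptation([0.99] * 100)
```

The normalisation by the second-moment estimate is what makes Adam less sensitive to the scale of the acceptance error than a plain stochastic-approximation update.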

scalable Monte Carlo for Bayesian learning [book review]

Posted in Books, Statistics, University life on September 26, 2025 by xi'an

This book by Paul Fearnhead, Christopher Nemeth, Chris Oates, and Chris Sherlock is part of the IMS Monograph series. And published by Cambridge University Press. It covers most recent developments in MCMC methods, namely stochastic gradient MCMC (Chap. 3), non-reversible MCMC (Chap. 4), continuous-time MCMC (Chap. 5), and assessing and improving MCMC (Chap. 6). I find the book remarkable in its attention to rigour and clarity, without falling into overly technical derivations. It is perfectly suited for a graduate course to students with a solid mathematical background. In short, had I considered a new edition of our Monte Carlo Statistical Methods book to incorporate these advances, I could not have done such a good job!

The first chapter provides a quick refresher of the background, from Monte Carlo principles, to Markov chains, SDEs, and the kernel “trick” (which requires a dozen pages of exposition). Nonetheless, it contains side remarks of true interest, including some suggestions I had not previously seen, as for instance an unusual introduction of the HMC algorithm as an underdamped Langevin diffusion. Chapter 2 prolongs this recap by covering reversible MCMC algorithms and the attached optimal scalings. This is done in a particularly friendly presentation that I intend to use in my own course. The HMC section is probably the best coverage I have seen on the topic, including most naturally the leapfrog steps.
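For readers less familiar with the leapfrog steps in question, here is a bare-bones sketch (unit mass matrix, a standard Gaussian target assumed for the check), illustrating the near-conservation of the Hamiltonian along the simulated trajectory:

```python
def leapfrog(q, p, grad_logp, eps, n_steps):
    """Leapfrog trajectory with unit mass matrix: half momentum step,
    alternating full position and momentum steps, final half momentum step."""
    p += 0.5 * eps * grad_logp(q)
    for _ in range(n_steps - 1):
        q += eps * p
        p += eps * grad_logp(q)
    q += eps * p
    p += 0.5 * eps * grad_logp(q)
    return q, p

# standard Gaussian target: log p(q) = -q**2 / 2, hence grad log p(q) = -q
q1, p1 = leapfrog(1.0, 0.5, lambda q: -q, eps=0.1, n_steps=50)
H0 = 0.5 * (1.0 ** 2 + 0.5 ** 2)   # initial Hamiltonian (potential + kinetic)
H1 = 0.5 * (q1 ** 2 + p1 ** 2)     # final Hamiltonian, nearly conserved
```

The small residual energy error is what the Metropolis acceptance step of HMC corrects for.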

Chapter 3 gets into stochastic gradient MCMC as an approximate MCMC, with nice arguments and formal convergence bounds. Again quite efficiently, if focussing almost solely on Gaussian settings (but including a neural network example). Similarly, Chapter 4 provides intuitive (if informal) arguments on the worth of non-reversible algorithms that are well-suited to a textbook of this level. This chapter introduces a PDMP sampler like the discrete bouncy particle sampler.
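To make the stochastic gradient idea concrete, here is a minimal SGLD sketch on a toy example of my own (a Gaussian mean under a flat prior, not one from the book): each iteration is an Euler step of the Langevin diffusion with the full-data gradient replaced by a rescaled minibatch estimate:

```python
import math, random

def sgld(grad_logpost, theta0, data, n_iter, batch_size, step):
    """Stochastic gradient Langevin dynamics: unadjusted Langevin updates
    driven by a minibatch gradient estimate plus injected Gaussian noise."""
    theta, n = theta0, len(data)
    for _ in range(n_iter):
        batch = random.sample(data, batch_size)
        g = grad_logpost(theta, batch, n)
        theta += 0.5 * step * g + math.sqrt(step) * random.gauss(0.0, 1.0)
    return theta

# toy target: posterior of a N(theta, 1) mean given 200 observations
random.seed(0)
data = [random.gauss(2.0, 1.0) for _ in range(200)]
def grad(theta, batch, n):          # minibatch gradient, rescaled by n/|B|
    return (n / len(batch)) * sum(x - theta for x in batch)
theta = sgld(grad, 0.0, data, n_iter=2000, batch_size=20, step=1e-3)
```

The absence of a Metropolis correction is precisely what makes this an approximate MCMC method, with the bias controlled by the step size.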

Chapter 5 is a (nicely) monstrous coverage of continuous time MCMC samplers that reaches very recent advances on PDMPs. The focus is on expressing them as limits, in order to derive mixing rates without extreme mathematical steps. (The chapter even includes a mention of the coordinate sampler that my PhD student Wu Changye derived in 2018!) Again a chapter I plan to use when teaching MCMC methods, if possibly skipping some of the 66 pages.
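As a concrete PDMP illustration (a one-dimensional Zig-Zag sampler for a standard Gaussian target, my own toy rather than an example from the book), where the switching-event times happen to admit a closed-form inverse so that no thinning is needed:

```python
import math, random

def zigzag_gaussian(n_samples=20000, dt=0.1, seed=1):
    """One-dimensional Zig-Zag sampler for the target U(x) = x**2 / 2:
    the state moves at velocity +/-1 and the velocity flips at events of a
    Poisson process with rate max(0, v * (x + v * t)); samples are read off
    the continuous trajectory on a regular time grid."""
    rng = random.Random(seed)
    x, v = 0.0, 1.0
    samples, residual = [], dt
    while len(samples) < n_samples:
        e = -math.log(1.0 - rng.random())      # Exp(1) draw
        a = v * x                              # switching rate is max(0, a + t)
        if a >= 0:
            tau = -a + math.sqrt(a * a + 2 * e)
        else:
            tau = -a + math.sqrt(2 * e)        # rate is zero until t = -a
        s = residual                           # grid points inside this segment
        while s <= tau and len(samples) < n_samples:
            samples.append(x + v * s)
            s += dt
        residual = s - tau
        x += v * tau                           # move to the event and flip
        v = -v
    return samples

samples = zigzag_gaussian()
```

Note that sampling along the trajectory (rather than at event times) is what recovers the target: the event positions themselves are biased towards the tails.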

Chapter 6 completes the monograph with a presentation of convergence assessment tools and diagnostics, exploiting the kernel trick, as well as convergence bounds that reflect very recent research in that domain. The conclusive section on optimal weights and optimal thinning will presumably be new to most readers. (Making me wonder if a link can be found with our importance Markov chain construct.)

[Disclaimer about potential self-plagiarism as usual: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

gradient flow for projected Langevin dynamics

Posted in Books, Statistics, University life on April 7, 2025 by xi'an

Daniel Lacker (Columbia U) gave a talk at the probability seminar of Paris Dauphine this week, which I happened to attend, on a recent paper, Projected Langevin dynamics and a gradient flow for entropic optimal transport, written with Giovanni Conforti and Soumik Pal. The talk was quite progressive and I could hence follow most of it. The core idea is to study Langevin-type diffusion dynamics that sample from an entropy-regularised optimal transport problem, i.e., to look for an optimal distribution (in the sense of solving an entropy minimisation problem within a Wasserstein space, with regularisation) obtained via a gradient flow equation (as, e.g., in variational inference) that couples two SDEs recentred by conditional expectation terms. Expectations in the equations are estimated by a Nadaraya-Watson estimator (reminding me of SMC), with no theoretical derivation of an optimal bandwidth, and the authors achieve quantitative bounds on the convergence, namely exponential convergence, energy decay, and new logarithmic Sobolev inequalities. From the talk and a quick glance at the paper, it is unclear to me whether there are direct algorithmic consequences, since the SDEs need be discretised, while the expectation approximations are costly, being repeated at each iteration of the discretised SDE.
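For illustration, the Nadaraya-Watson step amounts to the following kernel-weighted plug-in for a conditional expectation (a generic sketch, with the bandwidth choice left open, as in the paper):

```python
import math

def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted estimate of E[Y | X = x0] from pairs (xs[i], ys[i]),
    using a Gaussian kernel: a weighted average of the ys with weights
    decaying in the distance between xs[i] and x0."""
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

# noiseless sanity check: with Y = X**2 the estimate at x0 = 1 is near 1
xs = [-2.0 + 0.01 * i for i in range(401)]
ys = [x * x for x in xs]
est = nadaraya_watson(1.0, xs, ys, bandwidth=0.2)
```

The cost concern raised above is visible here: each evaluation is O(number of particles), and it must be repeated at every step of the discretised SDE.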

scalable Langevin exact algorithm [armchair Read Paper]

Posted in Books, pictures, Statistics, Travel, University life on June 26, 2020 by xi'an

So, Murray Pollock, Paul Fearnhead, Adam M. Johansen and Gareth O. Roberts presented their Read Paper with discussions on the Wednesday aft! With a well-sized if virtual audience of nearly a hundred people. Here are a few notes scribbled during the Readings. And attempts at keeping the traditional structure of the meeting alive.

In their introduction, they gave the intuition of a quasi-stationary chain via the probability of being in A at time t while still alive, behaving as π(A) x exp(-λt) for a fixed killing rate λ. The concept is quite fascinating if less straightforward than stationarity! The presentation put the stress on the available recourse to an unbiased estimator of the κ rate whose initialisation scaled as O(n) but allowed a subsampling cost reduction afterwards. With a subsampling rate connected with Bayesian asymptotics, namely on how quickly the posterior concentrates. Unfortunately, this makes the practical construction harder, since n is finite and the concentration rate is unknown (although a default guess should be √n). I wondered if the link with self-avoiding random walks was more than historical.
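The soft-killing intuition can be toyed with as follows (a made-up killing rate κ(x) = x²/2, nothing from the paper): Brownian paths are killed at rate κ, the positions of paths still alive at time T settle to the quasi-stationary law, and the survival probability itself decays exponentially in T:

```python
import math, random

def survivors_at_time(n_paths=5000, T=2.0, dt=0.01, seed=3):
    """Brownian paths started at 0 are softly killed at rate
    kappa(x) = x**2 / 2; positions of paths surviving to time T approximate
    the quasi-stationary distribution, while the fraction of survivors
    decays like exp(-lambda * T)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_paths):
        x, t, alive = 0.0, 0.0, True
        while t < T:
            x += math.sqrt(dt) * rng.gauss(0.0, 1.0)   # Brownian increment
            if rng.random() < 0.5 * x * x * dt:        # soft kill w.p. kappa(x) dt
                alive = False
                break
            t += dt
        if alive:
            out.append(x)
    return out

survivors = survivors_at_time()
frac = len(survivors) / 5000        # crude estimate of the survival probability
```

This naive rejection of killed paths is exactly the "simple idea [that] is unlikely to work" quoted in the companion post, since the surviving fraction shrinks exponentially with T.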

The initialisation of the method remains a challenge in complex environments. And hence one may wonder whether and how much better it does when compared with SMC. Furthermore, while the motivation for using a Brownian motion stems from the practical side, this simulation does not account for the target π. This completely blind excursion sounds worse than simulating from the prior in other settings.

One early illustration for quasi stationarity was based on an hypothetical distribution of lions and wandering (Brownian) antelopes. I found that the associated concept of soft killing was not necessarily well received by… the antelopes!

As it happens, my friend and coauthor Natesh Pillai was the first discussant! I did not get the details of his first bimodal example. But he addressed my earlier question about how large the running time T should be, since the computational cost should be exploding with T. He also drew an analogy with improper posteriors, wondering about the availability of convergence assessment.

And my friend and coauthor Nicolas Chopin was the second discussant! Starting with a request to… leave the Pima Indians (model) alone!! But also getting into a deeper assessment of the alternative use of SMCs.

scalable Langevin exact algorithm [Read Paper]

Posted in Books, pictures, Statistics, University life on June 23, 2020 by xi'an


Murray Pollock, Paul Fearnhead, Adam M. Johansen and Gareth O. Roberts (CoI: all with whom I have strong professional and personal connections!) have a Read Paper discussion happening tomorrow [under relaxed lockdown conditions in the UK, except for the absurd quatorzaine on all travelers, but still in a virtual format] that we discussed together [from our respective homes] at Paris Dauphine. And which I already discussed on this blog when it first came out.

Here are quotes I spotted during this virtual Dauphine discussion, but we did not come up with enough material to build a significant discussion, although we wondered at the potential for solving the O(n) bottleneck and for handling doubly intractable cases like the Ising model. And noticed the nice feature that the log target is estimable by unbiased estimators. And the use of control variates, for once well-justified in a non-trivial environment.

“However, in practice this simple idea is unlikely to work. We can see this most clearly with the rejection sampler, as the probability of survival will decrease exponentially with t—and thus the rejection probability will often be prohibitively large.”

“This can be viewed as a rejection sampler to simulate from μ(x,t), the distribution of the Brownian motion at time  t conditional on its surviving to time t. Any realization that has been killed is ‘rejected’ and a realization that is not killed is a draw from μ(x,t). It is easy to construct an importance sampling version of this rejection sampler.”