Archive for Columbia University

project 2025 in [brute] force

Posted in Books, Travel, University life on April 29, 2025 by xi'an

proximal sampler

Posted in Books, pictures, R, Statistics, Travel, University life on April 28, 2025 by xi'an

At the Columbia workshop last week, Andre Wibisono presented work related to a recent arXival on the exponentially fast convergence of both unadjusted Langevin and proximal sampler algorithms under strong [definitely strong] log-concavity assumptions. The idea behind the proximal sampler is to target the demarginalised density

g(x,y) \propto \exp\{\log f(x) - ||x-y||^2/(2\eta)\}\quad\eta>0

by introducing an auxiliary Gaussian vector y, which preserves f(x) as the marginal distribution of the first component vector X. While the auxiliary Y is (obviously) conditionally Gaussian, the conditional of X given Y is at least as challenging to simulate as f itself, unless η is chosen small enough to regularise log g(·,y) into a strongly log-concave function of x, since

\log g(x,y) \le \log g(x^\star,y) -\beta||x-x^\star||^2

where x*=x*(y) is the maximiser of log g(x,y) (for a given value y) and β>0 is the appropriate log-concavity constant. This inequality means that an accept-reject step can be implemented to simulate from the conditional of X given Y, but it requires both the constant β and the derivation of x*(y), hence a pretty good understanding and a rather high regularity of the actual target f(x). Besides, the regularisation term ||x-y||² means that y stays close to the previous value of the (sub)chain in X, hence it creates a restoring force that slows down the exploration of the target.
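To make the accept-reject mechanism concrete, here is a minimal R sketch for a (hypothetical) 1D logistic target, a case where a valid β is easy to derive: since (log f)'' is non-positive, log g(·,y) is (1/η)-strongly concave and β=1/(2η) works. The choices η=0.5 and the chain length are arbitrary, mine rather than the paper's.

```r
# minimal proximal sampler sketch for a 1D logistic target (illustration only)
set.seed(1)
eta  <- 0.5            # step size of the Gaussian coupling (arbitrary choice)
beta <- 1 / (2 * eta)  # valid since (log f)'' <= 0 implies (log g)'' <= -1/eta
logg <- function(x, y) dlogis(x, log = TRUE) - (x - y)^2 / (2 * eta)
T <- 5e3
x <- numeric(T)
for (t in 2:T) {
  y  <- rnorm(1, x[t - 1], sqrt(eta))              # Y | X is Gaussian
  xs <- optimize(logg, c(y - 10, y + 10), y = y,
                 maximum = TRUE)$maximum           # x*(y), found numerically
  repeat {                                         # accept-reject for X | Y
    z <- rnorm(1, xs, sqrt(1 / (2 * beta)))        # Gaussian envelope at x*(y)
    if (log(runif(1)) < logg(z, y) - logg(xs, y) + beta * (z - xs)^2) break
  }
  x[t] <- z
}
```

The inner repeat loop is the accept-reject step, with Gaussian envelope N(x*(y), 1/(2β)) justified by the inequality above, while optimize supplies x*(y) numerically.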

Since the arXival does not contain numerical comparisons, I attempted one using the (2D) banana-shaped distribution,

# log-density of the 2D banana-shaped target
target <- function(x, sig, B, mu)
  -x[1]^2 / (2 * sig) - (x[2] + B * x[1]^2 - mu)^2 / 2

with μ=σ=B=10. I compared it with a vanilla random-walk Metropolis using three potential scales, chosen at random at each iteration. Since I did not want to check whether or not the target was log-concave (and derive the corresponding β), I instead used a Normal distribution centred at x*(y) as the proposal of a Metropolis step, again with several scales. The following is the representation of the samples (sienna for MCMC, navy blue for proximal with β=50, dark green for β=5), showing a lesser rate of tail exploration for the proximal samplers. It is thus unclear to me whether the theoretical characterisations of the method translate into practical efficiency beyond the most regular cases.
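The random-walk Metropolis benchmark with a scale drawn among three values at each iteration can be sketched as follows; the particular scales and starting point are my own guesses, not necessarily those of my actual experiment:

```r
# vanilla random-walk Metropolis on the banana target, with the proposal
# scale drawn at random among three values at each iteration (sketch)
target <- function(x, sig = 10, B = 10, mu = 10)
  -x[1]^2 / (2 * sig) - (x[2] + B * x[1]^2 - mu)^2 / 2
set.seed(2)
T <- 1e4
x <- matrix(0, T, 2)
x[1, ] <- c(0, 10)                    # start at the mode of the banana
scales <- c(0.1, 1, 10)               # guessed set of proposal scales
for (t in 2:T) {
  prop <- x[t - 1, ] + sample(scales, 1) * rnorm(2)
  x[t, ] <- if (log(runif(1)) < target(prop) - target(x[t - 1, ]))
    prop else x[t - 1, ]              # Metropolis accept-reject
}
```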

science inverse [cover]

Posted in pictures, Travel, University life on April 10, 2025 by xi'an

gradient flow for projected Langevin dynamics

Posted in Books, Statistics, University life on April 7, 2025 by xi'an

Daniel Lacker (Columbia U) gave a talk at the probability seminar of Paris Dauphine this week, which I attended by happenstance, on a recent paper, Projected Langevin dynamics and a gradient flow for entropic optimal transport, written with Giovanni Conforti and Soumik Pal. The talk was quite progressive and I hence could follow most of it. The core idea is to study Langevin-type diffusion dynamics that sample from an entropy-regularised optimal transport, i.e., to look for an optimal distribution (in the sense of solving a regularised entropy minimisation problem within a Wasserstein space) obtained via a gradient flow equation (as e.g. in variational inference) that couples two SDEs recentred by conditional expectation terms. The expectations in the equations are estimated by a Nadaraya-Watson estimator (reminding me of SMC), with no theoretical derivation of an optimal bandwidth, and the authors achieve quantitative bounds on the convergence, namely exponential convergence, energy decay, and new logarithmic Sobolev inequalities. From the talk and a quick glance at the paper, it is unclear to me whether there are direct algorithmic consequences, since the SDEs need to be discretised, while the expectation approximations are costly, being repeated at each iteration of the discretised SDE.
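To see why the repeated expectation approximations are costly, here is a crude caricature (entirely mine, not the authors' actual scheme, with made-up bandwidth, step size, and particle number) of a particle discretisation where the drift is recentred by a Nadaraya-Watson estimate recomputed at every Euler step, at O(N²) cost per iteration:

```r
# caricature of an Euler discretisation of a Langevin-type SDE whose drift
# is recentred by a conditional expectation, estimated at every step by a
# Nadaraya-Watson regression over the N particles (illustration only)
set.seed(1)
N <- 200; h <- 1e-2; steps <- 100; bw <- 0.3
x <- rnorm(N); y <- rnorm(N)
nw <- function(x0, x, y, bw) {           # estimate of E[Y | X = x0]
  w <- dnorm((x0 - x) / bw)              # Gaussian kernel weights
  sum(w * y) / sum(w)
}
for (s in 1:steps) {
  m <- vapply(x, nw, numeric(1), x = x, y = y, bw = bw)  # O(N^2) each step
  x <- x - h * (x - m) + sqrt(2 * h) * rnorm(N)          # Euler-Maruyama move
}
```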

off to New York [Optimization & Statistical Learning workshop]

Posted in pictures, Statistics, Travel, University life on April 2, 2025 by xi'an