
Archive for Columbia University
project 2025 in [brute] force
Posted in Books, Travel, University life with tags @ScientistTrump, burner, climate science, Columbia University, conservatism, COVID-19, Donald Trump, Harvard University, Heritage Foundation, HIV, immigration, immigration crackdown, Nature, Project 2025, renewable energy, Trump administration, U.S. Customs and Border Protection, United States of America on April 29, 2025 by xi'an
proximal sampler
Posted in Books, pictures, R, Statistics, Travel, University life with tags banana, Bayesian optimization, Columbia University, log-concave functions, Metropolis-Hastings algorithm, New York, New York city, proximal sampler, random walk Metropolis algorithm, Statistical learning, ULA, unadjusted Langevin algorithm, workshop on April 28, 2025 by xi'an
At the Columbia workshop last week, Andre Wibisono presented work related to a recent arXival on the exponentially fast convergence of both unadjusted Langevin and proximal sampler algorithms under strong [definitely strong] log-concavity assumptions. The idea behind the proximal sampler is to target the demarginalised density

g(x,y) ∝ f(x) exp{−||x−y||²/2η}

by introducing an auxiliary Gaussian vector y, which preserves f(x) as the marginal distribution of the first component vector X. While the auxiliary Y is (obviously) conditionally Gaussian, the conditional of X is a priori at least as challenging to simulate from as f itself, unless η is chosen small enough to regularise log g(x,y) into a strongly log-concave function of x, since then

log g(x,y) ≤ log g(x*,y) − β||x−x*(y)||²/2

where x*=x*(y) is the maximiser of log g(x,y) (for a given value of y) and β>0 is the corresponding log-concavity constant. This inequality means that an accept-reject step can be implemented to simulate from the conditional of X given Y, but it requires both the constant β and the derivation of x*(y), hence a pretty good understanding and a rather high regularity of the actual target f(x). Besides, the regularisation term ||x−y||² means that y stays close to the previous value of the (sub)chain X, hence it creates a restoring force that slows down the exploration of the target.
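The accept-reject step above can be sketched as follows, here in Python rather than R, on a hypothetical one-dimensional strongly log-concave target f(x) ∝ exp(−x²/2 − x⁴/4) (my own choice of example, not from the paper), for which the curvature bound gives β = 1 + 1/η and x*(y) is found by Newton iterations:

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.5  # regularisation parameter of the proximal sampler

def logf(x):
    # hypothetical strongly log-concave target, up to a constant
    return -x**2 / 2 - x**4 / 4

def logg(x, y):
    # joint (demarginalised) log-density log g(x, y), up to a constant
    return logf(x) - (x - y)**2 / (2 * eta)

# curvature of -log g in x is 1 + 3x² + 1/η ≥ 1 + 1/η, so this β works
beta = 1 + 1 / eta

def xstar(y, iters=50):
    # Newton iterations for the maximiser x*(y) of log g(., y)
    x = y
    for _ in range(iters):
        grad = -x - x**3 - (x - y) / eta
        hess = -1 - 3 * x**2 - 1 / eta
        x -= grad / hess
    return x

def sample_x_given_y(y):
    # accept-reject with Gaussian envelope N(x*(y), 1/β), exploiting
    # log g(x,y) ≤ log g(x*,y) − β(x − x*)²/2
    xs = xstar(y)
    while True:
        x = xs + rng.normal() / np.sqrt(beta)
        logacc = logg(x, y) - logg(xs, y) + beta * (x - xs)**2 / 2
        if np.log(rng.uniform()) < logacc:
            return x

# full proximal sampler: Y|X=x is N(x, η), then X|Y=y by accept-reject
x, chain = 0.0, []
for _ in range(5000):
    y = x + np.sqrt(eta) * rng.normal()
    x = sample_x_given_y(y)
    chain.append(x)
print(np.mean(chain))  # f is symmetric, so the mean should be near zero
```

This only runs because β and x*(y) are available in closed enough form for this toy target, which is precisely the requirement discussed above.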
Since the arXival does not contain numerical comparisons, I attempted one using the (2D) banana-shaped distribution,
# log-density of the banana-shaped target, up to an additive constant
target = function(x, sig, B, mu) -x[1]^2 / (2 * sig) - (x[2] + B * x[1]^2 - mu)^2 / 2
with μ=σ=B=10. I compared the method with a vanilla random walk Metropolis using three potential scales, one chosen at random at each iteration. Since I did not want to check whether or not the target was log-concave (and to derive the corresponding β), I used the Normal distribution centred at x*(y) as proposal in a Metropolis step, again with several scales. The resulting samples (sienna for MCMC, navy blue for the proximal sampler with β=50, dark green for β=5) show a lesser rate of tail exploration for the proximal samplers. It is thus unclear to me whether the theoretical characterisations of the method translate into practical efficiency beyond the most regular cases.
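For the benchmark side of the comparison, the random walk Metropolis with a randomly chosen scale can be sketched as below (in Python rather than the post's R; the actual trio of scales used in the experiment is not stated, so the values here are my own assumption). Drawing the scale at random, independently of the chain, keeps the proposal symmetric and hence preserves detailed balance:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = sig = B = 10.0

def target(x):
    # log-density of the banana-shaped target (up to a constant)
    return -x[0]**2 / (2 * sig) - (x[1] + B * x[0]**2 - mu)**2 / 2

scales = np.array([0.03, 0.3, 3.0])  # hypothetical trio of proposal scales

x = np.array([0.0, mu])  # start on the ridge of the banana
chain, accepts = [x], 0
for _ in range(10000):
    s = rng.choice(scales)            # random scale, independent of the state
    prop = x + s * rng.normal(size=2) # symmetric Gaussian random walk move
    if np.log(rng.uniform()) < target(prop) - target(x):
        x, accepts = prop, accepts + 1
    chain.append(x)
chain = np.array(chain)
print(accepts / 10000)  # overall acceptance rate across the mixed scales
```

Mixing several scales this way is a cheap hedge against the very different curvatures along and across the banana ridge, at the price of a lower overall acceptance rate.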
science inverse [cover]
Posted in pictures, Travel, University life with tags @ScientistTrump, Albert Einstein, anti-scientism, arbitrariness, baby Trump, climatosceptic, Columbia University, cover, firing, Food and Drug Administration, Libé, National Institute of Health, Trump administration, US politics on April 10, 2025 by xi'an

gradient flow for projected Langevin dynamics
Posted in Books, Statistics, University life with tags Columbia University, coupling, gradient flow, Langevin diffusion, metro station, New York city, New York metro, NYC, optimal transport, SDEs, seminar, SMC, unadjusted Langevin algorithm, Université Paris Dauphine, variational inference on April 7, 2025 by xi'an
Daniel Lacker (Columbia U) gave a talk at the probability seminar of Paris Dauphine this week, which I attended by happenstance, on a recent paper, Projected Langevin dynamics and a gradient flow for entropic optimal transport, written with Giovanni Conforti and Soumik Pal. The talk was quite progressive and I could hence follow most of it. The core idea is to study Langevin-type diffusion dynamics that sample from an entropy-regularised optimal transport problem, i.e., to look for an optimal distribution (in the sense of solving an entropy minimisation problem within a Wasserstein space, with regularisation) obtained via a gradient flow equation (as, e.g., in variational inference) that couples two SDEs recentred by conditional expectation terms. The expectations in the equations are estimated by a Nadaraya-Watson estimator (reminding me of SMC), with no theoretical derivation of an optimal bandwidth, and the authors achieve quantitative bounds on the convergence, namely exponential convergence, energy decay, and new logarithmic Sobolev inequalities. From the talk and a quick glance at the paper, it is unclear to me whether there are direct algorithmic consequences, since the SDEs need to be discretised, while the expectation approximations are costly, being repeated at each iteration of the discretised SDE.
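The Nadaraya-Watson device mentioned above is the standard kernel regression estimator of a conditional expectation E[Y|X=x₀]; a minimal sketch with a Gaussian kernel (the bandwidth below is an arbitrary choice, since the paper derives no optimal value):

```python
import numpy as np

def nadaraya_watson(x0, xs, ys, h):
    # kernel-weighted average estimating E[Y | X = x0]
    w = np.exp(-(xs - x0)**2 / (2 * h**2))
    return np.sum(w * ys) / np.sum(w)

# toy illustration on simulated data, Y = sin(X) + noise
rng = np.random.default_rng(2)
xs = rng.uniform(-2, 2, 2000)
ys = np.sin(xs) + 0.1 * rng.normal(size=2000)
est = nadaraya_watson(1.0, xs, ys, h=0.2)
print(est)  # should be close to sin(1) ≈ 0.841
```

Repeating such an estimate at every step of a discretised SDE, over the whole current cloud of particles, is where the cost concern above comes from, each evaluation being O(n) in the sample size.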

