Archive for power posterior

BayesComp 2025.4

Posted in pictures, Running, Statistics, Travel, University life on June 21, 2025 by xi'an

The third and final day of the (main) conference started with Emtiyaz Khan’s plenary talk on adaptive Bayesian intelligence. Or, imho, [adaptive [Bayesian]] intelligence, with the brackets indicating redundancy since intelligence needs to include adaptivity and [intelligent] adaptivity needs to proceed in a Bayesian way! Focussing first on the Bayesian learning rule via variational Bayes (with a stress on Kingma & Ba’s 2014 Adam optimisation algorithm, the “most cited paper” [in machine learning]), where learning boils down to gradient steps (due to the exponential family structure), themselves versions of Taylor (or Laplace) approximations. With an interesting vision of Bayesian updating as accounting for prediction mismatch. (I missed the connection with Roberta in IMDb appearing in one slide!)
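
To give a flavour of the connection, here is a minimal sketch (entirely my own toy construction, not from the talk) of variational inference with a diagonal Gaussian approximation, whose updates resemble RMSprop/Adam in that a running squared-gradient average plays the role of the variational precision; the linear model, learning rates and run length are arbitrary assumptions.

```python
# Minimal sketch (toy example, not from the talk): variational inference with a
# diagonal Gaussian q(theta) = N(mu, 1/s), where the precision s is updated by a
# squared-gradient running average, making the step look like RMSprop/Adam.
import numpy as np

rng = np.random.default_rng(0)
theta_star = np.array([2.0, -1.0])              # toy "true" parameter
X = rng.normal(size=(100, 2))
y = X @ theta_star + rng.normal(scale=0.5, size=100)

def grad_loss(theta):
    # gradient of the average squared-error loss at theta
    return X.T @ (X @ theta - y) / len(y)

mu, s = np.zeros(2), np.ones(2)                 # variational mean and precision
alpha, rho = 0.01, 0.1                          # learning rates (assumed values)

for _ in range(5000):
    theta = mu + rng.normal(size=2) / np.sqrt(s)    # sample from q
    g = grad_loss(theta)
    s = (1 - rho) * s + rho * g**2              # Adam-like second-moment / precision update
    mu = mu - alpha * g / (s + 1e-8)            # preconditioned gradient step on the mean

print("variational mean ~", mu)
```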

The following session offered no dilemma [sorry, Alex, Axel, Chris, Robert, Sumeet, Victor!] since it included the federated learning session I organised, with Louis Aslett, Conor Hassan, and Jean-Michel Marin as speakers. Louis’ talk was on confidential [homomorphic] accept-reject algorithms to learn from other sources while preserving (differential?) privacy, part of which came during the Les Houches workshops I organised this Spring and the one before. Exploiting the additive features of log-likelihoods and exponential variates, and adopting a testing perspective on privacy. Conor motivated his model with the Australian cancer atlas project Kerrie Mengersen and others have been developing over the years. The federated approach relies on variational approximations that return the same answer as an exact resolution, but more efficiently. (From a privacy perspective, I wonder at the impact of variational approximations on protecting the data, which boils down to a choice of (sufficient) statistics for the exponential families behind those approximations.) For more complicated models, incorporating spatial dependence prohibits full Bayesian inference, unfortunately. Jean-Michel commented on the richness of methods for simulation-based inference, incl. model choice. His focus was on using sequential neural likelihood estimation and sequential importance sampling to approximate evidence. As in the Read Paper of Del Moral et al. (2006). Mentioning a neural version of the harmonic mean estimator by Spurio Mancini et al. (2023)! I wondered at the degree of (Rao-Blackwell) recycling involved in the computation, Jean-Michel’s answer being that AMIS is soon coming [in a theatre near you!].
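
A bare-bones sketch of the additive trick behind such confidential accept-reject steps (purely illustrative: there is no encryption here, and the Gaussian model, N(0,1) prior, proposal scale and four-site data split are my own assumptions): each site only reports its local log-likelihood difference, and the acceptance decision compares the sum to an exponential variate.

```python
# Illustrative federated Metropolis step exploiting additivity of log-likelihoods:
# each site returns only its local log-likelihood difference, and acceptance
# compares the summed log-ratio to an Exp(1) variate (log U = -E). No encryption
# is implemented; the model, prior, proposal and data split are assumptions.
import numpy as np

rng = np.random.default_rng(1)
sites = [rng.normal(loc=1.5, scale=1.0, size=50) for _ in range(4)]   # local datasets

def local_loglik_diff(data, theta_new, theta_old):
    # one site's contribution, Gaussian likelihood with known unit variance
    return np.sum(-0.5 * (data - theta_new) ** 2 + 0.5 * (data - theta_old) ** 2)

theta, chain = 0.0, []
for _ in range(5000):
    prop = theta + 0.2 * rng.normal()                     # random-walk proposal
    log_ratio = sum(local_loglik_diff(d, prop, theta) for d in sites)
    log_ratio += -0.5 * prop**2 + 0.5 * theta**2          # N(0,1) log-prior difference
    if rng.exponential() > -log_ratio:                    # accept iff E > -log_ratio, i.e. log U < log_ratio
        theta = prop
    chain.append(theta)

print("posterior mean estimate:", np.mean(chain[1000:]))
```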

The afternoon sessions did not offer any reprieve in the choice of topic! I first went to Approximate Methods for Accelerated Sampling, with Rong Tang evaluating the informativeness of summary statistics through a divergence evaluation. Using autoencoders to replace the intractable posterior, with sliced maximum mean discrepancy (MMD) and (pseudo?) score matching loss for divergences (reminding me of indirect inference and synthetic likelihood). Yun Yang discussed a variational proposal to estimate the number of components in a mixture model. Surprising given the multimodal structure of mixture posteriors. And the overall irregularity of (evil!) mixture models. But I could not figure out from the talk the form of the approximation.
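
For concreteness, a rough sketch of what a sliced MMD between two samples may look like (my own minimal version, not the loss of the talk): average a one-dimensional Gaussian-kernel MMD² over random projection directions; the bandwidth, number of slices and toy samples are arbitrary choices.

```python
# Minimal sketch of a sliced MMD estimate between two samples: average the
# (biased, V-statistic) Gaussian-kernel MMD^2 over random 1-D projections.
# Bandwidth, number of slices and the toy samples are my own assumptions.
import numpy as np

def mmd2_1d(a, b, bw=1.0):
    # V-statistic estimate of MMD^2 with a Gaussian kernel on 1-D samples a, b
    k = lambda u, v: np.exp(-(u[:, None] - v[None, :]) ** 2 / (2 * bw**2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

def sliced_mmd2(X, Y, n_slices=50, bw=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    total = 0.0
    for _ in range(n_slices):
        w = rng.normal(size=X.shape[1])
        w /= np.linalg.norm(w)              # random unit direction
        total += mmd2_1d(X @ w, Y @ w, bw)  # 1-D MMD^2 on the projected samples
    return total / n_slices

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
Y = rng.normal(loc=0.5, size=(200, 5))      # shifted sample, should show a gap
print(sliced_mmd2(X, Y), "vs", sliced_mmd2(X, rng.normal(size=(200, 5))))
```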

On the food scene, tasted a nice and spicy Peranakan rice vermicelli dish called Mee Siam yesterday in a campus restaurant, which sustained me for the rest of the day, including the ABC seminar/webinar. And another spicy hot pot today at NUS, to catch up on veggies, while missing the chili crab local specialty on that trip.

BayesComp 2025.3

Posted in pictures, Running, Statistics, Travel, University life on June 20, 2025 by xi'an

The second day of the conference started with cooler and less humid weather (although this did not last!), although my brain felt a wee bit foggy from a lack of sleep (and I almost crashed while running on the hotel treadmill, at 14.5km/h!), and the plenary talk of my friend of many years Sylvia Frühwirth-Schnatter on horseshoe priors and time-varying time series (à la West). With a nice closed-form representation involving hypergeometric functions of the second kind (my favourite!), with the addition of a triple-Gamma prior. Sylvia stressed the enormous impact of the prior choice on change-point detection, which was already the point in the original horseshoe paper (as opposed to George’s Lasso prior). Without incorporating any specific modelling of potential change-points, fair enough given that the parameter is moving with time, unhindered. Her MCMC choices involved discrete parameters with Negative Binomial and Poisson parameters, allowing for partially integrated or collapsed solutions. Possibly further improved by Swendsen-Wang steps.
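
To illustrate why such shrinkage priors can act as implicit change-point detectors, here is a toy prior simulation (my own illustration, not Sylvia's triple-Gamma model): a random-walk coefficient whose increments receive horseshoe-type half-Cauchy local scales, so most steps are shrunk towards zero while a few large jumps survive.

```python
# Toy illustration (not the model of the talk): a random-walk coefficient with
# horseshoe-type shrinkage on its increments -- half-Cauchy local scales make
# most increments tiny while allowing occasional jumps, mimicking change-points.
import numpy as np

rng = np.random.default_rng(3)
T, tau = 200, 0.05                              # length and global scale (assumed)
lam = np.abs(rng.standard_cauchy(size=T))       # half-Cauchy local scales
increments = rng.normal(scale=tau * lam)        # horseshoe-like increments
beta = np.cumsum(increments)                    # time-varying coefficient path

print("largest five |increments|:", np.sort(np.abs(increments))[-5:])
```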

I then attended the (advanced) Langevin session after agonising upon my choice from a wealth of options! Sam Power presented a talk linking simulation with optimisation targets, over measure spaces. With Wasserstein gradient flow algorithms that resemble Langevin algorithms once discretised by a particle system. (A natural resolution producing a somewhat unnatural form of measure estimator since made of Dirac masses, from which very little can be learned.) Then [my Warwick colleague & coauthor] Andi Wang on underdamped Langevin diffusions, when Poincaré’s inequality fails, but convergence (in total variation) still occurs. Followed by Peter Whalley on splitting methods (where random hypergeometric subsampling dominates Robbins-Monro) and stochastic gradient algorithms, in a connected (to the previous talks) way since involving underdamped aspects. (With a personal discovery of Polyak’s heavy ball method.)
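
For reference, a minimal underdamped (kinetic) Langevin sketch for a standard Gaussian target, with a plain Euler-Maruyama discretisation; the friction, step size and run length are arbitrary assumptions and the discretisation bias is ignored.

```python
# Minimal sketch of an underdamped (kinetic) Langevin sampler for a standard
# Gaussian target, with a plain Euler-Maruyama discretisation; step size,
# friction and run length are assumed values.
import numpy as np

rng = np.random.default_rng(4)
grad_U = lambda x: x                     # potential U(x) = x^2/2, target N(0,1)
gamma, dt = 2.0, 0.05                    # friction and step size (assumed)
x, v, xs = 0.0, 0.0, []

for _ in range(50_000):
    v += -grad_U(x) * dt - gamma * v * dt + np.sqrt(2 * gamma * dt) * rng.normal()
    x += v * dt                          # position update with the new velocity
    xs.append(x)

print("empirical mean/var:", np.mean(xs[5000:]), np.var(xs[5000:]))
```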

The afternoon session saw me facing a terrible dilemma with three close friends talking at the same time! Eventually opting for PDMPs, over simulation-based inference and recalibration for approximate Bayesian methods. Kengo Kamatani gave a general introduction to PDMPs, before explaining the automated implementation he considered with Charly Andral (during Charly’s visit to ISM, Tokyo, two summers ago). Towards accelerating the generation of the jump time. Then Luke Hardcastle applied PDMPs for survival prediction, using spike & slab priors and sticky PDMPs. And Jere Koskela (formerly Warwick) extended zig-zag sampling to discrete settings (incl. Kingman’s coalescent).
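
For readers new to PDMPs, here is a bare-bones one-dimensional zig-zag sampler for a standard Gaussian target, where the event times are available in closed form (my own toy illustration, unrelated to the automated implementation discussed in the talk).

```python
# Bare-bones 1-D zig-zag sampler for a N(0,1) target: constant velocity ±1,
# velocity flips at events of a Poisson process with rate max(0, v*x); for this
# Gaussian potential the event times can be simulated exactly by inversion.
import numpy as np

rng = np.random.default_rng(5)
x, v = 0.0, 1.0
samples, dt_grid, leftover = [], 0.1, 0.0

for _ in range(20_000):
    e = rng.exponential()                            # Exp(1) variate driving the next event
    w = v * x                                        # signed position along the velocity
    tau = -w + np.sqrt(max(w, 0.0) ** 2 + 2 * e)     # solves ∫_0^tau max(0, w+s) ds = e
    # record the piecewise-linear trajectory on a regular time grid
    s = leftover
    while s < tau:
        samples.append(x + v * s)
        s += dt_grid
    leftover = s - tau
    x += v * tau                                     # deterministic drift up to the event
    v = -v                                           # flip the velocity

samples = np.array(samples)
print("empirical mean/var:", samples.mean(), samples.var())   # should be close to 0 and 1
```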

The (rather long) day was not over yet since we had planned an extra on-site OWABI seminar & webinar with two participants in the conference, Filippo Pagani (Warwick and OCEAN postdoc) using fusion for federated learning, with a trapezoidal approximation, and Maurizio Filippone on GANs as hidden perfect ABC model selection, a GAN providing an automatic density estimator… With astounding Gemini-generated cartoons! Videos are soon to be available. A big congrats to the speakers who managed to convey their ideas and results despite the late hour! (On the extra-academic side, I was invited last night to a genuine Szechuan dinner in Chinatown, with a large array of spicy dishes (if not that spicy!), and a rare opportunity to taste abalone. And bullfrogs. Quite a treat! And a good reason to skip dinner altogether!)

BayesComp 2025.1

Posted in Running, Statistics, Travel, University life on June 18, 2025 by xi'an

Minus one day at BayesComp 2025! As I am attending the model misspecification satellite workshop (ten minutes late, due to repeated path finding protocol!), with an extended presentation by Jeremias Knoblauch on post-Bayesian inference, incl. powered likelihood and Gibbs posteriors. A very smooth and pedagogical presentation, esp. in the hybrid mode. A perspective I associate with the difficulties of making sense of the post-posterior, not truly a posterior, of calibrating the penalty (e.g. λ), of picking the loss (α, β, γ divergences?), and of the drift towards learning goals since the new measure is the post-posterior predictive. Sort of paradoxical return to a Gaussian post-posterior on the parameter that does not seem to stay robust. Horrendous computational issues when the loss itself is an integral. Use of the zig-zag sampler with an estimated unbiased gradient of the loss, much faster than pseudo-marginal, which (naïvely?) makes sense both because PDMPs directly use scores and because of the power of stochastic gradient methods. Worse perspectives for optimisation-centric posteriors that are essentially revamped versions of GANs. For instance, what is the meaning of the coverage probabilities?
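
As a small illustration of the calibration issue, here is a grid-based Gibbs posterior for a one-dimensional parameter, with a powered loss replacing the likelihood; the absolute-error loss, N(0,1) prior, toy data and values of λ are all assumptions of mine, meant only to show how λ scales the influence of the data against the prior.

```python
# Minimal sketch of a Gibbs (loss-based) posterior on a grid:
#   pi_lambda(theta) ∝ prior(theta) * exp(-lambda * sum_i loss(theta, x_i)).
# Loss (absolute error), prior, data and lambda values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(6)
data = rng.normal(loc=1.0, scale=1.0, size=100)
grid = np.linspace(-3, 5, 2001)
dx = grid[1] - grid[0]

def gibbs_posterior(lam):
    loss = np.abs(grid[:, None] - data[None, :]).sum(axis=1)   # total absolute-error loss
    logpost = -0.5 * grid**2 - lam * loss                      # N(0,1) log-prior minus powered loss
    post = np.exp(logpost - logpost.max())                     # stabilise before exponentiating
    return post / (post.sum() * dx)                            # normalise on the grid

for lam in (0.01, 0.1, 1.0):
    post = gibbs_posterior(lam)
    print(f"lambda={lam}: Gibbs posterior mean = {(grid * post).sum() * dx:.3f}")
```

With small λ the prior dominates and the Gibbs posterior mean sits near zero; as λ grows it moves towards the sample median, which is one way of seeing why calibrating λ is so consequential.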

The second talk, by Jonathan Huggins, was on DC (not bagged) posteriors as martingale posteriors with m<∞, approximating marginal distributions either with random kernel MCMC (which persists in simulating the marginalised or integrated variable u from its prior, rather than adapting to the current value of the parameter θ) or with subsampling MCMC akin to stochastic gradient Langevin, with connections to cut posteriors.
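
Since the talk invoked subsampling MCMC in the spirit of stochastic gradient Langevin, here is a bare-bones SGLD sketch for a Normal-mean model with an N(0,1) prior (step size, batch size and data are assumptions of mine, and no step-size decay or correction is applied).

```python
# Bare-bones stochastic gradient Langevin dynamics (SGLD) for a Normal-mean
# model with N(0,1) prior: minibatch gradient of the log-posterior plus
# injected Gaussian noise. Step size, batch size and data are assumed values.
import numpy as np

rng = np.random.default_rng(9)
N = 10_000
data = rng.normal(loc=2.0, scale=1.0, size=N)
eps, batch = 1e-4, 100                      # step size and minibatch size (assumed)
theta, chain = 0.0, []

for _ in range(20_000):
    idx = rng.integers(0, N, size=batch)
    grad = -theta + (N / batch) * np.sum(data[idx] - theta)   # prior + rescaled minibatch gradient
    theta += 0.5 * eps * grad + np.sqrt(eps) * rng.normal()   # Langevin step with injected noise
    chain.append(theta)

print("SGLD posterior mean estimate:", np.mean(chain[5000:]))
```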

Then I skipped to the second workshop on Bayesian methods for distributional and semiparametric regression, to listen to my friend David Rossell’s talk on local variable selection. Which suffers more than in standard models under misspecification. Another talk involving cut posteriors, the cuts being on the spline bases…

The day and the workshop concluded with great talks by (my friends) Pierre Alquier and David Frazier. David centred his misspecification talk on cut posteriors. Managing to bring in shrinkage estimators (and mention Bill Strawderman!).

A wee stressful trip, since the races in Caen cancelled all buses and delayed the taxi enough to miss the train to Paris by 30s, with the next available one leaving me less than one hour between the arrival of the train (delayed by construction work on the rail line) and boarding the flight at Charles de Gaulle airport. Fortunately the RER trains in Paris were running okay and there were no queues at the airport, so I made it in time with a bit of post-marathon jogging! (Only to be delayed at departure by one hour due to stormy conditions over Germany and Austria.) All this exercise proved helpful to sleep soundly and lengthily in the plane!

WBIC, practically

Posted in Statistics on October 20, 2017 by xi'an

“Thus far, WBIC has received no more than a cursory mention by Gelman et al. (2013)”

I had missed this 2015 paper by Nial Friel and co-authors on a practical investigation of Watanabe’s WBIC. Where WBIC stands for widely applicable Bayesian information criterion. The thermodynamic integration approach explored by Nial and some co-authors for the approximation of the evidence, thermodynamic integration that produces the log-evidence as the integral between temperatures t=0 and t=1 of the expected log-likelihood under the power posterior, is eminently suited for WBIC, as the widely applicable Bayesian information criterion is associated with the specific temperature t⁰ that makes the power posterior equidistant, Kullback-Leibler-wise, from the prior and posterior distributions. And the expectation of the log-likelihood under this very power posterior equal to the (genuine) log-evidence. In fact, WBIC is often associated with the sub-optimal temperature 1/log(n), where n is the (effective?) sample size. (By comparison, if my minimalist description is unclear!, thermodynamic integration requires a whole range of temperatures and associated MCMC runs.) In an ideal Gaussian setting, WBIC improves considerably over thermodynamic integration, the larger the sample the better. In more realistic settings, though, including a simple regression and a logistic [Pima Indians!] model comparison, thermodynamic integration may do better for a given computational cost, although the paper is unclear about these costs. The paper also runs a comparison with harmonic mean and nested sampling approximations. Since the integral of interest involves a power of the likelihood, I wonder if a safe version of the harmonic mean resolution can be derived from simulations of the genuine posterior. Provided the exact temperature t⁰ is known…
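
To make the comparison concrete, here is a small grid-based illustration in a toy Normal-mean model with an N(0,1) prior (my own example, not one from the paper): the log-evidence via thermodynamic integration over a ladder of temperatures, versus the WBIC-style approximation using the single power posterior at t = 1/log(n), with the exact conjugate answer as reference.

```python
# Toy comparison of thermodynamic integration and WBIC for a Normal-mean model
# with a N(0,1) prior, everything computed on a parameter grid:
#   log-evidence = ∫_0^1 E_{p_t}[log L(θ)] dt, with p_t(θ) ∝ prior(θ) L(θ)^t,
# while the WBIC-style estimate uses the single temperature t0 = 1/log(n).
# (In Watanabe's convention WBIC is the negative of this quantity.)
import numpy as np

rng = np.random.default_rng(7)
n = 200
data = rng.normal(loc=0.5, scale=1.0, size=n)
grid = np.linspace(-3, 3, 4001)
dx = grid[1] - grid[0]

loglik = np.array([np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (data - m) ** 2) for m in grid])
logprior = -0.5 * np.log(2 * np.pi) - 0.5 * grid**2

def expected_loglik(t):
    # E[log L] under the power posterior p_t(θ) ∝ prior(θ) * L(θ)^t
    lp = logprior + t * loglik
    w = np.exp(lp - lp.max())
    w /= w.sum() * dx
    return (w * loglik).sum() * dx

ts = np.linspace(0, 1, 51)                        # temperature ladder for TI
vals = np.array([expected_loglik(t) for t in ts])
ti_logZ = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(ts))   # trapezoidal rule

wbic = expected_loglik(1 / np.log(n))             # single power posterior at t0 = 1/log(n)

# exact log-evidence of the conjugate Normal-Normal model, for reference
xbar = data.mean()
exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(n + 1)
         - 0.5 * (np.sum(data**2) - n**2 * xbar**2 / (n + 1)))

print("TI:", ti_logZ, " WBIC-style:", wbic, " exact:", exact)
```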

Greek variations on power-expected-posterior priors

Posted in Books, Statistics, University life on October 5, 2016 by xi'an

Dimitris Fouskakis, Ioannis Ntzoufras and Konstantinos Perrakis, from Athens, have just arXived a paper on power-expected-posterior priors. Just like the power prior and the expected-posterior prior, this approach aims at avoiding improper priors by the use of imaginary data, whose distribution is itself the marginal against another prior. (In the papers I wrote on that topic with Juan Antonio Cano and Diego Salmerón, we used MCMC to figure out a fixed point for such priors.)

The current paper (which I only perused) studies properties of two versions of power-expected-posterior priors proposed in an earlier paper by the same authors. For the normal linear model. Using a posterior derived from an unnormalised powered likelihood either (DR) integrated in the imaginary data against the prior predictive distribution of the reference model based on the powered likelihood, or (CR) integrated in the imaginary data against the prior predictive distribution of the reference model based on the actual likelihood. The baseline model being the G-prior with g=n². Both versions lead to a marginal likelihood that is similar to BIC and hence consistent. The DR version coincides with the original power-expected-posterior prior in the linear case. The CR version involves a change of covariance matrix. All in all, the CR version tends to favour less complex models, but is less parsimonious as a variable selection tool, which sounds a wee bit contradictory. Overall, I thus feel (possibly incorrectly) that the paper is more an appendix to the earlier paper than a paper in itself, as I do not get in the end a clear impression of which method should be preferred.
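
As background to the baseline choice, a short sketch of the marginal likelihood of a Gaussian linear model under a g-prior with g = n², in a simplified known-variance setting (σ = 1); this is my own toy comparison of two nested models and does not reproduce the DR/CR power-expected-posterior constructions of the paper.

```python
# Sketch of the marginal likelihood of a Gaussian linear model under a g-prior
# with g = n^2, in a simplified known-variance setting (sigma = 1): with
# beta ~ N(0, g (X'X)^{-1}), marginally y ~ N(0, I + g X (X'X)^{-1} X').
# The design, data and nested models are illustrative assumptions.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(8)
n = 100
X_full = rng.normal(size=(n, 3))
beta_true = np.array([1.0, 0.5, 0.0])        # third covariate is spurious
y = X_full @ beta_true + rng.normal(size=n)

def log_marginal(X, g):
    # log density of y under the g-prior marginal N(0, I + g * X (X'X)^{-1} X')
    P = X @ np.linalg.solve(X.T @ X, X.T)
    return multivariate_normal(mean=np.zeros(n), cov=np.eye(n) + g * P).logpdf(y)

g = n**2
for cols, name in [([0, 1], "true model"), ([0, 1, 2], "full model")]:
    print(name, log_marginal(X_full[:, cols], g))
```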