Archive for STAN

Bayesian Workflow [cover]

Posted in Books, pictures, R, Statistics, University life on April 25, 2026 by xi'an

Ah great, the new book on Bayesian workflow by Andrew Gelman, Aki Vehtari, and Richard McElreath, which I knew they were working on, is about to appear! With contributions from several coauthors and half of the chapters devoted to case studies. I have not (yet) looked at its contents in detail…

amortized Bayesian mixture model

Posted in Books, Statistics, University life on February 7, 2025 by xi'an

A few days before the January OWABI, I read through Simon Kucharsky and Paul Bürkner's paper, arXived on 17 January, which proposes an amortized Bayesian inference (ABI) method, even though this ABI is not the same as in OWABI! The motivation for their work is to start from a (standard) mixture model whose components are not analytically tractable (but still parameterised), while remaining a generative model nonetheless. As in the earlier reviewed paper by MEJ Newman (arXived on the same day), the dual representation of the joint posterior p(θ,z|x) as p(z|x,θ)p(θ|x) and p(θ|z,x)p(z|x) is (over?) emphasized (albeit unclearly why!). ABI uses neural networks, more specifically normalising flows, to approximate the posterior p(θ|x) from prior predictive samples (θ,x) (as in ABC), and then directly exploits the invertibility of said flows to generate from this approximate posterior. One interesting aspect of the modelling is the derivation of summary statistics in the design of the network, even though mixture posteriors do not allow for dimension-reduced (Bayes) sufficient statistics (hence the contradictory claim that conditioning on the summaries “does not alter the target posterior”, p.7). The resulting approximate posterior generator proves much, much faster than running an MCMC sampler, obviously, and furthermore adapts to handling a sequence of datasets. A second network is constructed to approximate p(z|x,θ), using the same summaries. The network parameters are estimated by minimising losses, rather than in a Bayesian manner, with a default Kullback-Leibler version (18). I also fail to understand why the networks are trained over unconstrained parameters when all parameters could be made unconstrained by an adequate reparameterisation. And I am fairly surprised at the reversion to the ill-fated step of using ordered parameters to avoid label switching… But the main quandary remains the issue of assessing the effect of the approximation, despite experiments aiming at pacifying such worries. And the similarities with Stan and BayesFlow.
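To fix the idea behind amortization, here is a toy sketch of my own (a made-up normal location model, hand-picked summaries, and a plain Gaussian conditional standing in for the paper's normalising flows): draw (θ,x) pairs from the prior predictive, reduce each x to a few summaries, learn an approximation q(θ|s) once, after which "inference" for any new dataset is a cheap evaluation of that approximation rather than a fresh MCMC run.

## all names and modelling choices below are hypothetical illustrations
set.seed(1)
M <- 1e4                                   # number of prior predictive draws
theta <- rnorm(M, 0, 2)                    # prior on a single location parameter
x <- matrix(rnorm(M * 20, mean = theta), nrow = M)   # 20 observations per draw
s <- cbind(rowMeans(x), apply(x, 1, sd))   # hand-picked summary statistics
fit <- lm(theta ~ s)                       # amortized posterior mean given the summaries
sig <- sd(residuals(fit))                  # crude constant posterior spread
## amortized "posterior sampling" for a brand-new dataset, no further simulation needed
x_new <- rnorm(20, mean = 1.3)
s_new <- c(mean(x_new), sd(x_new))
mu_new <- sum(coef(fit) * c(1, s_new))     # intercept plus regression on the summaries
post_draws <- rnorm(1e3, mu_new, sig)      # approximate posterior sample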

scalability of Metropolis-within-Gibbs schemes

Posted in Books, Statistics, University life on July 17, 2024 by xi'an

My friends Filippo Ascolani, Gareth Roberts, and Giacomo Zanella recently arXived a paper on the scalability (in the dimension) of Gibbs and Metropolis-within-Gibbs sampling schemes, which celebrates a sort of return of the Gibbs sampler as a dimension-resistant device (when compared with other solutions), witness the following extract:

“….we provide bounds on the approximate conductance of a generic coordinate-wise scheme in terms of the corresponding quantity for the Gibbs sampler. Working with the approximate version of the conductance is crucial for our purposes and subsequent applications. The general theory naturally applies to Metropolis-within-Gibbs schemes, such as those targeting conditionally log-concave distributions. In the second part, we analyze performances of coordinate-wise samplers for relevant statistical applications, combining the bounds discussed above with specific model properties, statistical asymptotics and some novel auxiliary results on approximate conductances and perturbation of Markov operators. Much emphasis is placed on coordinate-wise schemes for generic two-levels hierarchical models with non-conjugate likelihood for which we are able to prove dimension-free behaviour of total variation mixing times, under warm and feasible starts.” F. Ascolani, G.O. Roberts, and G. Zanella

Here, an M-warm start means a starting measure bounded by M times the target, i.e., not starting too far in the tails, while the conductance Φ relates to the probability that the Markov chain exits an arbitrary set A in one step, given that it starts from the target π restricted to A. The paper quantifies the loss of efficiency incurred by substituting an exact Gibbs update with a π-invariant one, e.g., Metropolis-within-Gibbs, that is

\Phi_s(P)\ge\min_i\kappa(P_i, X)\Phi_s(G)

following from

G_i(\partial A)\ge P_i(\partial A)\ge \kappa_i(P_i, X)G_i(\partial A)

In the (rather unrealistic) case of an independent Metropolis-within-Gibbs proposal enjoying an upper bound M on the Radon-Nikodym derivative between target and proposal, the conductance of Metropolis-within-Gibbs is at least one M-th of the conductance of Gibbs, i.e., a constant slowdown relative to exact Gibbs when the dimension is fixed but arbitrary.
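For what it is worth, the 1/M factor follows from the standard independent Metropolis argument (my own reconstruction, not a quote from the paper): if the i-th update uses an independence proposal q_i with importance ratio w = dπ_i/dq_i bounded by M against the conditional target π_i, then

P_i(x,\mathrm{d}y)\ge q_i(\mathrm{d}y)\,\min\left(1,\frac{w(y)}{w(x)}\right)\ge q_i(\mathrm{d}y)\,\frac{w(y)}{M}=\frac{1}{M}\,G_i(x,\mathrm{d}y)

since both w(x) and w(y) are at most M, so that each coordinate kernel dominates the corresponding exact Gibbs update by a factor 1/M, a factor inherited by the conductance.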

The paper further studies a hierarchical Bayes model when the number J of groups goes to infinity and only the top (of the hierarchy) parameter is of interest. In that setting, only two requirements need be satisfied for the Metropolis-within-Gibbs kernel P to mix fast: namely, that the Gibbs kernel G mixes fast and that the conditional conductance of P around the true ψ is good enough. A further point of relevance is the demonstrated O(J) computational cost, i.e., the Metropolis-within-Gibbs algorithm with kernel P produces a sample with ϵ-accuracy in total variation distance at an O(J) cost when initialized from a warm start, a better order than alternatives like the Metropolis-adjusted Langevin algorithm (MALA) and Hamiltonian Monte Carlo (HMC). When checking for connections with other papers, I came across the nearly completed book by Sinho Chewi on log-concave sampling, which seems to be exploring similar ground.
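To make the setting more concrete, here is a minimal Metropolis-within-Gibbs sketch for a toy two-level hierarchy with a non-conjugate likelihood (the model and all tuning choices below are my own illustration, not taken from the paper): y_ij ~ Poisson(exp(z_j)), z_j ~ N(ψ,1), ψ ~ N(0,10²), where ψ enjoys an exact conjugate Gibbs update while each group effect z_j is updated by a random-walk Metropolis step, so that one full sweep indeed costs O(J).

## hypothetical toy model, used only to illustrate the scheme under discussion
set.seed(2)
J <- 50; n <- 10
z_true <- rnorm(J, 1, 1)
y <- matrix(rpois(J * n, exp(rep(z_true, each = n))), nrow = J, byrow = TRUE)

niter <- 5000
psi <- 0; z <- rep(0, J); keep_psi <- numeric(niter)
for (t in 1:niter) {
  ## exact Gibbs update of psi | z (normal-normal conjugacy)
  v <- 1 / (J + 1 / 100)
  psi <- rnorm(1, v * sum(z), sqrt(v))
  ## Metropolis-within-Gibbs update of each z_j | psi, y_j (random walk proposal)
  for (j in 1:J) {
    prop <- z[j] + rnorm(1, 0, 0.3)
    logr <- sum(dpois(y[j, ], exp(prop), log = TRUE)) -
            sum(dpois(y[j, ], exp(z[j]), log = TRUE)) +
            dnorm(prop, psi, 1, log = TRUE) - dnorm(z[j], psi, 1, log = TRUE)
    if (log(runif(1)) < logr) z[j] <- prop
  }
  keep_psi[t] <- psi
}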


statistical modeling with R [book review]

Posted in Books, Statistics on June 10, 2023 by xi'an

Statistical Modeling with R (A dual frequentist and Bayesian approach for life scientists) is a recent book written by Pablo Inchausti, from Uruguay, in a highly personal and congenial style (witness the preface), with references to (fiction) books that enticed me to buy them. The book was sent to me by the JASA book editor for review and I went through the whole of it during my flight back from Jeddah. [Disclaimer about potential self-plagiarism: this post, or a likely edited version of it, will eventually appear in JASA. If not in CHANCE, for once.]

The very first sentence (after the preface) quotes my late friend Steve Fienberg, which is definitely starting on the right foot. The exposition of the motivations for writing the book is quite convincing, with more emphasis than usual put on the notion and limitations of modeling. The discourse is overall inspirational and contains many relevant remarks and links that make it worth reading as a whole. While heavily connected with a few R packages like fitdistrplus, brms (a front-end to Stan), glm, and glmer, the book wisely bypasses the perilous reef of recalling R basics, and similarly the foundations of probability and statistics. While lacking in formal definitions, in my opinion, it reads well enough to somehow compensate for this very lack. I also appreciate the coherent continuation, throughout the book, of the parallel description of Bayesian and non-Bayesian analyses, an attempt that all too often quickly disappears in other books. (As an aside, note that hardly anyone claims to be a frequentist, except maybe Deborah Mayo.) A new model is almost invariably backed by a new dataset, even if a few are somewhat inappropriate, as with the mammal sleep patterns of Chapter 5, or Fig. 6.1.
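As a purely hypothetical illustration of this parallel treatment (my own toy example, not code from the book), the same Poisson regression can be fitted the frequentist way with glm() and the Bayesian way with brms::brm(), the latter being a front-end that compiles the model to Stan:

library(brms)                                  # also makes the epilepsy example data available
fit_freq  <- glm(count ~ zAge + zBase * Trt, family = poisson, data = epilepsy)
fit_bayes <- brm(count ~ zAge + zBase * Trt, family = poisson(), data = epilepsy)
summary(fit_freq)
summary(fit_bayes)                             # posterior summaries rather than p-values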

Given that the main motivation for the book (when compared with references like BDA) leans heavily towards the practical implementation of statistical modelling via R packages, it is inevitable that a large fraction of Statistical Modeling with R is spent on the analysis of R outputs, even though it sometimes feels a wee bit too heavy for yours truly. The R screen-copies are however produced in moderate quantity and size, even though the variations in typography/fonts (at least in my copy?!) may prove confusing. Obviously, the high (explosive?) number of distinctions between regression models may eventually prove challenging for the novice reader. The specific issue of prior input (or “defining priors”) is briefly addressed in a non-chapter (p.323), although mentions are made throughout the preceding chapters. I note the nice appearance of hierarchical models and experimental designs towards the end, but would have appreciated some discussion of missing topics such as time series, causality, connections with machine learning, non-parametrics, and model misspecification. As an aside, I appreciated being reminded about the apocryphal nature of Ockham’s much-cited quote “Pluralitas non est ponenda sine necessitate”.

A typo, Jeffries, is found in Fig. 2.1, along with a rather sketchy representation of the history of both frequentist and Bayesian statistics. And Jon Wakefield’s book (with the related purpose of presenting both versions of parametric inference) is mistakenly entered as Wakenfield’s in the bibliography file. I do not like the use of the equivalence symbol ≈ for proportionality. And I found two occurrences of the unavoidable “the the” typo (p.174 and p.422). I also had trouble with some sentences like “long-run, hypothetical distribution of parameter estimates known as the sampling distribution” (p.27), “maximum likelihood estimates [being] sufficient” (p.28), “Jeffreys’ (1939) conjugate priors” [which were introduced by Raiffa and Schlaifer] (p.35), “A posteriori tests in frequentist models” (p.130), “exponential families [having] limited practical implications for non-statisticians” (p.190), “choice of priors being correct” (p.339), or calling MCMC sample terms “estimates” (p.42), as well as with some repetitions and missing indices for acronyms, packages, and datasets, but did not bemoan the lack of homework sections (beyond suggesting new datasets for analysis).

A problematic MCMC entry is found when calibrating the choice of the Metropolis-Hastings proposal towards avoiding negative values “that will generate an error when calculating the log-likelihood” (p.43), since it suggests proposed values should never leave the support of the posterior (and indicates a poor coding of the log-likelihood!). I also find the motivation for the full conditional decomposition behind the Gibbs sampler (p.47) unnecessarily confusing. (And automatically inserting a Metropolis-Hastings step within Gibbs, as in Fig. 3.9, adds another layer of confusion.) The Bayes factor section is very terse. The derivation of the Kullback-Leibler representation (7.3) as an expected log-likelihood ratio seems to be missing a reference measure. Of course, seeing a detailed coverage of DIC (Section 7.4) did not suit me either, even though the issue with mixtures was alluded to (with no detail whatsoever). The Nelder presentation of generalised linear models felt somewhat antiquated, since the addition of the scale factor a(φ) sounds over-parameterized.
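The usual fix for that coding issue (a generic sketch of my own, not the book’s code) is to have the log target return -Inf outside the support, so that out-of-support proposals are rejected by the Metropolis-Hastings acceptance step itself instead of triggering an error in the log-likelihood, and the proposal needs no restriction:

## hypothetical exponential likelihood with a Gamma(2,2) prior on the rate theta
log_post <- function(theta, y) {
  if (theta <= 0) return(-Inf)                 # outside the support: reject via the ratio
  sum(dexp(y, rate = theta, log = TRUE)) + dgamma(theta, 2, 2, log = TRUE)
}
mh_step <- function(theta, y, step = 0.5) {
  prop <- theta + rnorm(1, 0, step)            # unconstrained random-walk proposal
  if (log(runif(1)) < log_post(prop, y) - log_post(theta, y)) prop else theta
}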

But those are minor quibbles in relation to a book that should attract curious minds of various backgrounds and levels of expertise in statistics, as well as work nicely to support an enthusiastic teacher of statistical modelling. I thus recommend this book most enthusiastically.

StanCon 2023 [20-23 June 2023]

Posted in Statistics on April 8, 2023 by xi'an