Archive for reference priors

on(-line) integral priors for model selection

Posted in Books, Statistics, University life on February 27, 2026 by xi'an

integral priors for model comparison [2.0]

Posted in Books, Statistics, University life on May 2, 2025 by xi'an

Objective Bayesian Inference [book review]

Posted in Books, Statistics, University life on July 2, 2024 by xi'an

As advertised earlier on the ‘Og, the reference book on reference priors and relatives by my long-time friends Jim Berger, José Bernardo, and Dongchu Sun is at last out! I received a copy from the editor, World Scientific, and read through it, mostly on train rides to Normandy and Brittany. The construction of this book took decades and I remember many O’Bayes meetings when we were discussing the progress made thus far. As I knew from a few months back that the book was at last completed, I was quite eager to dig into it. And to get this review ready for ISBA 2024. Given this prior knowledge, completed with sequential observations, I thus fear my review will be far from objective! And most likely more critical than it should be, as I keep fantasising about how I would have written a book on that topic…

“Some of the best statisticians (not named Fisher or Neyman)…” (p1)

The book covers traditional approaches to principled ways of selecting prior distributions, culminating with the reference prior introduced by José Bernardo in his PhD thesis in the late 1970s and expanded by all three authors over their academic careers. (Why is the acute accent missing from José on the front pages?!) The cover connects to the three founding fathers of objective Bayesian inference, Bayes, Laplace, and Jeffreys. The contents are not overly surprising from a personal viewpoint, i.e., as a card-carrying O’Bayes member: the chapters set the scene of parametric models and Bayesian inference (“a data driven probability transformation machine”), mostly supported by decision theory (including intrinsic losses!) but skipping testing and (mostly) model choice. This is unsurprisingly in the same spirit as Berger (1985) and Bernardo & Smith (1994). The book does not cover advanced Bayesian asymptotics, any flavour of Bayesian nonparametrics, the more recent generalised Bayesian inference, or the impact of misspecified models. The likelihood section does not mention Deborah Mayo’s criticism of the Likelihood Principle, or the Pitman–Koopman lemma (although the examples are predominantly connected with exponential families). The section (1.8) on MCMC implies that the Metropolis algorithm is less accurate than the Gibbs sampler, which is an exaggerated generalisation from a simple example, compounded by a comparison that does not seem to account for mixing behaviours.
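To make the mixing caveat concrete, here is a minimal sketch (my own, not the book’s example) comparing a systematic-scan Gibbs sampler with a random-walk Metropolis sampler on a strongly correlated bivariate Normal target; the target, the proposal scale, and the lag-one autocorrelation diagnostic are all my choices, not the book’s. When the correlation is high, the Gibbs sampler mixes slowly too, so neither sampler uniformly dominates in accuracy.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.95                 # target: bivariate N(0, [[1, rho], [rho, 1]])
n_iter = 50_000

def log_target(x):
    # unnormalised log density of the correlated bivariate Normal
    q = (x[0]**2 - 2 * rho * x[0] * x[1] + x[1]**2) / (1 - rho**2)
    return -0.5 * q

# --- Gibbs: exact full conditionals x1 | x2 ~ N(rho*x2, 1 - rho^2) ---
gibbs = np.zeros((n_iter, 2))
x = np.zeros(2)
s = np.sqrt(1 - rho**2)
for t in range(n_iter):
    x[0] = rng.normal(rho * x[1], s)
    x[1] = rng.normal(rho * x[0], s)
    gibbs[t] = x

# --- Random-walk Metropolis with a roughly tuned Gaussian proposal ---
rwm = np.zeros((n_iter, 2))
x = np.zeros(2)
lp = log_target(x)
for t in range(n_iter):
    prop = x + rng.normal(0.0, 0.7, size=2)
    lp_prop = log_target(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        x, lp = prop, lp_prop
    rwm[t] = x

# Lag-1 autocorrelation of the first coordinate: with strong correlation
# in the target, Gibbs mixes slowly as well, so a raw accuracy comparison
# that ignores mixing is misleading.
for name, chain in [("Gibbs", gibbs), ("RWM", rwm)]:
    z = chain[:, 0] - chain[:, 0].mean()
    acf1 = (z[:-1] * z[1:]).sum() / (z * z).sum()
    print(f"{name}: mean {chain[:, 0].mean():+.3f}, lag-1 ACF {acf1:.3f}")
```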

“Our own belief is that the effort [seeking objective prior distributions] is a misguided search for the holy grail” (p.68)

The chapter on the basics of objective priors repeats the useful warning that truncating the parameter space is far from advised, as is the call for vague proper priors à la BUGS (a “nonsense”). There is a remark that the alternative weakly informative priors à la BDA require subjective input, and a whole section on the legitimacy of improper priors as KL limits of sequences of proper priors. Plus a nice recall of the data-dependent prior of Wasserman (2000) forcing mixtures to avoid empty clusters. This was the prior Jean Diebolt and I implemented in our 1990 Gibbs sampling paper. (I do not really see it as data-dependent to impose that no component comes empty in the sample, but rather as a different model removing some terms from the likelihood.) There is even a chapter dedicated to constant priors—the historical meaning of inverse probability—with a section on the modern advocates of this constant prior, mostly focusing on the Binomial model. The book goes on to justify this prior by an invariance-under-reparameterisation argument (p108), but the discussion may seem stretched to newcomers. This is followed by a nice chapter on frequentist matching, covering the bivariate Normal case and some asymptotics, followed by confidence distributions, quite topical, and then fiducial inference, which imagines a posterior without a prior; one (too) short chapter on invariance priors arguing for the right-Haar vs the left-Haar prior measure in invariance settings as exact matching; completed by a useful if short chapter—one I would not have thought of including—on the performances of objective priors, like over-dispersion or under-dispersion. A mention is made there of the (now well-known) danger of using MCMC with improper posteriors, as the issue potentially goes undetected, as it did in the early 1990s. Within its coherence section, the authors recall the fundamentals of the lovely marginalisation paradoxes. While addressing some computational issues, the book does not mention the derivation of the Jeffreys prior for mixtures that Clara Grazian and I examined. The chapter ends with a rather expedited dismissal of maximum entropy, which many still regard as the default approach to (partly informed) objective prior modelling. (Maxent priors will be back in Chapter 13.)
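For newcomers puzzled by the invariance argument above, the standard change-of-variables identity (my own recap, not a quote from the book) shows why a constant prior cannot be invariant under arbitrary reparameterisation: if $\phi=g(\theta)$ is one-to-one,

$$\pi_\phi(\phi) \;=\; \pi_\theta\big(g^{-1}(\phi)\big)\,\left|\frac{\mathrm{d}\,g^{-1}(\phi)}{\mathrm{d}\phi}\right|,$$

so a flat $\pi_\theta$ stays flat only when $g$ is affine. For instance, a flat prior on a Binomial probability $p$ turns into $\pi_\psi(\psi)\propto e^\psi/(1+e^\psi)^2$ on the log-odds scale $\psi=\log\{p/(1-p)\}$, i.e., far from flat.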

The last hundred pages (Chap. 9–14) of the book focus on reference priors, as should be expected given the priorities of the authors. Starting with the rather convincing concept of maximising missing information, getting asymptotic to remove the impact of the data, and turning recursive in case of multidimensional parameters, while resorting to compact parameter spaces to avoid improprieties (the Achilles’ heel of reference priors!). Reaching a definition (p176) in the univariate case that coherently does not depend on the sample size but on an arbitrary dominating measure (p179), interestingly sharing this feature with the definition of conjugate priors. In multivariate settings, things get… more complicated! And force a separation between nuisance parameters and parameters of interest, lest the resulting priors prove underperforming. Asymptotic normality again helps, but the derivation remains involved, witness the one-page Proposition 10.2 (p201). My favourite example of selecting the prior for a Normal mean squared norm is there, with the original Jeffreys prior based on the Normal vector failing badly while the reference prior based on the norm of the observation does much better! (An open problem is the construction of the Jeffreys prior in that example.) A large table (p210) illustrates the plethora of reference priors depending on the parameter ordering. A short chapter (11) specialises in discrete parameters, as in population sizes. And, in model choice, offers a resolution I had not seen previously, with prior weights depending on the number of parameters in the respective models, if not accounting for embedded models. Chapter 12 addresses the “overall objective” prior construction when all parameters are equal (and none more equal than others). Supporting in the end the best overall prior defined in terms of distance to a family of reference priors. With a special treatment of the hierarchical Normal model following Berger et al. (2020). Chapter 13 is a short incursion into partial information reference priors, incl. maxent priors. Chapter 14 is about special reference priors exploiting special structures. And, at last, Chapter 15, a non-chapter pointing to a catalogue of objective priors, following Yang & Berger (1997) as well as an initiative set during one of the O’Bayes meetings.
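To make the squared-norm example concrete, here is a minimal simulation sketch (mine, not the book’s code): with $x\sim\mathcal N_p(\theta,I_p)$ and the flat Jeffreys prior, $\theta\,|\,x\sim\mathcal N_p(x,I_p)$, so the posterior mean of $\eta=\|\theta\|^2$ equals $\|x\|^2+p$ and overshoots the truth by about $2p$ on average. The bias-corrected estimate below is only a crude stand-in for the reference-prior answer, not the exact reference posterior mean.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 100                        # dimension
theta = np.ones(p)             # true mean, so eta = ||theta||^2 = 100
eta = theta @ theta

reps = 1000
x = rng.normal(theta, 1.0, size=(reps, p))    # x ~ N_p(theta, I_p)

# Flat (Jeffreys) prior: theta | x ~ N_p(x, I_p), hence
# E[||theta||^2 | x] = ||x||^2 + p, biased upwards by about 2p.
flat_est = (x**2).sum(axis=1) + p

# Crude stand-in for the reference-prior behaviour: bias correction.
ref_est = np.maximum((x**2).sum(axis=1) - p, 0.0)

print(f"true eta            : {eta:.1f}")
print(f"flat-prior estimate : {flat_est.mean():.1f}")   # close to eta + 2p = 300
print(f"bias-corrected      : {ref_est.mean():.1f}")    # close to eta = 100
```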

On the minor (nitpicking) side, I found a few “the the” (the typo no one can escape!) throughout the book, and some informality in statements like Proposition 1.6, whose limit (in n) depends on n (a shortcut from which we try to wean our students). Also a somewhat anecdotal appearance of the ratio-of-uniforms algorithm, with the mistaken statement that the method does not depend on a proposal (p61), and references to Jeffreys’ main book clashing between the 1930s and 1961 (the final edition). The “random posterior” Section 1.8.6 seems unfinished.
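For the record, here is a minimal ratio-of-uniforms sketch (my illustration, not the book’s) for a standard Normal target: the bounding rectangle enclosing the acceptance region plays the role of an implicit proposal, and its choice drives the acceptance rate, which is why the method cannot be said to be proposal-free.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # unnormalised standard Normal density
    return np.exp(-0.5 * x**2)

# Bounding rectangle for the region {(u, v): 0 < u <= sqrt(f(v/u))}:
# a = sup sqrt(f) = 1 and b = sup |x| sqrt(f(x)) = sqrt(2/e).
a, b = 1.0, np.sqrt(2.0 / np.e)

n = 100_000
u = rng.uniform(0.0, a, n)
v = rng.uniform(-b, b, n)
accept = u <= np.sqrt(f(v / u))    # accepted ratios v/u are N(0, 1) draws
x = (v / u)[accept]
print(f"acceptance {accept.mean():.3f}, mean {x.mean():+.3f}, var {x.var():.3f}")
```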

In conclusion, this much-awaited reference book does deliver! It brings a perspective on reference priors that no other book does and reflects (well) on the authors’ careful completion of a coherent theory, hence should appeal to anyone working on the foundations and principles of Bayesian inference. Obviously, it will not change the position of strict subjectivists, nor convince non-Bayesians, but it should inspire current and future researchers, as well as complement graduate courses on Bayesian inference. In addition, the huge bibliography retraces the work in the area up to today (if less intensely in the most recent years). Kudos to the authors, then!

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE!]

integral priors for multiple comparison

Posted in Books, Statistics, University life on June 24, 2024 by xi'an

Diego Salmerón and I just arXived a paper on integral priors for multiple model comparison, about deriving reference priors for multiple hypothesis testing. As (so-called) noninformative priors constructed for estimation purposes are usually not appropriate for model selection and testing, due to their improperness, Jeffreys–Lindley paradoxes, and the like, the methodology of integral priors was developed to produce prior distributions for Bayesian model selection when comparing two models, by modifying initial improper reference priors. This paper proposes a generalisation of this methodology to settings where more than two models are to be compared. In order to avoid the above paradoxes, and the associated possibility of producing a null recurrent or transient Markov chain, our approach adds an artificial copy of each model under comparison by compactifying the corresponding parameter space, and creates an ergodic Markov chain exploring all models that returns the integral priors as marginals of its stationary joint distribution. Besides the guaranteed existence of these integral priors and the disappearance of the paradoxes that plague estimation reference priors, an additional perk of this methodology is that simulating the Markov chain is straightforward, as it only requires simulating imaginary training samples and drawing from the corresponding posterior distributions, for all models, while producing Bayes factor approximations on the side. This renders its implementation automatic and generic, both in the nested and in the nonnested cases. We associated our late friend Juan Antonio Cano with this paper, as he was instrumental in initiating both this collaboration and the methodology at its core.
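To give a rough idea of the mechanics, here is a toy sketch of the basic two-model kernel, with models of my own choosing rather than the paper’s examples, and without the compactification device that guarantees ergodicity in general:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy pair of models for a single observation (my illustration):
#   M1: z ~ N(theta, 1),    formal prior pi_1(theta) prop. to 1
#   M2: z ~ N(0, sigma^2),  formal prior pi_2(sigma) prop. to 1/sigma
# With a minimal training sample of one observation, both formal
# posteriors are proper: theta | z ~ N(z, 1) and sigma | z = |z|/|N(0,1)|.

def kernel(theta, sigma):
    z1 = rng.normal(theta, 1.0)           # imaginary training sample from M1
    sigma = abs(z1) / abs(rng.normal())   # draw sigma from pi_2^N(. | z1)
    z2 = rng.normal(0.0, sigma)           # imaginary training sample from M2
    theta = rng.normal(z2, 1.0)           # draw theta from pi_1^N(. | z2)
    return theta, sigma

theta, sigma = 0.0, 1.0
for _ in range(10_000):
    theta, sigma = kernel(theta, sigma)

# The integral priors are the marginals of the stationary distribution of
# this chain -- when it is ergodic. Without the compactification step the
# recursion may be null recurrent or transient, which is exactly the
# failure mode the artificial model copies are designed to repair.
```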

it’s objectively out!

Posted in Books, Statistics, University life on June 13, 2024 by xi'an