Archive for noninformative priors

on(-line) integral priors for model selection

Posted in Books, Statistics, University life on February 27, 2026 by xi'an

integral priors for model comparison [2.0]

Posted in Books, Statistics, University life on May 2, 2025 by xi'an

integral priors for multiple comparison

Posted in Books, Statistics, University life on June 24, 2024 by xi'an

Diego Salmerón and I just arXived a paper on integral priors for multiple model comparison, about deriving reference priors for multiple hypothesis testing. Since (so-called) noninformative priors constructed for estimation purposes are usually inappropriate for model selection and testing, due to their improperness, Jeffreys-Lindley paradoxes and the like, the methodology of integral priors was developed to obtain prior distributions for Bayesian model selection when comparing two models, by modifying initial improper reference priors. This paper proposes a generalisation of this methodology to the case when more than two models are to be compared. In order to avoid the above paradoxes, and the associated possibility of producing a null recurrent or transient Markov chain, our approach adds an artificial copy of each model under comparison, by compactifying the corresponding parameter space, and creates an ergodic Markov chain exploring all models, which returns the integral priors as marginals of its ergodic and stationary joint distribution. Besides the guaranteed existence of these integral priors and the disappearance of the paradoxes that plague estimation reference priors, an additional perk of this methodology is that simulating this Markov chain is straightforward, as it only requires simulating imaginary training samples and drawing from the corresponding posterior distributions, for all models, while producing Bayes factor approximations on the side. This renders its implementation automatic and generic, both in the nested and in the nonnested cases. We associated our late friend Juan Antonio Cano with this paper as he was instrumental in initiating both this collaboration and the methodology at its core.
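To fix ideas, here is a minimal sketch (my own toy, not the algorithm of the paper) of the basic two-model integral-prior Markov chain, for M₁: N(θ,1) and M₂: N(μ,√2), both equipped with flat (improper) reference priors and minimal training samples of size one, so that each posterior given a single observation z is proper:

```python
import random

def integral_prior_chain(n_iter, theta0=0.0, seed=1):
    """Basic two-model integral-prior chain (toy sketch):
    alternate imaginary training samples and posterior draws."""
    rng = random.Random(seed)
    theta, chain = theta0, []
    for _ in range(n_iter):
        z = rng.gauss(theta, 1.0)        # imaginary training sample from M1
        mu = rng.gauss(z, 2.0 ** 0.5)    # posterior draw under M2 (flat prior)
        zp = rng.gauss(mu, 2.0 ** 0.5)   # imaginary training sample from M2
        theta = rng.gauss(zp, 1.0)       # posterior draw under M1 (flat prior)
        chain.append((theta, mu))
    return chain
```

In this location-model toy the chain is a pure Gaussian random walk, hence null recurrent with no stationary distribution: precisely the failure mode that the compactification device described above is designed to prevent.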

a case for Bayesian deep learning

Posted in Books, pictures, Statistics, Travel, University life on September 30, 2020 by xi'an

Andrew Wilson wrote a piece about Bayesian deep learning last winter. Which I just read. It starts with the (posterior) predictive distribution being the core of Bayesian model evaluation or of model (epistemic) uncertainty.

“On the other hand, a flat prior may have a major effect on marginalization.”

Interesting sentence, as, from my viewpoint, using a flat prior is a no-no when running model evaluation since the marginal likelihood (or evidence) is no longer a probability density. (Check Lindley-Jeffreys’ paradox in this tribune.) The author then goes for an argument in favour of a Bayesian approach to deep neural networks for the reason that data cannot be informative on every parameter in the network, which should then be integrated out wrt a prior. He also draws a parallel between deep ensemble learning, where random initialisations produce different fits, and posterior distributions, although the equivalent to the prior distribution in an optimisation exercise is somewhat vague.
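The trouble with flat priors and evidence can be made numerical in a few lines. In this sketch (my own illustration) the "flat" prior on the mean of a N(θ,1) model is approximated by a Uniform(-c,c), so the evidence scales like 1/(2c) and the Bayes factor in favour of a point null θ=0 grows roughly linearly with the arbitrary bound c, which is the Lindley-Jeffreys phenomenon:

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def evidence_flat(x, c):
    # marginal likelihood of x ~ N(theta,1) under theta ~ Uniform(-c, c):
    # m(x) = [Phi(x+c) - Phi(x-c)] / (2c)
    return (std_normal_cdf(x + c) - std_normal_cdf(x - c)) / (2.0 * c)

x = 2.0  # an observation two standard deviations from the null value
null_lik = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)  # N(x; 0, 1)
bayes_factors = {c: null_lik / evidence_flat(x, c) for c in (10.0, 100.0, 1000.0)}
```

Letting c grow (i.e. making the prior "flatter") makes the support for the null arbitrarily strong, whatever the data, since the evidence under the alternative is divided by an arbitrary constant.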

“…we do not need samples from a posterior, or even a faithful approximation to the posterior. We need to evaluate the posterior in places that will make the greatest contributions to the [posterior predictive].”

The paper also contains an interesting point distinguishing between priors over parameters and priors over functions, only the latter mattering for prediction. Which must be structured enough to compensate for the lack of data information about most aspects of the functions. The paper further discusses uninformative priors (over the parameters) in the O’Bayes sense as a default way to select priors. It is however unclear to me how this discussion accounts for the problems met in high dimensions by standard uninformative solutions. More aggressively penalising priors may be needed, such as those found in high-dimensional variable selection. As in e.g. the 10⁷-dimensional space mentioned in the paper. Interesting read all in all!
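The parameters-versus-functions distinction can be illustrated (my own toy, not an example from the paper) with the sign-flip symmetry of a one-hidden-unit tanh network: two parameter values that any prior over parameters treats as distinct induce exactly the same function, hence the same prior mass over functions:

```python
import math

def net(x, w1, w2):
    # one-hidden-unit tanh "network": f(x) = w2 * tanh(w1 * x)
    return w2 * math.tanh(w1 * x)

# Two distinct parameter values, related by flipping the sign of both
# weights: a prior over parameters sees two points, but since tanh is
# odd they induce the very same function of x.
w1, w2 = 1.3, -0.7
diffs = [abs(net(x, w1, w2) - net(x, -w1, -w2)) for x in (-2.0, 0.5, 3.0)]
```

Deep networks multiply such symmetries (permutations of hidden units, rescalings with ReLUs), which is one reason a prior over weights is a poor proxy for a prior over the functions they represent.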

how can a posterior be uniform?

Posted in Books, Statistics on September 1, 2020 by xi'an

A bemusing question from X validated:

How can we have a posterior distribution that is a uniform distribution?

With the underlying message that a uniform distribution does not depend on the data, since it is uniform! While it is always possible to pick the parameterisation a posteriori so that the posterior is uniform, by simply using the inverse cdf transform, or to pick the prior a posteriori so that the prior cancels the likelihood function, there exist more authentic discrete examples of a data realisation leading to a uniform distribution, as e.g. in the Multinomial model. I deem the confusion to stem from the impression either that uniform means non-informative (what we could dub Laplace’s daemon!) or that it could remain uniform for all realisations of the sampled rv.
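The inverse cdf (probability integral transform) argument is easy to check by simulation: if θ|x has a continuous cdf F, then F(θ)|x is Uniform(0,1). A minimal sketch, assuming a Normal posterior N(m,s) for illustration:

```python
import math
import random

def pit_sample(m, s, n, seed=0):
    """Draw from a N(m, s) posterior and apply its own cdf,
    Phi((theta - m) / s), to each draw: the result is Uniform(0,1)."""
    rng = random.Random(seed)
    return [0.5 * (1.0 + math.erf((rng.gauss(m, s) - m) / (s * math.sqrt(2.0))))
            for _ in range(n)]

u = pit_sample(m=1.8, s=0.4, n=20000)
```

The transformed sample behaves as uniform on (0,1), but of course the transform itself depends on the data through the posterior cdf, so no information has been lost, only reparameterised.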