model uncertainty and missing data: an objective BAyesian perspective

My Spanish and objective Bayesian friends Gonzalo García-Donato, María Eugenia Castellanos, Stefano Cabras, Alicia Quirós, and Anabel Forte wrote a fairly exciting paper in BA that is open to discussion (for a few more days) and will be discussed on 05 November (4:00 PM UTC | 11:00 AM EST | 5:00 PM CET).

The interplay between missing data and model uncertainty—two classic statistical problems—leads to primary questions that we formally address from an objective Bayesian perspective. For the general regression problem, we discuss the probabilistic justification of Rubin’s rules applied to the usual components of Bayesian variable selection, arguing that prior predictive marginals should be central to the pursued methodology. In the regression settings, we explore the conditions of prior distributions that make the missing data mechanism ignorable, provided that it is missing at random or completely at random. Moreover, when comparing multiple linear models, we provide a complete methodology for dealing with special cases, such as variable selection or uncertainty regarding model errors. In numerous simulation experiments, we demonstrate that our method outperforms or equals others, in consistently producing results close to those obtained using the full dataset. In general, the difference increases with the percentage of missing data and the correlation between the variables used for imputation.

The so-called Rubin’s identity is simply the representation of the posterior probability of a model γ given the observed data x⁰, p(γ|x⁰), as the integral of the posterior probability of that model given both observed and latent data, p(γ|x⁰, x¹), against the marginal of the latent x¹ given the observed x⁰. Since this marginal itself involves the probabilities p(γ|x⁰), the representation is not directly useful for a numerical implementation.
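Written out, and as my own transcription of the sentence above rather than the paper's notation (the index γ′ running over the collection of models is introduced here for illustration), the identity and its circularity read:

```latex
\[
p(\gamma \mid x^0) = \int p(\gamma \mid x^0, x^1)\, p(x^1 \mid x^0)\, \mathrm{d}x^1 ,
\qquad
p(x^1 \mid x^0) = \sum_{\gamma'} p(x^1 \mid x^0, \gamma')\, p(\gamma' \mid x^0) .
\]
```

The second equality is where the unknown p(γ′|x⁰) re-enter, which is why the first one cannot be used as a plug-in formula.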

In this paper, missingness relates to some entries of either the covariates or the response variate, which is less common but more realistic, especially if some covariates do not contribute to the response. (The missingness mechanism does not matter if the data is missing at random, à la Rubin.) The computational solution (p9) is rather standard, simulating the missing variables given the observed variables. In my opinion, the elephant in the room is the super-delicate selection of a prior distribution on the missing covariates, as methinks this considerably impacts the actual value of the Bayes factor, hence the selection of the surviving model. (As a side remark, we are credited in Celeux et al. (2006) with having “extended DIC for missing data models or when missing data were present”, but our aim was instead to point out the arbitrariness of the very definition of DIC in such contexts.)
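To make the "simulate the missing, then average" idea concrete, here is a minimal sketch, which is mine and not the authors' algorithm. It assumes a fully observed response, a uniform prior over models, the textbook closed-form Bayes factor under Zellner's g-prior (against the intercept-only model), and a deliberately crude imputation step that draws each missing covariate entry from a normal fitted to the observed entries of its column. All function names are hypothetical, and the imputation step is precisely where the choice of a distribution on the missing covariates sneaks in.

```python
import itertools
import numpy as np

def log_bf_gprior(y, X, g):
    """Log Bayes factor of the model with design X versus the intercept-only
    model, under the standard g-prior closed form (an assumption of this sketch)."""
    n, k = X.shape
    if k == 0:
        return 0.0
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    resid = yc - Xc @ beta
    r2 = 1.0 - (resid @ resid) / (yc @ yc)
    return 0.5 * (n - 1 - k) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

def model_probs(y, X, g):
    """Posterior probabilities of all subsets of columns, uniform model prior."""
    p = X.shape[1]
    models = [m for r in range(p + 1) for m in itertools.combinations(range(p), r)]
    log_bf = np.array([log_bf_gprior(y, X[:, np.array(m, dtype=int)], g) for m in models])
    w = np.exp(log_bf - log_bf.max())  # subtract the max for numerical stability
    return models, w / w.sum()

def averaged_model_probs(y, completed_designs, g):
    """Average the complete-data model probabilities over imputed designs."""
    per_draw = [model_probs(y, X, g)[1] for X in completed_designs]
    models = model_probs(y, completed_designs[0], g)[0]
    return models, np.mean(per_draw, axis=0)

# Toy data: only the first covariate matters; 10% of covariate entries go missing.
rng = np.random.default_rng(0)
n, p = 100, 3
X_full = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X_full[:, 0] + rng.normal(size=n)
mask = rng.random((n, p)) < 0.1  # True marks a missing entry

completed = []
for _ in range(20):
    X_imp = X_full.copy()
    for j in range(p):
        obs = X_full[~mask[:, j], j]
        # Crude placeholder imputation: independent normal draws per column.
        X_imp[mask[:, j], j] = rng.normal(obs.mean(), obs.std(), size=mask[:, j].sum())
    completed.append(X_imp)

models, probs = averaged_model_probs(y, completed, g=n)
for m, pr in zip(models, probs):
    print(m, round(float(pr), 3))
```

Changing the imputation distribution (say, inflating its variance) and rerunning is a cheap way to see how sensitive the averaged model probabilities are to that choice.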

“The standard Bayesian method for addressing the absence of prior information uses improper distributions. In estimation problems (the model is fixed), the impropriety of priors does not imply any additional difficulty as long as the posterior is proper” (p9)

The authors point out the well-known difficulty with improper priors but still resort to improper priors on the parameters shared by all models (which I dispute as being adequate, despite the arguments put forward on p15, right Haar measure or not), while sticking to proper priors on the model-dependent parameters. Which unsurprisingly become Zellner’s g-priors. Or rather g’-priors, although the discussion seems to resolve into the (model-free) factor g’ being equal to 1, as for the g-priors. Again a term with a strong impact on the derivation of the Bayes factor.
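For reference, and as the textbook version rather than the paper's g’ variant (whose exact form I do not reproduce here), the usual g-prior for a linear model γ with centred design X_γ, k_γ covariates, and n observations, together with the resulting Bayes factor against the intercept-only model, is:

```latex
\[
\beta_\gamma \mid \sigma^2, \gamma \sim
\mathcal{N}\!\left(0,\; g\,\sigma^2 \left(X_\gamma^{\mathsf T} X_\gamma\right)^{-1}\right),
\qquad
\pi(\alpha, \sigma^2) \propto 1/\sigma^2 ,
\]
\[
B_{\gamma 0} =
\frac{(1+g)^{(n-1-k_\gamma)/2}}{\left[\,1 + g\,(1 - R_\gamma^2)\,\right]^{(n-1)/2}} .
\]
```

Here α is the common intercept, σ² the common error variance, and R²_γ the coefficient of determination of model γ. The improper prior sits on the shared (α, σ²), and g enters the Bayes factor directly, which is why its calibration matters so much.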

One Response to “model uncertainty and missing data: an objective BAyesian perspective”

  1. There seems to be a typo in the date of the next BA webinar, which will be on:
    📅 November 5, 2025 (4:00 PM UTC | 11:00 AM EST | 5:00 PM CET)

    BTW, a curated list of all BA webinars is available here:
    https://bayesian.org/ba-webinars/

    See also the ISBA YouTube channel for even more videos!
    https://www.youtube.com/@isba-internationalsocietybayes

    Cheers,
    Julyan, on behalf of the ISBA Social Media team
