Archive for Bayesian decision theory

Bayesian, adversarial, oceanic, privacy

Posted in Books, Statistics, University life on March 6, 2026 by xi'an

We just arXived a new paper on Bayesian privacy! We, meaning Cameron Bell, Antoine Luciano, Timothy Johnston, and myself, as members of my ERC OCEAN lab at PariSanté and Paris Dauphine. While sharing the same ground as my recent paper with James Bailie, Joshua Bon and Judith Rousseau, this one is definitely more mainstream Bayesian in that the entire decision process falls under the Bayesian hat, with the ultimate decision being the choice of the release mechanism by the data holder (or hoarder!). To rationalise this decision process, we cast the framework as resulting from the actions of three actors, namely the data holder, Alice, the data scientist, Bob, and the eavesdropper, Eve. (As in my earlier posts on solving Le Monde’s math puzzles, we could have used names from other cultures, but I feared this would have confused some of the readers. Incidentally, I found out that the earliest use of the first two names was within the groundbreaking 1977 cryptography paper of Rivest, Shamir and Adleman, bringing the RSA algorithm to the World! Eve appeared in an early, highly-cited privacy paper by Montréal’s Bennett, Brassard, and (unconnected to me!) Robert, in 1988.)

We thus consider a Bayesian setting in which, given data x held by Alice, inference is to be performed by Bob on a parameter θ. Performing such inference requires that Alice release information derived from x, which may contain sensitive content that Eve could exploit. Our approach is to compare Alice’s release mechanisms according to both the quality of inference on θ (from Bob’s viewpoint) and the privacy leakage regarding x (sought by Eve and dreaded by Alice). To formalise this evaluation, we posit that Alice refers to a loss function that is a linear combination of Bob’s and Eve’s losses, the weight on Eve’s loss being then negative. (An alternative to be considered in future work is Alice using a ratio of Bob’s and Eve’s losses, possibly raised to different powers, the rationale being that a zero loss for Eve is intolerable for Alice.) As in Bayesian experimental design, a prior on the data is necessary, both for Eve to infer on the hidden data based on the release mechanism and released output, and for Alice to evaluate the risk of said release mechanism. (The two priors may differ, as long as they are both made public.) To calibrate Alice’s loss, we opted for a balance that returns the same risk for a full data release and for a total lack of release; in specific, informed settings, other weights could be chosen. While finding the optimal release strategy is impossible but for highly discrete settings, the framework obviously allows for the ranking of natural strategies like insufficient statistics and synthetic datasets. Comments welcome!
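For readers wanting a feel for the construction, here is a minimal toy sketch of it (entirely illustrative, not from the paper): a two-point parameter prior, binomial data, 0-1 losses for both Bob (guessing θ) and Eve (guessing x), three candidate release mechanisms for Alice, and the weight on Eve’s loss calibrated so that a full release and no release at all carry the same combined risk.

```python
from math import comb

# Toy instantiation of the three-actor setting; all modelling choices
# (binomial data, 0-1 losses, these mechanisms) are illustrative only.
thetas = {0.2: 0.5, 0.8: 0.5}     # Bob's prior on theta
n = 5                             # x ~ Binomial(n, theta), held by Alice

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# joint distribution p(theta, x)
joint = {(t, x): pt * binom_pmf(x, n, t)
         for t, pt in thetas.items() for x in range(n + 1)}

# Release mechanisms q(y | x): full release, a coarsening
# (an insufficient statistic), and no release at all.
full    = lambda x: {x: 1.0}
coarse  = lambda x: {int(x > n // 2): 1.0}   # only reports whether x > n/2
nothing = lambda x: {0: 1.0}                 # constant output

def risks(mech):
    """Bayes risks under 0-1 loss: Bob guesses theta, Eve guesses x."""
    p = {}                                    # p(theta, x, y)
    for (t, x), w in joint.items():
        for y, qy in mech(x).items():
            p[(t, x, y)] = p.get((t, x, y), 0.0) + w * qy
    r_bob = r_eve = 0.0
    for y in {y for (_, _, y) in p}:
        py = sum(w for (t, x, yy), w in p.items() if yy == y)
        post_t = {t: sum(w for (tt, x, yy), w in p.items()
                         if tt == t and yy == y) for t in thetas}
        post_x = {x: sum(w for (t, xx, yy), w in p.items()
                         if xx == x and yy == y) for x in range(n + 1)}
        r_bob += py - max(post_t.values())    # prob. Bob's MAP guess is wrong
        r_eve += py - max(post_x.values())    # prob. Eve's MAP guess is wrong
    return r_bob, r_eve

# Calibrate lam so that full release and no release give Alice the same
# combined risk  r_bob - lam * r_eve  (the balance mentioned in the post).
b_full, e_full = risks(full)
b_none, e_none = risks(nothing)
lam = (b_none - b_full) / (e_none - e_full)

for name, mech in [("full", full), ("coarse", coarse), ("none", nothing)]:
    b, e = risks(mech)
    print(f"{name:6s}  Bob {b:.3f}  Eve {e:.3f}  Alice {b - lam * e:.3f}")
```

The negative weight makes Alice’s risk increase when Eve’s risk drops, so mechanisms in between the two calibration extremes, such as the coarsened release, can then be ranked against them.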

persuasive (and Oceanic) privacy

Posted in Books, Mountains, pictures, Statistics, Travel, University life on February 3, 2026 by xi'an

I am quite excited about the paper James Bailie, Joshua Bon, Judith Rousseau, and myself just arXived! It proposes a novel framework for measuring privacy that we have been working on for at least the past year, partly through the previous Les Houches privacy workshops. In the spirit of these workshops and of the larger scale ERC Synergy grant OCEAN, we develop therein a rather generic Bayesian game-theoretic perspective on achieving statistical privacy. It involves a Sender (observing the original data and delivering a limited output) and a Receiver (with potential adversarial intentions). The paper mostly focuses on setting up a theoretical framework, including the creation of new, purpose-driven privacy definitions that are rigorously justified, while also allowing for the assessment of existing privacy guarantees through game theory. While this was not our original intent, we show that pure and probabilistic differential privacy notions, in the Dwork et al. (2006) sense, are special cases of our framework. This setting provides new interpretations of the post-processing inequality. Furthermore, and somewhat more importantly, we also prove that our privacy guarantees can be established for deterministic algorithms, which fall outside current privacy standards. Hopefully, we’ll make further progress at the upcoming privacy workshop next month, to be held in Venice (again).
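As a reminder of the special case recovered by the framework, pure ε-differential privacy bounds the log-ratio of output probabilities across neighbouring inputs. A minimal sketch (my own toy check, not the paper’s construction) computes the smallest such ε for a finite mechanism, here randomized response on a single bit, which satisfies ε-DP with ε = log((1-f)/f) for flip probability f:

```python
import math

# Randomized response on one bit: report the truth w.p. 1-f, flip w.p. f.
def randomized_response_dist(x, f):
    """P(Y = y | X = x) as a dict over outputs."""
    return {x: 1 - f, 1 - x: f}

def dp_epsilon(mech, inputs, outputs):
    """Smallest eps such that the finite mechanism is eps-DP, treating
    every pair of distinct inputs as neighbouring datasets."""
    eps = 0.0
    for x in inputs:
        for xp in inputs:
            if x == xp:
                continue
            for y in outputs:
                p = mech(x).get(y, 0.0)
                q = mech(xp).get(y, 0.0)
                if q == 0.0 and p > 0.0:
                    return math.inf       # no finite eps can hold
                if p > 0.0:
                    eps = max(eps, math.log(p / q))
    return eps

f = 0.25
eps = dp_epsilon(lambda x: randomized_response_dist(x, f), [0, 1], [0, 1])
print(eps, math.log((1 - f) / f))   # both equal log(3)
```

Note that `dp_epsilon` returns infinity for any deterministic, non-constant mechanism, which is exactly why deterministic algorithms fall outside this standard and motivates the alternative guarantees of the paper.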

Bayesian decision-theory for data privacy [surfin’ the Oce’n, 30 April, INRIA Paris]

Posted in Statistics, University life on April 23, 2025 by xi'an

Abstract

The scientific and economic value of data continues to grow alongside technological advances. New hardware and software developments enable, but often require, larger and more complex datasets to function effectively. As the importance of input data to these systems becomes increasingly recognized, so too does the loss of privacy for data providers. In this context, data privacy emerges as a critical issue for fields such as statistics and machine learning, as well as for scientific and industrial endeavours that rely on sensitive data. We propose a framework for measuring privacy from a Bayesian decision-theoretic perspective. This framework enables the creation of new, purpose-driven privacy principles that are rigorously justified, while also allowing for the assessment of existing privacy definitions through decision theory. We pay particular attention to the privacy of deterministic algorithms, which are overlooked by current privacy standards, and to the privacy of N Monte Carlo samples drawn from an invariant distribution as N goes to infinity. We show that Probabilistic Differential Privacy is a special case of our framework and provide some new interpretations of Differential Privacy as a result.

off to Tokyo

Posted in Statistics, Travel, University life on March 6, 2025 by xi'an

Bayesian Inference: Theory, Methods, Computations [book review]

Posted in Statistics on November 12, 2024 by xi'an

Bayesian Inference: Theory, Methods, Computations by Silvelyn Zwanzig and Rauf Ahmad, both from Uppsala University, is a recent book published by Chapman & Hall / CRC Press. About 300p long (plus appendices), it covers the core aspects of Bayesian inference, namely the decision-theoretic motivations, its asymptotic validation, the specifics of estimation and testing, and the computational approximations (MC, MCMC, ABC, VB), with entries on prior specification and Normal linear models. And some R codes. It is (and feels like it is) constructed from Master and PhD courses (at Uppsala University), with a rigorous mathematical presentation and many examples, some related to biostatistics. Drawings by the first author’s daughter are included in most chapters, to this reviewer’s bemusement. From a further personal viewpoint, the book also reads rather close to my (Bayesian) choice of a Bayesian textbook, which proves rather accurate since several chapters are inspired by my own Bayesian Choice, as acknowledged therein, as well as by the more recent Statistical Decision Theory: Estimation, Testing, and Selection by Liese & Miescke (2008) and Introduction to the Theory of Statistical Inference by Liero & Zwanzig (2011). Witness, for instance, an example of prior construction for capture-recapture experiments on lizards as analysed by my PhD student Dupuis (1995) [with a curious switch to the authors on p.263] and also included in The Bayesian Choice (with drawing 2.9 incorrect in that the lizards there have marks on their backs, instead of the code adopted by the ecologists, namely cutting one specific phalange for each capture).

Other minor quandaries: the usual issue of crediting the wrong reference for creating a method, as when citing Jeffreys (1946) for inventing non-informative priors [p.53]; failing to point out the parameterisation invariance of intrinsic losses [p.95]; considering that Bayes factors are only relevant for obtaining evidence against the null hypothesis [p.216]; recommending BIC and DIC (!) [pp.232-6]; advocating sampling importance resampling (SIR) for approximate sampling from the target, omitting infinite variance issues [p.253]; defining annealing as using “several trial distributions” [p.261]; and a mistake in ABC-MCMC [p.274], since the case when the simulated data is too far from the actual data should lead to a repetition of the current value rather than a pure rejection.
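To make the last point concrete, here is a minimal ABC-MCMC sketch (my own illustration, not the book’s code) for the mean of a Normal sample, with a flat prior and a symmetric random-walk proposal so that the Metropolis-Hastings ratio is one and acceptance reduces to the distance check; the key line is that a rejected proposal leads to the current value being recorded again in the chain, not to the iteration being dropped:

```python
import random
import statistics

# Illustrative setup: observed sample from N(2, 1), summary = sample mean,
# theta initialised at the observed summary for convenience.
random.seed(42)
obs = [random.gauss(2.0, 1.0) for _ in range(20)]
s_obs = statistics.fmean(obs)            # summary statistic
eps, n_iter, scale = 0.2, 2000, 0.5      # ABC tolerance, chain length, RW scale

theta, chain = s_obs, []
for _ in range(n_iter):
    prop = theta + random.gauss(0.0, scale)            # random-walk proposal
    z = [random.gauss(prop, 1.0) for _ in range(20)]   # pseudo-data at prop
    if abs(statistics.fmean(z) - s_obs) <= eps:
        theta = prop    # accept: flat prior + symmetric proposal => MH ratio 1
    # Crucial step: whether accepted or not, the CURRENT state is recorded,
    # so a rejection REPEATS theta in the chain rather than skipping it.
    chain.append(theta)

print(len(chain), round(statistics.fmean(chain[500:]), 2))
```

Dropping rejected iterations instead would bias the chain away from the ABC target, since the repeated values are precisely what gives high-posterior regions their weight.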

All in all, a reasonable textbook with some recent input, but still lacking in originality, if I may subjectively say so.

[Disclaimer about potential self-plagiarism: this post or an edited version of it could possibly appear in my Books Review section in CHANCE.]