We just arXived a new paper on Bayesian privacy! We meaning Cameron Bell, Antoine Luciano, Timothy Johnston and myself, as members of my ERC OCEAN lab at PariSanté and Paris Dauphine. While sharing the same ground as my recent paper with James Bailie, Joshua Bon and Judith Rousseau, this one is definitely more mainstream Bayesian in that the entire decision process falls under the Bayesian hat, with the ultimate decision being the choice of the release mechanism by the data holder (or hoarder!). To rationalise this decision process, we break the framework down as resulting from the actions of three actors, namely the data holder, Alice, the data scientist, Bob, and the eavesdropper, Eve. (As in my earlier posts on solving Le Monde's math puzzles, we could have used pronouns from other cultures, but I feared this would have confused some of the readers. Incidentally, I found out that the earliest use of the first two pronouns was within the groundbreaking 1977 cryptography paper of Rivest, Shamir and Adleman, bringing the RSA algorithm to the World! Eve then appeared in an early, highly-cited privacy paper by Montréal's Bennett, Brassard, and (unconnected to me!) Robert, in 1988.)
We thus consider a Bayesian setting in which, given data x, held by Alice, inference is to be performed by Bob on a parameter θ. Performing such inference requires Alice releasing information derived from x, which may contain sensitive content, exploited by Eve. Our approach is to compare Alice's release mechanisms according to both the quality of inference on θ (from Bob's viewpoint) and the privacy leakage regarding x (sought by Eve and dreaded by Alice). To formalise this evaluation, we posit that Alice refers to a loss function that is a linear combination of Bob's and Eve's losses, with the weight on Eve's loss thus negative. (An alternative to be considered in future work is Alice using a ratio of Bob's and Eve's losses, possibly raised to different powers, the rationale being that a zero loss for Eve is intolerable for Alice.) As in Bayesian experimental design, a prior on the data is necessary, both for Eve to infer about the hidden data based on the release mechanism and released output, and for Alice to evaluate the risk of said release mechanism. (The two priors may differ, as long as they are both made public.) To calibrate Alice's loss, we opted for a balance that returns the same risk for a full data release and a total lack of release. In specific, informed settings, other weights could be chosen. While finding the optimal release strategy is impossible except in highly discrete settings, the framework obviously allows for the ranking of natural strategies like insufficient statistics and synthetic datasets. Comments welcome!
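To make the calibration concrete, here is a small sketch of my own (a toy Gaussian model, not taken from the paper): θ ~ N(0,1), x | θ ~ N(θ, σ²), and Alice's candidate release mechanisms add Gaussian noise of variance τ² to x, with τ² = 0 corresponding to full release and τ² → ∞ to no release at all. Both Bob's and Eve's losses are taken as posterior variances (squared-error risks), and the negative weight λ on Eve's loss is set exactly so that full release and no release give Alice the same risk. All names and numerical choices below are illustrative assumptions.

```python
# Toy Gaussian illustration (editor's sketch, not the paper's model):
# theta ~ N(0,1), x | theta ~ N(theta, SIGMA2),
# release t = x + N(0, tau2); everything is available in closed form.

SIGMA2 = 0.5  # hypothetical observation variance sigma^2


def bob_risk(tau2):
    """Bob's posterior variance for theta given t (his squared-error risk)."""
    # t | theta ~ N(theta, sigma^2 + tau^2), prior theta ~ N(0,1)
    return (SIGMA2 + tau2) / (1.0 + SIGMA2 + tau2)


def eve_risk(tau2):
    """Eve's posterior variance for x given t (her squared-error risk)."""
    # marginally x ~ N(0, 1 + sigma^2); t adds noise of variance tau^2
    if tau2 == 0.0:
        return 0.0  # full release: Eve recovers x exactly
    return (1.0 + SIGMA2) * tau2 / (1.0 + SIGMA2 + tau2)


# Calibrate the (negative) weight LAM on Eve's loss so that full release
# (tau2 = 0) and no release (tau2 -> infinity) give Alice the same risk:
# bob_risk(0) = 1 - LAM * (1 + sigma^2).
LAM = (1.0 - bob_risk(0.0)) / (1.0 + SIGMA2)


def alice_risk(tau2):
    """Alice's risk: Bob's loss minus LAM times Eve's loss."""
    return bob_risk(tau2) - LAM * eve_risk(tau2)


if __name__ == "__main__":
    for tau2 in [0.0, 0.1, 1.0, 10.0, 1e6]:
        print(f"tau2={tau2:>9}: Bob {bob_risk(tau2):.4f}  "
              f"Eve {eve_risk(tau2):.4f}  Alice {alice_risk(tau2):.4f}")
```

One amusing degeneracy of this particular toy: with squared-error losses and the Gaussian release family, the calibration makes Alice's risk constant in τ², so every noise level is equivalent for her. That is a feature of this simple conjugate example, not of the general framework, and it hints at why richer loss choices (such as the ratio alternative mentioned above) can be worth exploring.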

I am quite excited about 


