Archive for Cambridge University Press

scalable Monte Carlo for Bayesian learning [book review]

Posted in Books, Statistics, University life on September 26, 2025 by xi'an

This book by Paul Fearnhead, Christopher Nemeth, Chris Oates, and Chris Sherlock is part of the IMS Monographs series, published by Cambridge University Press. It covers the most recent developments in MCMC methods, namely stochastic gradient MCMC (Chap. 3), non-reversible MCMC (Chap. 4), continuous-time MCMC (Chap. 5), and assessing and improving MCMC (Chap. 6). I find the book remarkable in its attention to rigour and clarity, without falling into overly technical derivations. It is perfectly suited for a graduate course aimed at students with a solid mathematical background. In short, had I considered a new edition of our Monte Carlo Statistical Methods book to incorporate these advances, I could not have done such a good job!

The first chapter provides a quick refresher of the background, from Monte Carlo principles to Markov chains, SDEs, and the kernel “trick” (which requires a dozen pages of exposition). Nonetheless, it contains side remarks of true interest, including some suggestions I had not previously seen, such as an unusual introduction of the HMC algorithm as an underdamped Langevin diffusion. Chapter 2 extends this recap by covering reversible MCMC algorithms and their optimal scalings. This is done in a particularly friendly presentation that I intend to use in my own course. The HMC section is probably the best coverage I have seen on the topic, including, most naturally, the leapfrog steps.
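To make the leapfrog steps concrete, here is a minimal Python sketch of one HMC transition with an identity mass matrix. It is not code from the book. The function names `leapfrog` and `hmc_step`, the standard Gaussian target, the step size, and the number of steps are all illustrative assumptions. The integrator takes a half step on the momentum, then alternates full position and momentum steps, and ends with a final half step. A Metropolis accept/reject on the change in the Hamiltonian then removes the discretisation bias.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def leapfrog(x: Vector, p: Vector, grad_log_pi: Callable[[Vector], Vector],
             step: float, n_steps: int) -> Tuple[Vector, Vector]:
    """Leapfrog integration of Hamiltonian dynamics for
    H(x, p) = -log pi(x) + |p|^2 / 2 (identity mass matrix), n_steps >= 1."""
    g = grad_log_pi(x)
    # Half step for the momentum: dp/dt = -grad U(x) = grad log pi(x).
    p = [pj + 0.5 * step * gj for pj, gj in zip(p, g)]
    for i in range(n_steps):
        # Full step for the position: dx/dt = p.
        x = [xj + step * pj for xj, pj in zip(x, p)]
        g = grad_log_pi(x)
        # Full momentum step, except for a final half step.
        weight = step if i < n_steps - 1 else 0.5 * step
        p = [pj + weight * gj for pj, gj in zip(p, g)]
    return x, p


def hmc_step(x: Vector, log_pi: Callable[[Vector], float],
             grad_log_pi: Callable[[Vector], Vector],
             step: float, n_steps: int, rng: random.Random) -> Vector:
    """One HMC transition: fresh N(0, I) momentum, a leapfrog path,
    then Metropolis accept/reject on the change in the Hamiltonian."""
    p0 = [rng.gauss(0.0, 1.0) for _ in x]
    x1, p1 = leapfrog(x, p0, grad_log_pi, step, n_steps)
    h0 = -log_pi(x) + 0.5 * sum(v * v for v in p0)
    h1 = -log_pi(x1) + 0.5 * sum(v * v for v in p1)
    # Accept with probability min(1, exp(h0 - h1)).
    if rng.random() < math.exp(min(0.0, h0 - h1)):
        return x1
    return x


def log_std_normal(x: Vector) -> float:
    return -0.5 * sum(v * v for v in x)


def grad_log_std_normal(x: Vector) -> Vector:
    return [-v for v in x]


if __name__ == "__main__":
    rng = random.Random(1)
    x, draws = [0.0], []
    for _ in range(2000):
        x = hmc_step(x, log_std_normal, grad_log_std_normal, 0.2, 10, rng)
        draws.append(x[0])
    print("sample mean:", sum(draws) / len(draws))
```

The integrator is time-reversible. Running it forward, flipping the momentum, and running it again returns the starting point. This symmetry is what lets the simple Metropolis ratio above remain valid.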

Chapter 3 gets into stochastic gradient MCMC as an approximate MCMC, with nice arguments and formal convergence bounds. Again, this is done quite efficiently, if focussing almost solely on Gaussian settings (but including a neural network example). Similarly, Chapter 4 provides intuitive (if informal) arguments on the worth of non-reversible algorithms, arguments that are well-suited to a textbook of this level. This chapter also introduces discrete-time counterparts of PDMP samplers, such as the discrete bouncy particle sampler.
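As a toy illustration of stochastic gradient MCMC, the following Python sketch runs stochastic gradient Langevin dynamics (SGLD, in the style of Welling and Teh, 2011) on a conjugate Gaussian model with a known exact posterior. It is not the book's code. The model, the function name `sgld_gaussian_mean`, the step size, and the batch size are all illustrative assumptions. Each update follows a rescaled minibatch gradient and injects Gaussian noise, with no accept/reject step. That missing correction is what makes the method approximate.

```python
import math
import random
from typing import List, Sequence


def sgld_gaussian_mean(data: Sequence[float], prior_var: float, step: float,
                       batch_size: int, n_iter: int, rng: random.Random,
                       theta0: float = 0.0) -> List[float]:
    """SGLD for theta in the toy model x_i ~ N(theta, 1), theta ~ N(0, prior_var).

    Each iteration uses an unbiased minibatch estimate of the full-data
    gradient of the log-posterior, moves by step/2 times that estimate and
    adds N(0, step) noise. There is no Metropolis correction, so the output
    is biased by an amount that depends on `step`."""
    scale = len(data) / batch_size  # rescales the minibatch sum to the full data
    theta, path = theta0, []
    for _ in range(n_iter):
        batch = rng.sample(data, batch_size)  # minibatch drawn without replacement
        grad = -theta / prior_var + scale * sum(x - theta for x in batch)
        theta += 0.5 * step * grad + rng.gauss(0.0, math.sqrt(step))
        path.append(theta)
    return path


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(2.0, 1.0) for _ in range(1000)]
    path = sgld_gaussian_mean(data, prior_var=100.0, step=1e-4,
                              batch_size=50, n_iter=5000, rng=rng)
    exact = sum(data) / (len(data) + 1.0 / 100.0)  # conjugate posterior mean
    print("SGLD mean:", sum(path[1000:]) / 4000, "exact:", exact)
```

The Gaussian toy model echoes the book's Gaussian settings. There the conjugate posterior mean, the sum of the data divided by (n + 1/prior_var), provides an exact benchmark for the SGLD average.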

Chapter 5 is a (nicely) monstrous coverage of continuous-time MCMC samplers that reaches very recent advances on PDMPs. The focus is on expressing them as limits, in order to derive mixing rates without extreme mathematical steps. (The chapter even includes a mention of the coordinate sampler that my PhD student Wu Changye derived in 2018!) Again, a chapter I plan to use when teaching MCMC methods, if possibly skipping some of its 66 pages.
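For readers new to PDMPs, here is a minimal Python sketch of the simplest instance: a one-dimensional Zig-Zag process targeting a standard Gaussian. This is neither the book's code nor the coordinate sampler. The function name and the choice of target are assumptions made for illustration. The position moves at unit speed, and the velocity v flips at events with rate max(0, v x). Writing a = v x and drawing E ~ Exp(1), the next event time can be simulated exactly by inverting the integrated rate, which gives τ = -a + sqrt(max(a, 0)² + 2E).

```python
import math
import random
from typing import Tuple


def zigzag_gaussian_1d(n_events: int, rng: random.Random,
                       x0: float = 0.0, v0: int = 1) -> Tuple[float, float, float]:
    """1D Zig-Zag process for N(0, 1), i.e. potential U(x) = x^2 / 2.

    Returns (total time T, integral of x dt, integral of x^2 dt) along the
    continuous path, so that E[x] ~ int_x / T and E[x^2] ~ int_x2 / T."""
    x, v = x0, v0
    total_time = int_x = int_x2 = 0.0
    for _ in range(n_events):
        a = v * x  # along the ray x + v t, the rate is max(0, a + t)
        e = rng.expovariate(1.0)
        # Exact inversion of the integrated rate at the Exp(1) level e.
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        # Exact integrals of x(s) = x + v s over the linear segment [0, tau].
        int_x += x * tau + v * tau ** 2 / 2.0
        int_x2 += x * x * tau + x * v * tau ** 2 + tau ** 3 / 3.0
        total_time += tau
        x += v * tau
        v = -v  # each event flips the velocity
    return total_time, int_x, int_x2


if __name__ == "__main__":
    T, sx, sx2 = zigzag_gaussian_1d(10000, random.Random(0))
    print("E[x] ~", sx / T, " E[x^2] ~", sx2 / T)
```

The output is a continuous, piecewise-linear path rather than a sequence of draws, so expectations are estimated by time averages along the trajectory.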

Chapter 6 completes the monograph with a presentation of convergence assessment tools and diagnostics, exploiting the kernel trick, as well as convergence bounds that reflect very recent research in that domain. The concluding section on optimal weights and optimal thinning will presumably be new to most readers. (Making me wonder if a link can be found with our importance Markov chain construct.)
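One kernel-based diagnostic in this spirit is the kernel Stein discrepancy. It only requires the score d/dx log π, so no normalising constant is needed. The Python sketch below computes it in one dimension with an inverse multiquadric (IMQ) kernel. The kernel choice, the parameters c = 1 and β = -1/2, and the function names are assumptions, and the sketch is not claimed to match the book's exact presentation. A smaller value indicates a sample closer to the target.

```python
import math
import random
from typing import Callable, Sequence


def imq_stein_kernel(x: float, y: float, score: Callable[[float], float],
                     c: float = 1.0, beta: float = -0.5) -> float:
    """Langevin Stein kernel k_p(x, y) built on the IMQ base kernel
    k(x, y) = (c^2 + (x - y)^2)^beta, in one dimension:
    k_p = d2k/dxdy + s(x) dk/dy + s(y) dk/dx + s(x) s(y) k."""
    r = x - y
    u = c * c + r * r
    k = u ** beta
    dk_dx = 2.0 * beta * r * u ** (beta - 1.0)
    dk_dy = -dk_dx
    d2k_dxdy = (-2.0 * beta * u ** (beta - 1.0)
                - 4.0 * beta * (beta - 1.0) * r * r * u ** (beta - 2.0))
    sx, sy = score(x), score(y)
    return d2k_dxdy + sx * dk_dy + sy * dk_dx + sx * sy * k


def kernel_stein_discrepancy(samples: Sequence[float],
                             score: Callable[[float], float],
                             c: float = 1.0, beta: float = -0.5) -> float:
    """V-statistic estimate: square root of the average of k_p over all
    n^2 pairs (quadratic cost in the sample size)."""
    n = len(samples)
    total = sum(imq_stein_kernel(a, b, score, c, beta)
                for a in samples for b in samples)
    return math.sqrt(max(total, 0.0)) / n  # guard against rounding below 0


def std_normal_score(x: float) -> float:
    return -x  # d/dx log N(x; 0, 1)


if __name__ == "__main__":
    rng = random.Random(3)
    on_target = [rng.gauss(0.0, 1.0) for _ in range(300)]
    shifted = [rng.gauss(1.0, 1.0) for _ in range(300)]
    print(kernel_stein_discrepancy(on_target, std_normal_score),
          kernel_stein_discrepancy(shifted, std_normal_score))
```

Because the value is defined for any weighted or unweighted point set, the same quantity can be minimised over weights or over subsets of an MCMC output. That is the general idea behind kernel-based optimal weighting and thinning.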

[Disclaimer about potential self-plagiarism as usual: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

Scalable Monte Carlo for Bayesian Learning [not yet a book review]

Posted in Books, Statistics, University life on May 11, 2025 by xi'an

Privacy-preserving Computing [book review]

Posted in Books, Statistics on May 13, 2024 by xi'an

Privacy-preserving Computing for Big Data Analytics and AI, by Kai Chen and Qiang Yang, is a rather short 2024 CUP book translated from the 2022 Chinese version (by the authors). It covers secret sharing, homomorphic encryption, oblivious transfer, garbled circuits, differential privacy, trusted execution environments, federated learning, privacy-preserving computing platforms, and case studies. The style is survey-like, meaning it often is too light for my liking, with too many lists of versions and extensions and, more importantly, too little detail to rely (solely) on it for a course. At times, it stands closer to a Wikipedia-level introduction to a topic. For instance, the chapter on homomorphic encryption [Chap.5] does not connect with the (presumably narrow) picture I have of this method. And the chapter on differential privacy [Chap.6] does not get much further than Laplace and Gaussian randomization, as in, e.g., the stochastic gradient perturbation of Abadi et al. (2016), while the privacy requirement itself is hardly discussed. The chapter on federated learning [Chap.8] is longer, if not much more detailed, being based on an entire book on federated learning of which Qiang Yang is the primary author. (With all figures in that chapter being reproduced from said book.) The next chapter [Chap.9] describes to some extent several computing platforms that can be used for privacy purposes, such as FATE, CryptDB, MesaTEE, Conclave, and PrivPy, while the final one goes through case studies from different areas, but without enough depth to be truly formative for neophyte readers and students. Overall, too light for my liking.
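To fix ideas on the Laplace and Gaussian randomizations mentioned above, here is a minimal Python sketch of both mechanisms. It also includes a clipped-and-noised gradient average in the spirit of the stochastic gradient perturbation of Abadi et al. (2016). These are textbook versions, not code from the book. The function names are hypothetical, the gradients are scalars for simplicity, and the Gaussian calibration is the classical one stated for ε < 1.

```python
import math
import random
from typing import Sequence


def laplace_mechanism(value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """epsilon-DP release of a query whose L1 sensitivity is `sensitivity`,
    by adding Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # The difference of two independent Exp(1) draws is a standard Laplace draw.
    return value + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def gaussian_sigma(sensitivity: float, epsilon: float, delta: float) -> float:
    """Classical (epsilon, delta) calibration of the Gaussian mechanism,
    sigma = sensitivity * sqrt(2 log(1.25 / delta)) / epsilon, stated for
    0 < epsilon < 1 with `sensitivity` the L2 sensitivity."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def gaussian_mechanism(value: float, sensitivity: float, epsilon: float,
                       delta: float, rng: random.Random) -> float:
    return value + rng.gauss(0.0, gaussian_sigma(sensitivity, epsilon, delta))


def noisy_clipped_mean(grads: Sequence[float], clip: float,
                       noise_multiplier: float, rng: random.Random) -> float:
    """Clip each per-example (scalar) gradient to [-clip, clip], sum them,
    add N(0, (noise_multiplier * clip)^2) noise, and average over the batch."""
    clipped = [max(-clip, min(clip, g)) for g in grads]
    noise = rng.gauss(0.0, noise_multiplier * clip)
    return (sum(clipped) + noise) / len(grads)
```

Clipping bounds each example's influence on the sum by `clip`. That bound is the sensitivity the Gaussian noise is calibrated against. Translating `noise_multiplier` into an overall (ε, δ) guarantee over many iterations requires a privacy accountant, which this sketch does not attempt.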

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

the privacy fallacy [book review]

Posted in Statistics on May 3, 2024 by xi'an

“The World changed significantly since 1973.” (p.10)

I read this book, The Privacy Fallacy: Harm and Power in the Information Economy, by Ignacio Cofone, upon my return from Warwick last week. This is a Cambridge University Press 2023 book I had picked from their publication list after reviewing a book proposal for them. A selection made with our ERC OCEAN goals in mind, but without paying enough attention to the book’s table of contents, since it proved to be a Law book!

“People’s inability to assess privacy risks impact people’s behavior toward privacy because it turns the risks into uncertainty, a kind of risk that is impossible to estimate.” (p.31)

Still, this ended up being a fairly interesting read (for me) about the shortcomings of current privacy laws (in various countries), since they are based on an obsolete perception that predates AIs and social media. Its main theme is that privacy is a social value that must be protected, regardless of whether or not its breach has tangible consequences. The author then argues that the notions supporting these laws, such as the rationality of individual choices, the confusion between privacy and secrecy, the binary dichotomy between public and private, &tc., are all erroneous, hence the “fallacy” he denounces. One immediate argument for his position is the extreme imbalance of information between individuals and corporations, the former being unable to assess the whole impact of clicking on “I agree” when visiting a webpage or installing a new app. All the more so because the data thus gathered is pipelined to third parties. (“One’s efforts cannot scale to the number of corporations collecting and using one’s personal data”, p.93) For similar reasons, Cofone further states that the current principles based on contracts are inappropriate, also because data harm can be collective and because companies have a strong incentive to exploit data, hence a moral hazard.

“Inferences, relational data, and de-identified data aren’t captured by consent provisions.” (p.9)

“AI inferences worsen information overload (…) As [they] continue to grow, so will the insufficiency of our processing ability to estimate our losses.” (p.75)

As illustrated by the surrounding quotes, the statistical and machine-learning aspects of the book are few and vague. The additional level of privacy loss due to post-data processing is treated as a further argument that said loss is impossible to quantify and assess, without a proper evaluation of the channels through which this can happen and without a regulatory proposal towards its control. Unfortunately, this level of discourse makes AIs appear as omniscient methods.

“Inferences are invisible (…) Risks posed by inferences are impossible to anticipate because the information inferred is disproportionate to the sum of the information disclosed.” (p.49)

“The idea of probabilistic privacy loss is crucial in a world where entities (..) mostly affect our privacy by making inferences” (p.121)

The attempts at regulation, such as opt-in and informed consent, are then denounced as illusions (obviously so imho, even without considering the nuisance of having to click on “Reject” for each newly visited website!). De- and re-identified data do not require anyone’s consent. Data protection rights, as of today, do not provide protection in most cases, the burden of proof residing with the privacy victims rather than the perpetrators. The book unsurprisingly offers no technical suggestion towards ensuring that corporations and data brokers comply with this respect for privacy. On the contrary, it agrees that institutional attempts such as the GDPR remain well-intended wishful thinking without a hard-wired way of controlling the data flows, while pointing to the “need of an enforcement authority with investigating and sanctioning powers” (p.106). The only in-depth proposal therein is pushing for stronger accountability of these corporations via a new type of liability, with a prospect of class actions (if only in countries whose judiciary allows for them).

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

Active Statistics: Stories, Games, Problems, and Hands-on Demonstrations [it’s out now!]

Posted in Books, Kids, pictures, Statistics, University life on March 13, 2024 by xi'an