This book by Paul Fearnhead, Christopher Nemeth, Chris Oates, and Chris Sherlock, part of the IMS Monograph series and published by Cambridge University Press, covers the most recent developments in MCMC methods, namely stochastic gradient MCMC (Chap. 3), non-reversible MCMC (Chap. 4), continuous-time MCMC (Chap. 5), and assessing and improving MCMC (Chap. 6). I find the book remarkable in its attention to rigour and clarity, without falling into overly technical derivations. It is perfectly suited for a graduate course aimed at students with a solid mathematical background. In short, had I considered a new edition of our Monte Carlo Statistical Methods book to incorporate these advances, I could not have done such a good job!
The first chapter provides a quick refresher on the background, from Monte Carlo principles to Markov chains, SDEs, and the kernel “trick” (which requires a dozen pages of exposition). It nonetheless contains side remarks of true interest, including some suggestions I had not previously seen, for instance an unusual introduction of the HMC algorithm as an underdamped Langevin diffusion. Chapter 2 extends this recap by covering reversible MCMC algorithms and the attached optimal scalings, in a particularly friendly presentation that I intend to use in my own course. The HMC section is probably the best coverage I have seen on the topic, including most naturally the leapfrog steps.
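For readers who have not met the leapfrog discretisation mentioned above, here is a minimal generic sketch of an HMC step with a unit mass matrix. This is not code from the book, just a standard illustration; the target is only assumed to come with a log-density and its gradient.

```python
import numpy as np

def leapfrog(theta, p, grad_log_post, eps, L):
    """Run L leapfrog steps of size eps (unit mass matrix)."""
    p = p + 0.5 * eps * grad_log_post(theta)     # initial half step, momentum
    for _ in range(L - 1):
        theta = theta + eps * p                  # full step, position
        p = p + eps * grad_log_post(theta)       # full step, momentum
    theta = theta + eps * p
    p = p + 0.5 * eps * grad_log_post(theta)     # final half step, momentum
    return theta, p

def hmc_step(theta, log_post, grad_log_post, eps=0.1, L=20, rng=None):
    """One HMC transition: fresh momentum, leapfrog, accept/reject."""
    rng = np.random.default_rng() if rng is None else rng
    p0 = rng.standard_normal(theta.shape)
    theta_new, p_new = leapfrog(theta, p0, grad_log_post, eps, L)
    # Metropolis correction on the joint (position, momentum) energy
    log_accept = (log_post(theta_new) - 0.5 * p_new @ p_new
                  - log_post(theta) + 0.5 * p0 @ p0)
    if np.log(rng.uniform()) < log_accept:
        return theta_new
    return theta
```

The leapfrog scheme is volume-preserving and reversible, which is precisely why the accept/reject step above only involves the energy difference.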
Chapter 3 gets into stochastic gradient MCMC as an approximate MCMC method, with nice arguments and formal convergence bounds, again quite efficiently, if focussing almost solely on Gaussian settings (but including a neural network example). Similarly, Chapter 4 provides intuitive (if informal) arguments for the worth of non-reversible algorithms that are well-suited to a textbook at this level. This chapter also introduces a first PDMP sampler, the discrete bouncy particle sampler.
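To give a concrete feel for the stochastic gradient MCMC idea, here is a minimal sketch of stochastic gradient Langevin dynamics in the Welling and Teh spirit: an unadjusted Langevin update driven by a minibatch gradient estimate. This is a generic illustration rather than the book's own code, and the function names and interface are my own.

```python
import numpy as np

def sgld(grad_log_prior, grad_log_lik, data, theta0, eps, n_iters,
         batch_size, rng=None):
    """Stochastic gradient Langevin dynamics with a fixed step size eps.
    The full-data gradient is replaced by a rescaled minibatch estimate,
    and no Metropolis correction is applied (hence the bias)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(data)
    theta = np.array(theta0, dtype=float)
    samples = []
    for _ in range(n_iters):
        idx = rng.choice(n, size=batch_size, replace=False)
        # unbiased estimate of the gradient of the log-posterior
        grad = grad_log_prior(theta) + (n / batch_size) * sum(
            grad_log_lik(theta, data[i]) for i in idx)
        theta = (theta + 0.5 * eps * grad
                 + np.sqrt(eps) * rng.standard_normal(theta.shape))
        samples.append(theta.copy())
    return np.array(samples)
```

The Gaussian analyses of the chapter quantify exactly how the minibatch noise and the missing accept/reject step bias the invariant distribution as a function of eps.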
Chapter 5 is a (nicely) monstrous coverage of continuous-time MCMC samplers that reaches very recent advances on PDMPs. The focus is on expressing them as limits, in order to derive mixing rates without extreme mathematical steps. (The chapter even includes a mention of the coordinate sampler that my PhD student Wu Changye derived in 2018!) Again a chapter I plan to use when teaching MCMC methods, if possibly skipping some of its 66 pages.
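Since PDMPs may be unfamiliar, a one-dimensional zig-zag sampler for a standard normal target makes the continuous-time mechanism tangible: the particle moves at velocity ±1 and flips direction at the events of an inhomogeneous Poisson process of rate max(0, v·x), simulated here exactly by inverting the integrated rate. This toy sketch is mine, not the book's.

```python
import numpy as np

def zigzag_1d(T, x0=0.0, rng=None):
    """Zig-zag sampler for a standard normal target in one dimension.
    Returns the event skeleton (time, position, new velocity)."""
    rng = np.random.default_rng() if rng is None else rng
    x, v, t = x0, 1.0, 0.0
    skeleton = [(t, x, v)]
    while t < T:
        e = rng.exponential()
        # solve \int_0^tau max(0, v x + s) ds = e in closed form
        tau = -v * x + np.sqrt(max(v * x, 0.0) ** 2 + 2.0 * e)
        x, t, v = x + v * tau, t + tau, -v
        skeleton.append((t, x, v))
    return skeleton

def discretize(skeleton, dt):
    """Evaluate the piecewise-linear zig-zag path on a regular grid."""
    out, (t0, x0, v0) = [], skeleton[0]
    times = iter(skeleton[1:])
    t1, x1, v1 = next(times)
    g = 0.0
    while True:
        while g > t1:
            t0, x0, v0 = t1, x1, v1
            try:
                t1, x1, v1 = next(times)
            except StopIteration:
                return np.array(out)
        out.append(x0 + v0 * (g - t0))
        g += dt
```

The output being a continuous path rather than a sequence of points is what makes the (sub)sampling and limit arguments of the chapter possible in the first place.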
Chapter 6 completes the monograph with a presentation of convergence assessment tools and diagnostics, exploiting the kernel trick, as well as convergence bounds that reflect very recent research in that domain. The concluding section on optimal weights and optimal thinning will presumably be new to most readers. (Making me wonder if a link can be found with our importance Markov chain construct.)
[Disclaimer about potential self-plagiarism as usual: this post or an edited version will eventually appear in my Books Review section in CHANCE.]
