Archive for Markov chains

on(-line) integral priors for model selection

Posted in Books, Statistics, University life on February 27, 2026 by xi'an

Elo rating systems via Markov Chains

Posted in pictures, Running, Statistics, University life on December 18, 2025 by xi'an

In preparation for a meeting with a national sport association towards a Bayesian approach to ranking (I can only confirm it is not for volleyball!), I was searching for advanced studies of the Elo (not ELO!) rating system and came across this 2024 arXival by Sam Olesker-Taylor (U Warwick) and Luca Zanetti. Where they analyse the online behaviour of the player ratings and their convergence (in Wasserstein distance), with fairly intricate proofs.

“Elo is not a reversible Markov chain and, while it has a unique stationary distribution, assuming a minor and natural condition, it does not converge to it in total variation.”

The analysis is based on the Bradley–Terry–Luce model of the probability of player i winning a game against player j. Which amounts to a logit or sigmoid transform of the difference between the players' true ratings. The practical Elo update amounts to one stochastic gradient step for the associated likelihood. The authors also propose a random allocation of players into pairs that achieves optimal convergence to the true rating, by maximising the spectral gap of the allocation matrix. Although I did not spot how to derive it in practice.
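To make the stochastic-gradient interpretation concrete, here is a minimal toy sketch (not the authors' scheme, just the textbook Elo/BTL connection, with made-up ratings, a step size K=0.1, and uniform random pairings):

```python
import numpy as np

rng = np.random.default_rng(4)

# Elo as one SGD step on the Bradley-Terry-Luce log-likelihood:
# P(i beats j) = sigmoid(r_i - r_j); after a game, r_i moves by K * (outcome - p_ij)
def sigmoid(d):
    return 1.0 / (1.0 + np.exp(-d))

def elo_update(r, i, j, outcome, K=0.1):
    p = sigmoid(r[i] - r[j])
    r[i] += K * (outcome - p)       # gradient step for player i
    r[j] -= K * (outcome - p)       # symmetric step for player j (sum is conserved)
    return r

# toy league: hidden true ratings drive simulated outcomes, Elo tracks them
true = np.array([1.0, 0.0, -1.0])
r = np.zeros(3)
for _ in range(20000):
    i, j = rng.choice(3, size=2, replace=False)   # uniform random pairing
    outcome = rng.uniform() < sigmoid(true[i] - true[j])
    r = elo_update(r, i, j, float(outcome))
print(r - r.mean())   # close to the true ratings, up to O(sqrt(K)) noise
```

With a constant step size the ratings fluctuate around the truth rather than converge, which is one reason the paper works in Wasserstein rather than total variation.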

bias reduction in self-normalised importance sampling

Posted in Books, Statistics on December 22, 2023 by xi'an

Gabriel Cardoso and coauthors (among whom Éric Moulines and Achille Thin, with whom I collaborated on the inFINE/NEO algorithm) have arXived a nice entry on a cheap way to reduce bias in the famously biased self-normalised importance sampling estimator. Which is a standard solution when the target density is not normalised. They reconsider a 2004 technical paper by Tjelmeland (which I remember reading at the time), which constructs a sampling-resampling algorithm by creating a Markov chain and choosing between the current value and a pool of M proposed values (from the importance function), according to the importance weights; thanks to Tjelmeland's reformulation with two copies of the current state, this constitutes a Gibbs sampler with the correct target. As in Tjelmeland (2004), they propose to recycle all proposed values into the integral estimate, which then turns out to be unbiased under stationarity, rather unexpectedly. The paper then proceeds to analyse convergence towards this expectation, linear in the size of the pool and exponential in the number of Markov iterations.
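A minimal sketch of the underlying chain, under simplifying assumptions (1D standard Normal target known up to a constant, an independent Normal proposal, and plain multinomial selection among the pool plus the current state, which is the independent-proposal special case of Tjelmeland's construction, not the paper's general algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):                       # unnormalised target: N(0,1) up to a constant
    return np.exp(-0.5 * x**2)

def proposal_sample(size):           # importance/proposal density q = N(0, 2^2)
    return rng.normal(0.0, 2.0, size)

def proposal_pdf(x):
    return np.exp(-0.5 * (x / 2.0)**2) / (2.0 * np.sqrt(2 * np.pi))

def recycled_chain(h, M=10, n_iter=5000):
    x = proposal_sample(1)[0]
    estimates = []
    for _ in range(n_iter):
        pool = np.append(proposal_sample(M), x)   # M fresh proposals + current state
        w = target(pool) / proposal_pdf(pool)
        w /= w.sum()
        estimates.append(np.dot(w, h(pool)))      # recycle the whole pool
        x = rng.choice(pool, p=w)                 # select next state by weight
    return np.mean(estimates)

est = recycled_chain(h=lambda x: x**2)
print(est)   # close to E[X^2] = 1 under N(0,1)
```

Each per-iteration term is a self-normalised estimate over M+1 points, hence biased on its own; the point of the paper is that averaging them along the stationary chain removes the bias.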

 

reheated vanilla Rao-Blackwellisation

Posted in Books, Kids, Statistics, University life on December 18, 2023 by xi'an

Over the weekend, I came across a X validated question asking for clarification about our 2012 Vanilla Rao-Blackwellisation paper with Randal. Question written in a somewhat formal style that made our work difficult to recognise… At least for yours truly.

Interestingly, this led another (major) contributor to X validated to work out the incomplete illustration attached here, when the target distribution is (1-x)². It seems strange to me that the basics of the method prove so difficult to fathom, given that it is a simple integration of the (actual and virtual) uniforms… The point of the OP that the improvement brought by Rao-Blackwellisation is only conditional on the accepted values is correct, though.
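For readers puzzled by "integrating the uniforms", here is a minimal sketch of the basic one-step Rao-Blackwellisation of Metropolis-Hastings (the elementary version, not the full vanilla estimator of the 2012 paper): instead of recording h of the accepted value, record its conditional expectation given the proposal, i.e. replace the accept/reject uniform by the acceptance probability. Target (1-x)² on (0,1), uniform independent proposal, h(x)=x, all chosen here for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def target(x):                       # unnormalised target: (1 - x)^2 on (0, 1)
    return (1.0 - x)**2

def mh_rao_blackwell(h, n_iter=20000):
    x = 0.5
    plain, rb = [], []
    for _ in range(n_iter):
        y = rng.uniform()                        # independent U(0,1) proposal
        a = min(1.0, target(y) / target(x))      # acceptance probability
        # Rao-Blackwell step: integrate out the accept/reject uniform,
        # E[h(X_t) | x_{t-1}, y_t] = a h(y_t) + (1 - a) h(x_{t-1})
        rb.append(a * h(y) + (1 - a) * h(x))
        if rng.uniform() < a:
            x = y
        plain.append(h(x))
    return np.mean(plain), np.mean(rb)

plain, rb = mh_rao_blackwell(h=lambda x: x)
print(plain, rb)   # both near E[X] = 1/4 under the density 3(1-x)^2
```

The conditional expectation never increases the variance, which is the whole Rao-Blackwell argument in one line.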

a second course in probability² [book review]

Posted in Books, Kids, Statistics, University life on December 17, 2023 by xi'an

I was sent [by CUP] Ross & Peköz' Second Course in Probability for review. Although it was published in 2003, a second edition has come out this year. I had not looked at the earlier edition hence will not comment on the differences, but rather reflect on my linear reading of the book and my reactions as a potential teacher (even though I have not taught measure theory for decades, being a low priority candidate in an applied math department). As a general perspective, I think it would be deemed too informal for our 3rd year students in Paris Dauphine.

This indeed is a soft introduction to measure based probability theory. With plenty of relatively basic examples, as the requirement on the calculus background of the readers is quite limited. Surprising appearance of an integral in the expectation section before it is ever defined (meaning it is a Riemann integral, as confirmed on the next page), but all integrals in the book will be Riemann integrals, with hardly a mention of a more general concept or even of Lebesgue integration (p 16). Which leads to the probability density being defined in terms of the Lebesgue measure (not yet mentioned). Expectation as suprema of step functions, which is enough to derive the dominated convergence theorem. And an (insufficiently detailed?) proof that inverting the cdf at a uniform produces a generation from that distribution. Representation that proves most useful for the results of convergence in distribution. Although the choice (p 31) that all rv's in a sequence are deterministic transforms of the same Uniform may prove challenging for the students (despite mentioning Skorokhod's representation theorem). Concluding the first chapter with an ergodic theorem for stationary and… ergodic sequences, possibly making the result sound circular. Annoyingly (?) a lot of examples involve discrete rvs, the more so as we proceed through the chapters. (Hence the unimaginative dice cover.)
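The inverse-cdf representation is a one-liner to check numerically; a minimal sketch with the Exponential(1) distribution (my choice of example, not the book's):

```python
import numpy as np

rng = np.random.default_rng(2)

# Inverse-cdf (quantile) sampling: for F(x) = 1 - exp(-x), i.e. Exponential(1),
# F^{-1}(u) = -log(1 - u), so -log(1 - U) ~ Exp(1) when U ~ U(0, 1)
u = rng.uniform(size=100_000)
x = -np.log(1.0 - u)
print(x.mean())   # close to 1, the Exp(1) mean
```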

In Chap 2, the definition of stochastically smaller is missing italics on the term. This chapter relies on the powerful notion of coupling, leading to Le Cam's theorem and the Stein-Chen method. Declined for Poisson, Geometric, Normal, and Exponential variates, incl. a Central Limit Theorem. Surprising appearance of a conditional distribution and even more of a conditional variate (Theorem 2.11) that I would criticise as sloppy were it to occur within an X validated question!
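Le Cam's theorem is easy to illustrate numerically: for independent Bernoulli(p_i)'s, the total variation distance between their sum and a Poisson with the same mean is at most Σ p_i². A minimal sketch (my toy numbers, truncating the Poisson to the same support, which is harmless here):

```python
import numpy as np
from math import exp, factorial

# Le Cam's bound: d_TV(sum of Bernoulli(p_i), Poisson(sum p_i)) <= sum p_i^2
p = np.full(100, 0.02)
lam = p.sum()                                   # = 2.0
# exact pmf of the Bernoulli sum via repeated convolution
pmf = np.array([1.0])
for pi in p:
    pmf = np.convolve(pmf, [1 - pi, pi])
pois = np.array([exp(-lam) * lam**k / factorial(k) for k in range(len(pmf))])
tv = 0.5 * np.abs(pmf - pois).sum()
print(tv, (p**2).sum())                         # observed TV vs Le Cam bound 0.04
```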

Chap 3 on martingales, with another informal start on conditional expectations using some intuition from the easiest cases, but also a yet undefined notion of conditional distribution. The main application of the notion is the martingale stopping theorem, with mostly discrete illustrations. (The first sentence of the chapter is puzzling, presenting, as a generalisation of iid-ness, sequences of rv's where each term depends on the previous ones, when the joint distribution can always be decomposed this way by a towering argument.)
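The stopping theorem is behind the classical gambler's ruin probability, worth a quick check: for a fair ±1 walk started at 0 and stopped on hitting -a or +b, E[S_τ] = 0 forces P(hit +b) = a/(a+b). A minimal simulation sketch (a=3, b=7 are my arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# Optional stopping for a fair random walk: 0 = E[S_tau] = b P(+b) - a (1 - P(+b)),
# hence P(hit +b before -a) = a / (a + b)
def hits_b(a=3, b=7):
    s = 0
    while -a < s < b:
        s += 1 if rng.random() < 0.5 else -1
    return s == b

est = np.mean([hits_b() for _ in range(20_000)])
print(est)   # close to 3/10
```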

Chap 4 on probability bounds, with a first technique using the importance sampling identity, which includes the Chernoff bound as a special case. While there are principles at work, I am always uncomfortable teaching about these inequalities, as they often rely on a clever trick.
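The Chernoff recipe itself is mechanical once the trick is accepted: bound P(S ≥ k) by e^{-tk} E[e^{tS}] and optimise over t > 0. A minimal numerical sketch for a Binomial(100, ½) tail (my example, optimising over a grid rather than in closed form):

```python
import numpy as np
from math import comb

# Chernoff bound for S ~ Binomial(n, 1/2):
# P(S >= k) <= min_{t>0} e^{-tk} * (E e^{tX})^n = min_t e^{-tk} (1/2 + e^t/2)^n
n, k = 100, 65
exact = sum(comb(n, j) for j in range(k, n + 1)) / 2**n
ts = np.linspace(0.01, 2.0, 400)
bound = np.min(np.exp(-ts * k) * (0.5 + 0.5 * np.exp(ts))**n)
print(exact, bound)    # the bound is loose but exponentially small
```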

Chap 5 on Markov chains (with Markov deserving of an historical note, contrary to Stein or Le Cam, Borel or Cantelli, which would have helped my students seeking their names!), but this is solely done on discrete state spaces, without a mention that irreducible transient Markov chains cannot occur on a finite state space. The chapter covers essentials in that context, including Gambler's ruin, but I'd rather refer to Feller's (1970) more general coverage and wonder why the authors stuck to the discrete case.
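On a finite state space, the stationary distribution of an irreducible chain is just a linear-algebra exercise, solving πP = π with Σπ = 1. A minimal sketch on a toy 3-state chain (my example, not the book's):

```python
import numpy as np

# Stationary distribution of a small irreducible chain: solve pi P = pi, sum(pi) = 1
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
A = np.vstack([P.T - np.eye(3), np.ones(3)])    # stack the normalisation constraint
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)   # [0.25, 0.5, 0.25]
```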

Chap 6 on renewal theory, albeit defined only for crossing renewal times. In the spirit of Meyn & Tweedie (1994), I find renewal times quite useful in establishing Central Limit theorems for non-iid sequences, but here the notion is only applied to the renewal process itself (with a typo in Proposition 6.7). The chapter however includes an example of forward exact sampling for a Markov chain satisfying a minorisation condition, as well as brief sections on queuing and Poisson processes.
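The elementary renewal theorem, N(t)/t → 1/μ for iid inter-arrival times with mean μ, is another quick simulation check (Exponential inter-arrivals with μ = 2 are my choice, making N(t) a Poisson process):

```python
import numpy as np

rng = np.random.default_rng(5)

# Elementary renewal theorem: N(t)/t -> 1/mu as t -> infinity
def count_renewals(t, draw):
    s, n = 0.0, 0
    while True:
        s += draw()            # next iid inter-arrival time
        if s > t:
            return n           # number of renewals completed by time t
        n += 1

mu, t = 2.0, 1e5
est = count_renewals(t, lambda: rng.exponential(mu)) / t
print(est)   # close to 1/mu = 0.5
```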

Chap 7 on Brownian motion, no less! With a discrete iterative construction one hopes will lead to a proper limit, as its existence is not formally proven. And which I deem did not require five figures to explain how to randomly move the midpoint of a segment. This short and final chapter proceeds à marche forcée towards a Central Limit theorem for general stationary and ergodic random variables. A bit too much for a 180p book.
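The midpoint construction fits in a dozen lines rather than five figures; a minimal sketch of Lévy's dyadic refinement on [0, 1] (the standard construction, with my choice of 10 refinement levels):

```python
import numpy as np

rng = np.random.default_rng(6)

# Levy's midpoint construction of Brownian motion on [0, 1]:
# given B(s) and B(t), the midpoint is (B(s) + B(t))/2 + N(0, (t - s)/4)
def levy_brownian(levels=10):
    b = np.array([0.0, rng.normal()])           # B(0) = 0 and B(1) ~ N(0, 1)
    for _ in range(levels):
        h = 1.0 / (len(b) - 1)                  # current interval length
        mids = 0.5 * (b[:-1] + b[1:]) + rng.normal(0.0, np.sqrt(h / 4), len(b) - 1)
        out = np.empty(2 * len(b) - 1)
        out[0::2], out[1::2] = b, mids          # interleave old points and midpoints
        b = out
    return b

path = levy_brownian()
print(len(path))   # 1025 points on a dyadic grid of [0, 1]
```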

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]