Archive for measure theory

a second course in probability² [book review]

Posted in Books, Kids, Statistics, University life on December 17, 2023 by xi'an

I was sent [by CUP] Ross & Peköz's A Second Course in Probability for review. Although it was first published in 2007, a second edition came out this year. I had not looked at the earlier edition and hence will not comment on the differences, but rather reflect on my linear reading of the book and my reactions as a potential teacher (even though I have not taught measure theory for decades, being a low-priority candidate in an applied math department). As a general perspective, I think it would be deemed too informal for our 3rd year students in Paris Dauphine.

This indeed is a soft introduction to measure-based probability theory, with plenty of relatively basic examples, as the requirements on the readers' calculus background are quite limited. Surprising appearance of an integral in the expectation section before it is ever defined (meaning it is a Riemann integral, as confirmed on the next page), but all integrals in the book will be Riemann integrals, with hardly a mention of a more general concept or even of Lebesgue integration (p 16). Which leads to the probability density being defined in terms of the Lebesgue measure (not yet mentioned). Expectation is introduced as a supremum over step functions, which is enough to derive the dominated convergence theorem. And an (insufficiently detailed?) proof that inverting the cdf at a uniform variate produces a generation from that distribution. A representation that proves most useful for the results on convergence in distribution, although the choice (p 31) that all rv's in a sequence are deterministic transforms of the same Uniform may prove challenging for the students (despite a mention of Skorokhod's representation theorem). Concluding the first chapter with an ergodic theorem for stationary and… ergodic sequences, possibly making the result sound circular. Annoyingly (?), a lot of examples involve discrete rvs, the more so as one proceeds through the chapters. (Hence the unimaginative dice cover.)
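For the record, a minimal sketch of this inverse-cdf representation (my own illustration, not the book's), checked on the Exponential case:

```python
import numpy as np

rng = np.random.default_rng(0)

def exp_quantile(u, lam=1.0):
    # inverse cdf of the Exponential(lam) distribution
    return -np.log1p(-u) / lam

# if U ~ Uniform(0,1), then F^{-1}(U) has cdf F
u = rng.uniform(size=100_000)
x = exp_quantile(u)
print(x.mean(), x.var())  # both close to 1 for Exp(1)
```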

Chap 2, the definition of stochastically smaller is missing italics on the term. This chapter relies on the powerful notion of coupling, leading to Le Cam's theorem and the Stein-Chen method, developed in turn for Poisson, Geometric, Normal, and Exponential variates, incl. a Central Limit Theorem. Surprising appearance of a conditional distribution and even more of a conditional variate (Theorem 2.11) that I would criticize as sloppy were it to occur within an X validated question!
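For the record, the coupling approach delivers Le Cam's inequality, which I state here from memory (hence possibly not in the book's exact form): for S_n a sum of independent Bernoulli(p_i) variates and W a Poisson variate with mean p₁+…+p_n,

```latex
d_{\mathrm{TV}}(S_n, W) \;=\; \sup_{A \subseteq \mathbb{N}_0}
  \bigl| P(S_n \in A) - P(W \in A) \bigr| \;\le\; \sum_{i=1}^n p_i^2
```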

Chap 3 on martingales, with another informal start on conditional expectations using some intuition from the easiest cases, but also a yet undefined notion of conditional distribution. The main application of the notion is the martingale stopping theorem, with mostly discrete illustrations. (The first sentence of the chapter is puzzling, presenting as a generalisation of iid-ness sequences of rv's where each term depends on the previous ones, when the joint distribution can always be decomposed this way by a tower argument.)
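As a toy illustration of the stopping theorem (mine, not one of the book's examples): for a symmetric ±1 walk stopped upon hitting −a or b, optional stopping yields P(hit b) = a/(a+b), which a few lines of simulation confirm:

```python
import numpy as np

rng = np.random.default_rng(1)

def prob_hit_b(a=3, b=5, n_rep=20_000):
    # S_n is a martingale; stopping at -a or b, E[S_tau] = 0
    # gives P(hit b before -a) = a / (a + b)
    hits = 0
    for _ in range(n_rep):
        s = 0
        while -a < s < b:
            s += 1 if rng.random() < 0.5 else -1
        hits += (s == b)
    return hits / n_rep

print(prob_hit_b(), 3 / (3 + 5))  # Monte Carlo vs exact 0.375
```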

Chap 4 on probability bounds, with a first technique using the importance sampling identity, which includes the Chernoff bound as a special case. While there are principles at work, I am always uncomfortable teaching about these inequalities, as they often rely on a clever trick.
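My reconstruction of the argument (the book's notation may well differ): writing the tail probability under the exponentially tilted density q_t(x) = e^{tx}p(x)/M(t), with M(t) = E[e^{tX}],

```latex
P(X \ge a) \;=\; \mathbb{E}_{q_t}\!\left[ M(t)\, e^{-tX}\, \mathbb{1}\{X \ge a\} \right]
  \;\le\; M(t)\, e^{-ta}, \qquad t > 0,
```

and minimising the right-hand side in t produces the Chernoff bound.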

Chap 5 on Markov chains (with Markov deserving a historical note, contrary to Stein or Le Cam, Borel or Cantelli, which would have helped my students seeking their names!), but this is solely done on discrete state spaces, without a mention that irreducible transient Markov chains cannot occur on a finite state space. The chapter covers the essentials in that context, including the Gambler's ruin, but I'd rather refer to Feller's (1970) more general coverage and wonder why the authors stuck to the discrete case.
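To wit, a quick check of the Gambler's ruin formula (my toy code, assuming a win probability p at each round and absorption at 0 or N):

```python
import numpy as np

rng = np.random.default_rng(2)

def win_prob(i=5, N=10, p=0.45, n_rep=20_000):
    # fortune starts at i, +1 w.p. p, -1 w.p. 1-p, absorbed at 0 or N
    wins = 0
    for _ in range(n_rep):
        s = i
        while 0 < s < N:
            s += 1 if rng.random() < p else -1
        wins += (s == N)
    return wins / n_rep

r = 0.55 / 0.45  # (1-p)/p
print(win_prob(), (1 - r**5) / (1 - r**10))  # Monte Carlo vs closed form
```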

Chap 6 on renewal theory, albeit defined only for crossing renewal times. In the spirit of Meyn & Tweedie (1994), I find renewal times quite useful in establishing Central Limit theorems for non-iid sequences, but here the notion is only applied to the renewal process itself (with a typo in Proposition 6.7). The chapter however includes an example of forward exact sampling for a Markov chain satisfying a minorisation condition, as well as brief sections on queueing and Poisson processes.
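A minimal sketch of the elementary renewal theorem at work (my own toy example, with Gamma inter-arrival times, not taken from the book):

```python
import numpy as np

rng = np.random.default_rng(3)

def renewal_count(t=1_000.0):
    # number of renewals by time t, with Gamma(2,1) inter-arrival times
    n, clock = 0, 0.0
    while True:
        clock += rng.gamma(2.0, 1.0)
        if clock > t:
            return n
        n += 1

# elementary renewal theorem: E[N(t)]/t -> 1/mu, with mu = 2 here
print(np.mean([renewal_count() for _ in range(200)]) / 1_000.0)  # ~0.5
```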

Chap 7 on Brownian motion, no less! With a discrete iterative construction one hopes will lead to a proper limit, as its existence is not formally proven. And which I deem did not require five figures to explain how to randomly move the midpoint of a segment. This short and final chapter proceeds at a forced march towards a Central Limit theorem for general stationary and ergodic random variables. A bit too much for a 180p book.
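Here is how I would sketch the midpoint construction in a few lines (assuming it matches Lévy's classical scheme, which those five figures presumably illustrate):

```python
import numpy as np

rng = np.random.default_rng(4)

def brownian_levy(levels=10):
    # Levy's midpoint construction of Brownian motion on [0,1]:
    # the midpoint of an interval of length h is the average of the
    # endpoints plus an independent N(0, h/4) displacement
    b = np.array([0.0, rng.normal()])  # B(0) and B(1)
    for _ in range(levels):
        h = 1.0 / (len(b) - 1)         # current interval length
        mid = 0.5 * (b[:-1] + b[1:])
        mid += rng.normal(scale=np.sqrt(h / 4.0), size=len(mid))
        refined = np.empty(2 * len(b) - 1)
        refined[0::2], refined[1::2] = b, mid
        b = refined
    return b  # path values on a grid of 2**levels + 1 points

path = brownian_levy()
increments = np.diff(path)
print(increments.var() * len(increments))  # ~1, as Var[B(1)] = 1
```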

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

Eeeech… [X validated]

Posted in Books, Statistics, University life on November 3, 2023 by xi'an

Bertrand’s tartine

Posted in Books, Kids, pictures, Statistics on November 25, 2022 by xi'an

A riddle from The Riddler on cutting a square (toast) into two parts and keeping at least 25% of the surface on each part, while avoiding Bertrand's paradox by defining the random cut as generated by two uniform draws over the periphery of the square. Meaning that ¼ of the draws are on the same side (in which case the cut separates nothing), ½ on adjacent sides, and again ¼ on opposite sides. Meaning one has to compute, for adjacent sides, where the cut-off triangle with legs U and V has area UV/2 (which must exceed ¼),

P(UV > ½) = ½(1 − log 2)

and, for opposite sides, where the cut-off trapezoid has area ½(U+V) (which must fall within (¼,¾)),

P(½(U+V) ∈ (¼,¾)) = ¾

Resulting in an overall probability of ½ × ½(1 − log 2) + ¼ × ¾ ≈ 0.2642 (checked by simulation, see below)
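And the simulation check, following the above case decomposition (with U and V the uniform positions along the relevant sides):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000

u, v = rng.uniform(size=n), rng.uniform(size=n)
# draw the side configuration: same (w.p. 1/4), adjacent (1/2), opposite (1/4)
case = rng.choice(3, size=n, p=[0.25, 0.5, 0.25])

ok = np.zeros(n, dtype=bool)
adj, opp = case == 1, case == 2
ok[adj] = (u * v)[adj] > 0.5            # triangle area uv/2 above 1/4
ok[opp] = np.abs(u + v - 1)[opp] < 0.5  # trapezoid area (u+v)/2 within (1/4,3/4)
# both draws on the same side never leave 25% on each part

print(ok.mean())  # ~0.2642
```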

conditioning on insufficient statistics in Bayesian regression

Posted in Books, Statistics, University life on October 23, 2021 by xi'an

“…the prior distribution, the loss function, and the likelihood or sampling density (…) a healthy skepticism encourages us to question each of them”

A paper by John Lewis, Steven MacEachern, and Yoonkyung Lee has recently appeared in Bayesian Analysis, starting with the great motivation of a misspecified model requiring the use of a (thus necessarily) insufficient statistic, and moving to their central concern of simulating the posterior based on that statistic.

Model misspecification remains understudied from a Bayesian perspective and this paper is thus most welcome in addressing the issue. However, when reading through, one of my criticisms is in defining misspecification as equivalent to outliers in the sample. An outlier model is an easy case of misspecification, in the end, since the original model remains meaningful. (Why should there be “good” versus “bad” data?) Furthermore, adding a non-parametric component for the unspecified part of the data would sound like a “more Bayesian” alternative. Unrelated, I also idly wondered at whether or not normalising flows could be used in this instance…

The problem of selecting a T (Darjeeling, of course!) is not really discussed there, while each choice of a statistic T gives a different meaning to what misspecified means, and suggests a comparison with Bayesian empirical likelihood.

“Acceptance rates of this [ABC] algorithm can be intolerably low”

Erm, this is not really the issue with ABC, is it?! Especially when the tolerance is induced by the simulations themselves.

When I reached the MCMC (Gibbs?) part of the paper, I first wondered at its relevance for the misspecification issues before realising it had become the focus of the paper. Now, simulating the observations conditional on a value of the summary statistic T is a true challenge. I remember for instance George Casella mentioning it in association with a Student's t sample in the 1990's, and Kerrie and me making an unsuccessful attempt at it in the same period. Persi Diaconis has written several papers on the problem and I am thus surprised at the dearth of references here, like the rather recent Byrne and Girolami (2013), Florens and Simoni (2015), or Bornn et al. (2019). In the present case, the linear model assumed as the true model has the exceptional feature that it leads to a feasible transform of an unconstrained simulation into a simulation with fixed statistics, with no measure-theoretic worries, if not free from considerable efforts to establish that the operation is truly valid… And, while simulating (θ,y) makes perfect sense in an insufficient setting, the cost is then precisely the same as when running a vanilla ABC. Which brings us to the natural comparison with ABC: while taking ε=0 may sound optimal for being “exact”, it is not from an ABC perspective, since the convergence rate of the (summary) statistic should be roughly the one of the tolerance (Li and Fearnhead, 2018, Frazier et al., 2018).
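For comparison, a bare-bones ABC rejection sampler on a toy Normal-mean problem, with T the sample mean (a stand-in of my own, not the paper's regression setting):

```python
import numpy as np

rng = np.random.default_rng(6)

# toy model: x_1..x_n ~ N(theta, 1), prior theta ~ N(0, 10), T = sample mean
n, t_obs, eps = 50, 0.8, 0.05
keep = []
while len(keep) < 1_000:
    theta = rng.normal(scale=np.sqrt(10))
    # simulate T directly from its N(theta, 1/n) sampling distribution
    t_sim = rng.normal(loc=theta, scale=1 / np.sqrt(n))
    if abs(t_sim - t_obs) < eps:  # accept within the tolerance zone
        keep.append(theta)

print(np.mean(keep))  # close to the exact posterior mean 0.8*50/50.1
```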

“[The Borel Paradox] shows that the concept of a conditional probability with regard to an isolated given hypothesis whose probability equals 0 is inadmissible.” A. Колмого́ров (1933)

As a side note for measure-theoretic purists, the derivation of the conditional of y given T(y)=T⁰ is arbitrary since the event has probability zero (i.e., the conditioning set is of measure zero). See the Borel-Kolmogorov paradox. The computations in the paper are undoubtedly correct, but they reflect only one arbitrary choice of a transform (or of a conditioning σ-algebra).
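To recall the standard illustration of the paradox: conditioning a bivariate density f on the null event {Y=0} depends on whether this event is represented as {Y=0} or as {Y/X=0}, since

```latex
p(x \mid Y = 0) \;\propto\; f(x, 0)
\qquad\text{while}\qquad
p(x \mid Y/X = 0) \;\propto\; |x|\, f(x, 0)
```

with the Jacobian factor |x| stemming from the change of conditioning variable, so the two limiting conditional distributions differ.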

conditioning an algorithm

Posted in Statistics on June 25, 2021 by xi'an

A question of interest on X validated: given a (possibly black-box) algorithm simulating from a joint distribution with density [wrt a continuous measure] p(z,y), (how) is it possible to simulate from the conditional p(y|z⁰)? Which reminded me of a recent paper by Lindqvist et al. on conditional Monte Carlo, zooming in on the simulation of a sample X given the value of a sufficient statistic, T(X)=t, revolving around pivotal quantities and inversions à la fiducial statistics, following an earlier Biometrika paper by Lindqvist & Taraldsen, in 2005. The idea is to write

X=\chi(U,\theta)\qquad T(X)=\tau(U,\theta)

where U has a distribution that depends on θ, to solve τ(u,θ)=t in θ for a given pair (u,t), with solution θ(u,t), and to generate u conditional on this solution. But this requires getting “under the hood” of the algorithm to such an extent that it does not answer the original question, unless one is open to other solutions exploiting the expression for the joint density p(z,y)… In a purely black-box situation, ABC appears as the natural, if approximate, solution.
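A minimal sketch of the pivotal inversion in a case where it happens to be exact (my example, not the paper's): iid Exponentials conditioned on their sum, the sum being sufficient for the scale θ and the conditioned sample being t times a Dirichlet(1,…,1) vector:

```python
import numpy as np

rng = np.random.default_rng(7)

def exp_given_sum(n, t):
    # pivotal representation: X_i = -log(U_i)/theta, hence
    # T = tau(U, theta) = sum_i -log(U_i)/theta; solving tau(u, theta) = t
    # gives theta(u, t) = sum_i -log(u_i)/t, and plugging back into
    # chi(u, theta) returns a draw from X | T = t
    e = -np.log(rng.uniform(size=n))  # iid Exp(1) pivots
    return t * e / e.sum()

x = exp_given_sum(5, t=3.0)
print(x, x.sum())  # the components sum to t = 3 by construction
```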