Archive for introductory textbooks

A modern introduction to probability and statistics [book review]

Posted in Books, R, Statistics, Travel, University life on July 12, 2025 by xi'an

In the plane to Bengaluru, I read through the book A modern introduction to probability and statistics, by Graham Upton (whose Measuring Animal Abundance I reviewed for CHANCE a while ago), which is based on the earlier Understanding Statistics, written jointly with Ian Cook. (Not to be confused with A modern introduction to probability and statistics by Dekking et al.) The subtitle is understanding statistical principles in the computer age. Sorry, in the age of the computer. While the cover is most pleasant (and modern), as noticed by an AF flight attendant, the contents are very, very standard and could have been written decades ago, since the main concession to “the” computer age is the inclusion of a few R commands at the end of most chapters. There are even a few distribution tables here and there (in case “the” computer is not available). But there is no other connection with computational statistics or statistical computing.

The classicism of the contents and the intended audience mean there is little therein to either object to or criticise, apart from the glaring typo on the variance of the sum of two correlated random variables on page 87, missing the factor 2 in front of the covariance, while correct(ed) on p97 (and the inevitable “the the” typo, spotted once). The mixture of elementary probability and basic statistics in a single textbook always feels awkward to me and I think I would have trouble teaching solely from this material. My main criticisms bear on the potential confusion between samples and populations in the early chapters, when some statistics are used as motivational examples, as for instance in a (hidden) Monte Carlo stabilisation to the limiting values (p57), way before the Law of Large Numbers is introduced, on the variable mileage in mathematical rigour (while being uncertain that first year students can handle integrals and derivatives), on the textbook examples, and on the share of the book spent on descriptive statistics and even more on the “classical” tests, with no critical perspective on using point nulls or p-values. The book concludes with a four page (benevolent) chapter on Bayesian statistics that is superfluous imho, or even counterproductive, since in my experience a rushed introduction to Bayesian principles almost always results in a rejection of said principles. Plus, the illustration with coin tossing is not particularly helpful, since Andrew maintains that one can load a die, but cannot bias a coin. (A similar reservation applies to the half-page 289 coverage of pseudo-random generation and Monte Carlo principles for computing p-values.)
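For the record, the identity garbled by the p87 typo, Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y), is easily checked numerically; a minimal Python sketch of mine, not taken from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)   # deliberately correlated with x

lhs = np.var(x + y)
# empirical version of Var(X) + Var(Y) + 2 Cov(X,Y), with matching normalisations
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, bias=True)[0, 1]
assert abs(lhs - rhs) < 1e-8  # the identity holds up to floating point error
```

(With `bias=True` the empirical covariance uses the same 1/n normalisation as `np.var`, so the identity is exact rather than asymptotic.)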

Minor (mostly idiosyncratic) remarks follow: the CLT appearing prior to the LLN, the n-1 in the sample sd, little to no model criticism (not to be confused with goodness of fit), a missed opportunity when mentioning the varying probability of a day being a birthday (p31), in contrast with the BDA cover story, and another one to cite the 2024 Ig Nobel Prize for coin tossing around the LLN, an unclear definition of random variables (p53) and a potentially confusing introduction of Poisson distributions through an informal reference to Poisson processes (with no reason why the years of accession of the kings of Wessex and England until William (making a return on p178 with the Domesday Book) in 1066 should follow such a process, as suggested in Figure 3.5), a surprising definition of the constant e as the special case of exp(x) when x=1, along with its series expansion (p70), omitted proofs of the laws of sums of iid rvs due to introducing moment generating functions rather late, another obscure reference to a 16th century German treatise on surveying as a precursor of the CLT (p131), a proof of the normalising constant of the Normal density that will most likely escape most first year students, an introduction of the t, F, and χ² distributions with no mention of their respective densities (pp141-147), never defining a joint Normal density, insisting on unbiasedness without noting that maximum likelihood estimators (with the strange motivation that maximum likelihood “makes the next sample of n observations most likely to resemble the data in the current sample” (p228)) are almost always biased, and an abundance of footnotes that may prove of little interest to the youngest readers.
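As an aside on the p70 definition, e is indeed recovered as exp(1) through the exponential series; a two-line check of mine:

```python
import math

# e = sum_{k>=0} 1/k!, i.e. the series expansion of exp(x) evaluated at x = 1
e_series = sum(1 / math.factorial(k) for k in range(20))
assert abs(e_series - math.e) < 1e-12
```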

[Disclaimer about potential self-plagiarism as usual: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

Active Statistics: Stories, Games, Problems, and Hands-on Demonstrations [it’s out now!]

Posted in Books, Kids, pictures, Statistics, University life on March 13, 2024 by xi'an

a second course in probability² [book review]

Posted in Books, Kids, Statistics, University life on December 17, 2023 by xi'an

I was sent [by CUP] Ross & Peköz’ Second Course in Probability for review. Although it was first published in 2003, a second edition has come out this year. I had not looked at the earlier edition, hence will not comment on the differences, but rather reflect on my linear reading of the book and my reactions as a potential teacher (even though I have not taught measure theory for decades, being a low priority candidate in an applied math department). As a general perspective, I think it would be deemed too informal for our 3rd year students in Paris Dauphine.

This indeed is a soft introduction to measure-based probability theory, with plenty of relatively basic examples, as the requirement on the calculus background of the readers is quite limited. There is a surprising appearance of an integral in the expectation section before it is ever defined (meaning it is a Riemann integral, as confirmed on the next page), but all integrals in the book will be Riemann integrals, with hardly a mention of a more general concept or even of Lebesgue integration (p 16). Which leads to the probability density being defined in terms of the Lebesgue measure (not yet mentioned). Expectation is built as a supremum over step functions, which is enough to derive the dominated convergence theorem. And there is an (insufficiently detailed?) proof that inverting the cdf at a uniform variate produces a generation from that distribution, a representation that proves most useful for the results on convergence in distribution. Although the choice (p 31) that all rvs in a sequence are deterministic transforms of the same Uniform may prove challenging for the students (despite mentioning Skorokhod’s representation theorem). The first chapter concludes with an ergodic theorem for stationary and… ergodic sequences, possibly making the result sound circular. Annoyingly (?), a lot of examples involve discrete rvs, more so as we proceed through the chapters. (Hence the unimaginative dice cover.)
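The inversion result mentioned above (plugging a Uniform variate into the inverse cdf) is worth a quick illustration; a sketch of mine for the Exponential case, not taken from the book:

```python
import math
import random

def exp_inverse_cdf(u, rate=1.0):
    """Invert the Exponential(rate) cdf F(x) = 1 - exp(-rate * x)."""
    return -math.log(1.0 - u) / rate

random.seed(1)
sample = [exp_inverse_cdf(random.random()) for _ in range(200_000)]
# the empirical mean should approach the Exponential mean, 1/rate = 1
mean = sum(sample) / len(sample)
assert abs(mean - 1.0) < 0.02
```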

In Chap 2, the definition of stochastically smaller is missing italics on the term. This chapter relies on the powerful notion of coupling, leading to Le Cam’s theorem and the Stein-Chen method, applied to Poisson, Geometric, Normal, and Exponential variates, incl. a Central Limit Theorem. There is a surprising appearance of a conditional distribution and even more of a conditional variate (Theorem 2.11) that I would criticize as sloppy were it to occur within an X validated question!
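To make Le Cam’s theorem concrete (my sketch, not the book’s): for a sum of independent Bernoulli(p_i) variables, the total variation distance to the Poisson(Σ p_i) distribution is at most Σ p_i², which can be checked directly in the equal-p_i case:

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 100, 0.02          # a sum of 100 Bernoulli(0.02) variates
lam = n * p
# total variation distance between Binomial(n, p) and Poisson(np)
tv = 0.5 * sum(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(n + 1))
le_cam = n * p**2         # Le Cam bound: the sum of the squared p_i's
assert 0 < tv <= le_cam
```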

Chap 3 covers martingales, with another informal start on conditional expectations using some intuition from the easiest cases, but also a yet undefined notion of conditional distribution. The main application of the notion is the martingale stopping theorem, with mostly discrete illustrations. (The first sentence of the chapter is puzzling, presenting as a generalisation of iid-ness a sequence of rvs where each term depends on the previous ones, when the joint distribution can always be decomposed this way by a tower argument.)

Chap 4 is on probability bounds, with a first technique using the importance sampling identity, which includes the Chernoff bound as a special case. While there are principles at work, I am always uncomfortable teaching these inequalities, as deriving them often relies on a clever trick.
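As an illustration of that trick (mine, under the standard Normal assumption): exponential tilting gives P(X > a) ≤ inf_t exp(t²/2 − ta) = exp(−a²/2), which can be compared with the exact tail:

```python
import math

def chernoff_normal_tail(a):
    # minimise exp(t^2/2 - t*a) over t; the minimiser is t = a
    return math.exp(-a * a / 2)

def exact_normal_tail(a):
    # exact standard Normal tail via the complementary error function
    return 0.5 * math.erfc(a / math.sqrt(2))

for a in (1.0, 2.0, 3.0):
    assert exact_normal_tail(a) <= chernoff_normal_tail(a)
```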

Chap 5 is on Markov chains (with Markov deserving a historical note, contrary to Stein or Le Cam, Borel or Cantelli, which would have helped my students seeking their names!), but this is solely done on discrete state spaces, without a mention that irreducible transient Markov chains cannot occur on a finite state space. The chapter covers the essentials in that context, including the Gambler’s ruin, but I’d rather refer to Feller’s (1970) more general coverage and wonder why the authors stuck to the discrete case.
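For the Gambler’s ruin just mentioned, the classical formula can be checked against simulation; a sketch of mine under the usual setup (start at i, absorption at 0 or n, up-probability p):

```python
import random

def ruin_probability(i, n, p):
    """P(hit 0 before n | start at i) for a walk moving up with probability p."""
    if p == 0.5:
        return 1 - i / n
    r = (1 - p) / p
    return (r**i - r**n) / (1 - r**n)

def simulate_ruin(i, n, p, reps=20_000, seed=2):
    rng = random.Random(seed)
    ruins = 0
    for _ in range(reps):
        x = i
        while 0 < x < n:
            x += 1 if rng.random() < p else -1
        ruins += (x == 0)
    return ruins / reps

est = simulate_ruin(3, 10, 0.45)
assert abs(est - ruin_probability(3, 10, 0.45)) < 0.02
```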

Chap 6 is on renewal theory, albeit defined only through crossing renewal times. In the spirit of Meyn & Tweedie (1994), I find renewal times quite useful in establishing Central Limit theorems for non-iid sequences, but here the theory is only applied to the renewal process itself (with a typo in Proposition 6.7). The chapter however includes an example of forward exact sampling for a Markov chain satisfying a minorisation condition, as well as brief sections on queuing and Poisson processes.

Chap 7 is on Brownian motion, no less! It comes with a discrete iterative construction one hopes will converge to a proper limit, as its existence is not formally proven, and which in my opinion did not require five figures to explain how to randomly move the midpoint of a segment. This short and final chapter proceeds at a forced march towards a Central Limit theorem for general stationary and ergodic random variables. A bit too much for a 180p book.
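The midpoint construction in question (Lévy’s construction) fits in a few lines; a minimal version of mine, not the book’s code:

```python
import math
import random

def levy_midpoint(levels, seed=0):
    """Lévy's midpoint construction of Brownian motion on [0, 1]."""
    rng = random.Random(seed)
    t = [0.0, 1.0]
    w = [0.0, rng.gauss(0.0, 1.0)]           # W(0) = 0, W(1) ~ N(0, 1)
    for _ in range(levels):
        nt, nw = [t[0]], [w[0]]
        for k in range(len(t) - 1):
            mid = 0.5 * (t[k] + t[k + 1])
            # given the endpoints, the midpoint is Normal with mean the
            # average of the endpoints and variance (t_{k+1} - t_k) / 4
            sd = math.sqrt((t[k + 1] - t[k]) / 4)
            nt += [mid, t[k + 1]]
            nw += [0.5 * (w[k] + w[k + 1]) + rng.gauss(0.0, sd), w[k + 1]]
        t, w = nt, nw
    return t, w

t, w = levy_midpoint(8)
assert len(t) == 2**8 + 1 and w[0] == 0.0
```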

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

probably overthinking it [book review]

Posted in Books, Statistics, University life on December 13, 2023 by xi'an

Probably overthinking it, written by Allen B. Downey (who wrote a series of books starting with Think, like Think Python, Think Bayes, Think Stats), belongs to the numerous collection of introductory books that aim at making statistics more palatable and enticing to the general public, by making the fundamental concepts more intuitive and building upon real life examples. I would thus stop short of calling it an “essential guide”, as does the first flap of the dust jacket, since there exist many published books with a similar goal, some of which were actually reviewed here. Now, there are ideas and examples therein I could borrow for my introductory stats course, except that I will cease teaching it next year! For instance, there are lots of examples related to COVID, which is great to engage (enrage?) the readers.

The book is quite pleasant to read, does not shy away from mathematical formulae, and covers notions such as probability distributions, the Simpson, Preston, inspection, and Berkson paradoxes, and even some words on causality, sometimes at excessive length. (I have always been an adept of the concise church when it comes to textbook examples and fear that the multiplication of illustrations of a given concept may prove counterproductive.) The early chapters are heavily focussed on the Gaussian (or Normal) distribution, making it appear essential for conducting statistical analysis. When the Gaussian does not fit, as in the Elo example, the explanations of a correction are less convincing.

I appreciated the book’s approach to model fit via the comparison of empirical cdfs with hypothetical ones. Also of primary interest is the systematic recourse to simulation, aka generative models, albeit without a systematic proper description. In the chapter about durations (Chap 5), I think there are missed opportunities, like the distributions of extremes (p 82) or the memorylessness of the Exponential distribution. Instead, the focus slightly diverges towards non-statistical issues on demography by the end of the chapter, with a potential for confusion between the Gompertz law and the Gompertz distribution. The Berkson paradox (Chap 6) is well-explained in terms of non-random populations (and reminded me of when, years ago, we tried to predict the first year success probability of undergrad applicants from their high school maths grade and the regression coefficient estimate ended up negative). Distributions of extremes do appear in Chap 8, though again seeking an ideal generic distribution seems to me rather misguided and misguiding. I would also argue that the author is missing the point of Taleb’s black swans by arguing in favour of better modelling, when the latter argues against the very predictability of extreme events in a non-stationary financial world… The chapter on fairness and fallacy (Chap 9) is actually about false positive/negative rates in different populations and the ensuing unfairness (or the base rate fallacy). In that chapter there is no mention of Bayes (reserved for Think Bayes?!), but it hits hard enough at anti-vaxers (who will most likely not read the book). And does it again in the Simpson paradox chapter (Chap 10), whose proliferation is further stressed in the following chapter, on people becoming less racist or sexist or homophobic as they age, despite the proportion of racist/sexist/homophobic responses to a specific survey (GSS/Pew) increasing with age. This is prolonged into the rather minor final chapter.
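The Berkson mechanism is simple enough to demonstrate in a few lines (my illustration, with made-up variable names): two independent traits become negatively correlated once the population is selected on their sum:

```python
import random

random.seed(3)
n = 50_000
talent = [random.gauss(0, 1) for _ in range(n)]
looks = [random.gauss(0, 1) for _ in range(n)]
# selection on the sum of two independent traits (the non-random population)
selected = [(t, l) for t, l in zip(talent, looks) if t + l > 1.0]

def corr(pairs):
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs) / len(pairs)
    vx = sum((x - mx) ** 2 for x in xs) / len(xs)
    vy = sum((y - my) ** 2 for y in ys) / len(ys)
    return cov / (vx * vy) ** 0.5

# independent overall, clearly negatively correlated within the selection
assert corr(selected) < -0.1
```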

Now that I have read the book, during a balmy afternoon in St Kilda (after an early start in the train to De Gaulle airport in freezing temperatures), I am a bit uncertain as to what to make of it in terms of impact on the general public. For sure, the stories that accumulate chapter after chapter are nice and well argued, while introducing useful statistical concepts, but I do not see readers coming out equipped with more than a healthy dose of scepticism towards daily statistics, which obviously is a first step in the right direction!

Some nitpicking: the book is missing the historical connection to Quetelet’s “average man” when referring to the notion. And a potential explanation for the (approximate) log-Gaussianity of the weights of individuals in a population is that weight scales with volume, hence as a third power of a sort, although birth weights are roughly Normal, which kills my argument. I remain puzzled by the title, possibly missing a cultural reference (as there are tee-shirts sold with this sentence); it is the same as the name of a blog run by the author since 2011 and fodder for the book. And the cover is terrible, breaking words to fit the width in a way that makes no sense, if I am not overthinking it! As often, the book is rather US centric, making no mention for instance of the US having much higher infant death rates than countries with similar GDPs when this data is discussed.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

Bayes Rules! [book review]

Posted in Books, Kids, Mountains, pictures, R, Running, Statistics, University life on July 5, 2022 by xi'an

Bayes Rules! is a new introductory textbook on Applied Bayesian Model(l)ing, written by Alicia Johnson (Macalester College), Miles Ott (Johnson & Johnson), and Mine Dogucu (University of California Irvine), and sent to me by CRC Press for review. It is available (free) online as a website and has a github site, as well as a bayesrules R package. (Which reminds me that both our own book R packages, bayess and mcsm, have gone obsolete on CRAN! And that I should find time to figure out the issue for an upgrade…)

As far as I can tell [from abroad and from only teaching students with a math background], Bayes Rules! seems to be catering to early (US) undergraduate students with very little exposure to mathematical statistics or probability, as it introduces basic probability notions like pmf, joint distribution, and Bayes’ theorem (as well as Greek letters!) and shies away from integration or algebra (a covariance matrix does occur on page 437). For instance, the Normal-Normal conjugacy derivation is considered a “mouthful” (page 113). The exposition is somewhat stretched along the 500⁺ pages as a result, imho, which is presumably a feature shared with most textbooks at this level, and, accordingly, the exercises and quizzes are more about intuition and reproducing the contents of the chapter than about technique. In fact, I did not spot a mention of sufficiency, consistency, posterior concentration (almost made on page 113), improper priors, ergodicity, irreducibility, &tc., while other notions are not precisely defined, like ESS, weakly informative (page 234) or vague priors (page 77), or prior information—which makes the negative answer to the quiz “All priors are informative” (page 90) rather confusing—, R-hat, density plot, scaled likelihood, and more.

As an alternative to “technical derivations”, Bayes Rules! centres on intuition and simulation (yay!) via its bayesrules R package, itself relying on rstan. Learning from examples (as R code is always provided), the book proceeds through conjugate priors, MCMC (Metropolis-Hastings) methods, regression models, and hierarchical regression models. Quite impressive given the limited prerequisites set by the authors. (I appreciated the representations of the prior-likelihood-posterior, especially in the sequential case.)

Regarding the “hot tip” (page 108) that the posterior mean always stands between the prior mean and the data mean, this should be made conditional on a conjugate setting and a mean parameterisation. Defining MCMC as a method that produces a sequence of realisations that are not from the target makes a point, except of course that there are settings where the realisations are from the target, for instance after a renewal event. Tuning MCMC should remain a partial mystery to readers after reading Chapter 7 as the Goldilocks principle is quite vague. Similarly, the derivation of the hyperparameters in a novel setting (not covered by the book) should prove a challenge, even though the readers are encouraged to “go forth and do some Bayes things” (page 509).
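The conjugate case behind that “hot tip” can be spelled out (my sketch, assuming a Normal-Normal model with known sampling variance σ²): the posterior mean is a precision-weighted average of the prior mean and the sample mean, hence lies between the two:

```python
def posterior_mean(prior_mean, prior_var, xbar, n, sigma2):
    """Normal-Normal conjugacy with known sampling variance sigma2."""
    w = (n / sigma2) / (n / sigma2 + 1 / prior_var)   # weight on the data
    return w * xbar + (1 - w) * prior_mean

mu = posterior_mean(prior_mean=0.0, prior_var=1.0, xbar=3.0, n=10, sigma2=4.0)
assert 0.0 < mu < 3.0   # shrinkage: strictly between prior mean and sample mean
```

Outside conjugacy, or under another parameterisation, no such sandwiching is guaranteed, which is the point of the reservation above.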

While Bayes factors are supported for some hypothesis testing (with no point null), model comparison follows more exploratory methods like X validation and expected log-predictive comparison.

The examples and exercises are diverse (if mostly US centric), modern (including cultural references that completely escape me), and often reflect the authors’ societal concerns. In particular, their concern about a fair use of the inferred models is preeminent, even though a quantitative assessment of the degree of fairness would require a much more advanced perspective than the book allows… (In that respect, Exercise 18.2 and the following ones are about book banning (in the US). Given the progressive tone of the book, and the recent ban of math textbooks in the US, I wonder if some conservative boards would consider banning it!) Concerning the Himalaya summiting running example (Chapters 18 & 19), where the probability to summit is conditional on the age of the climber and the use of additional oxygen, I am somewhat surprised that the altitude of the targeted peak is not included as a covariate. For instance, Ama Dablam (6848 m) is compared with Annapurna I (8091 m), which has the highest fatality-to-summit ratio (38%) of all. This should matter more than age: the Aosta guide Abele Blanc climbed Annapurna without oxygen at age 57! More to the point, the (practical) detailed examples do not bring unexpected conclusions, as for instance the fact that runners [thrice alas!] tend to slow down with age.

A geographical comment: Uluru (page 267) is not a city!, but an impressive sandstone monolith in the heart of Australia, a five hour drive away from Alice Springs. And historical mentions: Alan Turing (page 10) and the team at Bletchley Park indeed used Bayes factors (and sequential analysis) in cracking the Enigma, but this remained classified information for quite a while. Arianna Rosenbluth (page 10, but missing on page 165) was indeed a major contributor to Metropolis et al. (1953, not cited), but would not qualify as a Bayesian statistician, as the goal of their algorithm was a characterisation of the Boltzmann (or Gibbs) distribution, not statistical inference. And David Blackwell’s (page 10) Basic Statistics is possibly the earliest instance of an introductory Bayesian and decision-theory textbook, but it never mentions Bayes or Bayesianism.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]