Archive for Adelaide

Seminal ideas and controversies in Statistics [book review]

Posted in Books, Mountains, pictures, Statistics, Travel, University life on May 24, 2025 by xi'an

CRC Press sent CHANCE this book for review. Since the topic was of clear interest to me, with an author who significantly contributed to the field—my only recollection of meeting Roderick Little is from the Australian Statistical Conference in Adelaide, in 2012, at the start of my Oz 2012 Tour!—, I took the opportunity of the nearest weekend to browse through Seminal ideas and controversies in Statistics. I very much like the idea of selecting a dozen key papers in the history of Statistics and of discussing why they matter. In fact, this reminded me of my classics seminar, which lasted for the few years I was 100% in charge of the Master program in Dauphine (and which I hope to restart!). Checking the list of the papers I then suggested to my students, I see some overlap, with 9 papers out of the 15 groups. (I also remember Steve Fienberg making suggestions for that list while he was spending a sabbatical in Paris at CREST.) Given that commonality of focus and purpose, and contrary to my wont, I have really very little of substance to criticize or wish for in the book. All the less when reading the following

“On a personal note, I met Yates [author of a 1984 paper on tests for 2×2 contingency tables discussing the relevance of conditioning on one or both margins], a charming man, when I was a young graduate student who knew next to nothing about statistics; we discussed the joys of traversing the Cuillin Ridge in Skye.”

since completing that ridge remains high on my mountain-climbing bucket list! (Possibly next year, since we are running an ICMS workshop on the Island.)

The first paper in the series is more than a foundational paper, since Fisher's 1922 paper is about creating (almost) ex nihilo the field of (modern) mathematical statistics. I do not know whether there is any equivalent in other scientific disciplines of such an impact (and of such a man)… Roderick Little manages to convincingly engage with Fisher's dismissive views on (not yet so called) Bayesian analysis, although, in Fisher's defence, the formalisation of Bayesian inference had not yet emerged at that time. The second chapter discusses Yates' 1984 paper on tests for 2×2 contingency tables, which he wrote 50 years after the original one in the first volume of JRSS. Roderick Little adds a detailed Bayesian analysis with the three standard reference priors, Jeffreys' version proving quite close to Fisher's exact test (conditional on both margins). The third chapter aims at the generic challenge of hypothesis testing, from the well-known opposition between Fisher and Neyman (both on the cover) to questioning the sanity of hard-set thresholds (with a mention of our American Statistician call to abandon (shi)p!). The latter (thus) refers to the recent literature on the replicability crisis and the now famous ASA statement on p-values by Ron Wasserstein and Nicole Lazar, analysed in the chapter. But I would have liked to read another full section on alternatives to hypothesis testing. While now a niche interest (imho), Fisher's attempt at creating a posterior distribution without a prior, aka fiducial inference, is discussed in Chapter 4, with the Behrens-Fisher problem as the illustrating example. The chapter feels rather anticlimactic, with the comparison relying on the (Malay) Ghosh and Kim (2001) simulation results.
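
Regarding that second chapter, a toy numerical comparison (mine, with a made-up table, not reproducing Little's analysis) of the one-sided Fisher exact p-value with the corresponding posterior probability under independent Jeffreys Beta(½,½) priors on the two proportions runs as follows:

import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(1)

# made-up 2x2 table: successes/failures in two independent binomial samples
a, b = 12, 8    # group 1: 12 successes out of 20
c, d = 5, 15    # group 2: 5 successes out of 20

# one-sided Fisher exact test (conditional on both margins)
_, p_fisher = fisher_exact([[a, b], [c, d]], alternative="greater")

# Bayesian counterpart: independent Jeffreys Beta(1/2,1/2) priors on p1, p2
p1 = rng.beta(a + 0.5, b + 0.5, size=100_000)
p2 = rng.beta(c + 0.5, d + 0.5, size=100_000)
post_prob = np.mean(p1 <= p2)   # posterior "evidence against" p1 > p2

print(f"Fisher exact (one-sided) p-value: {p_fisher:.4f}")
print(f"Posterior P(p1 <= p2 | data):     {post_prob:.4f}")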

Birnbaum’s (1962) likelihood principle is the topic of Chapter 5 (and I cannot remember any of my students choosing this paper over the years, although there was at least one). Roderick Little recalls some sentences from the JASA discussion as an appetiser, a reminder of the time when these discussions could turn in scathing attacks. The chapter contains excerpts from Berger and Wolpert (1988)—which they were writing while I was spending a year at Purdue and which I have always recommended to my PhD students, albeit not for the classic seminar. It then moves to the controversies that surround this principle since its inception, in particular those accumulated by Deborah Mayo (also on the cover) as reported on the ‘Og. In the recent years, I have become less excited about the LP, in part due to the imprecision in its statement, which opens the door to conflicting interpretations. And in part due to the scarcity of models with non-trivial sufficient statistics. (I am also wondering if the sufficiency issue we highlighted in our ABC model choice criticism does relate to the mixture example at the end of the chapter.)

The next chapter is all about compromise, through the calibrated Bayes perspective that credible statements should be close to confidence statements in the long run, which I remember him presenting at ASC 2012. The notion is found in the very 1984 paper by Don Rubin (also on the cover) that contains the idea behind Approximate Bayesian Computation (ABC). And the chapter proceeds by listing strengths and weaknesses of frequentist and Bayesian perspectives, towards a fusion of both, e.g., through posterior predictive checks.
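
As a minimal illustration of the posterior predictive check idea (my own toy normal-mean example, nothing taken from the chapter):

import numpy as np

rng = np.random.default_rng(6)

# made-up data, assumed to come from a normal model with unknown mean (known variance 1)
x = rng.normal(1.0, 1.0, size=30)
n = len(x)

# posterior for the mean under a flat prior: N(xbar, 1/n)
post_mean, post_sd = x.mean(), 1.0 / np.sqrt(n)

# posterior predictive check on a chosen discrepancy (here the sample maximum)
T_obs = x.max()
T_rep = []
for _ in range(5000):
    mu = rng.normal(post_mean, post_sd)      # draw from the posterior
    x_rep = rng.normal(mu, 1.0, size=n)      # replicated data set
    T_rep.append(x_rep.max())
ppp = np.mean(np.array(T_rep) >= T_obs)      # posterior predictive p-value
print("posterior predictive p-value:", ppp)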

While the choice of a (general public) paper from Scientific American may sound surprising in Chapter 7, with Efron's (on the cover) and Morris' 1977 account of Stein's paradox, I cannot but applaud, all the more because this was the first paper I read when starting my PhD on James-Stein estimators. Although this may sound like eons ago, the James and Stein (1961) paper—which is my age!—"created a considerable backlash" by toppling unbiasedness from its pedestal and exhibiting a paradox that 1+1+1≠3… Which Little reinterprets via a random effect (or Bayesian hierarchical) model. (And a chapter where I learned that Little's father was a journalist, a characteristic he shared with Bruce Lindsay, as I found out at Blonde, Glasgow, during an ICMS workshop.) Relatedly, the next chapter is about the "57 varieties [of regression] paper" by Dempster, Schatzoff and Wermuth (1977), apparently connected with the Heinz 57 varieties of pickles. The paper considers Stein, ridge, and variable selection versions of regression. The chapter also covers the (Bayesian) Lasso and BART, as well as an all too brief mention of spike-and-slab priors—with my friend Veronika Ročková missing from the authors' index!—, but I was expecting from the title other, robust, forms of regression like L¹ regression, and econometrics digressions. Chapter 10 can however be seen as a proxy, since it covers generalized estimating equations from a 1986 Biometrika paper of Liang and Zeger, with no Bayesian aspect (and an expected appearance of Communications in Statistics B).
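
Since the chapter revisits the shrinkage argument, here is a minimal simulation sketch (my own, not from the book or the Scientific American paper) of the uniform risk improvement of the James-Stein estimator over the maximum likelihood estimator for a normal mean in dimension p ≥ 3:

import numpy as np

rng = np.random.default_rng(2)
p, n_sim = 10, 20_000
theta = np.ones(p)                       # arbitrary true mean vector

x = rng.normal(theta, 1.0, size=(n_sim, p))          # X ~ N_p(theta, I)
norm2 = np.sum(x**2, axis=1, keepdims=True)
js = (1.0 - (p - 2) / norm2) * x         # James-Stein estimator (not positive-part)

risk_mle = np.mean(np.sum((x - theta)**2, axis=1))   # close to p
risk_js = np.mean(np.sum((js - theta)**2, axis=1))   # strictly below p, whatever theta
print(risk_mle, risk_js)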

Chapter 9 covers the almost immediately classic 1995 paper of Benjamini and Hochberg on multiple testing (that Series B turned into a discussion paper ten years later!), although it spends more time on Berry's (2012) recommendations than on FDR. The computational Chapter 11 brings together Efron's (1979) bootstrap [with his picture on the cover] and MCMC, represented by the founding paper of Gelfand and Smith (1990, if mistakenly set in 1988 on p140). A bit of a strange mix imho, as the former is more inferential than computational. And not giving the EM algorithm that much space. And not questioning MCMC methods as a good proxy to posterior distributions. Tukey's Future of Data Analysis (as founding exploratory data analysis) and Breiman's Two cultures (as launching statistical machine learning) meet in Chapter 12. (With a reminder that the latter invokes Occam's razor—which may not be that appropriate for hugely overparameterised machine learning black boxes—and… the Rashomon principle! Meaning that distinct models may all fit the same data. Let me nitpickingly add the reference to Ryûnosuke Akutagawa as the author of Rashômon and other stories, which Kurosawa adapted in his splendid movie.) The chapter contains critical remarks from David Cox, Brad Efron, David Bickel, and Andrew Gelman, with a further section on Little's view on modelling.
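
As the Benjamini and Hochberg step-up procedure fits in a few lines, here is a bare-bones version (my own sketch, run on simulated p-values, nothing taken from the book):

import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean rejection mask controlling the FDR at level q."""
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    ranked = pvals[order]
    below = ranked <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest index satisfying the step-up bound
        reject[order[: k + 1]] = True
    return reject

# toy example: 900 true nulls, 100 signals
rng = np.random.default_rng(3)
p_null = rng.uniform(size=900)
p_alt = rng.beta(0.1, 1.0, size=100)       # p-values concentrated near zero
print(benjamini_hochberg(np.concatenate([p_null, p_alt])).sum(), "rejections")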

The last three chapters are on design and sampling, in connection with Little's (and Rubin's) works in the area. With a 1934 paper of Neyman (whose picture on the cover could have been chosen differently, albeit no fault of Neyman [or of Little!] that his toothbrush style of moustache went dramatically out of fashion!). With a return to calibrated Bayes and a reminiscence of Little's time at the World Fertility Survey, but (apparently) no mention of the probabilistic aspects of modern censuses (which saw my friends Steve Fienberg on one side and Larry Brown and Marty Wells on the other argue for and against them!), again relating to the reliance on statistical models. Chapter 14 relates randomized clinical trials to causality, which makes a (worthy) appearance there. Roderick Little also makes a clear case there against the retracted study linking vaccines and autism, a call that is unlikely to reach the current Trump administration and its Secretary of Health.

The book concludes with a list of twenty style and grammar suggestions for improved writing.

As should be crystal-clear from the above, I quite enjoyed the book and would definitely use its reading list in a graduate course whenever the opportunity arises. Once again, some choices are more personal to the author than others, and I would have placed more emphasis on the fantastic Dawid, Stone and Zidek (1973)—with Jim Zidek also missing from the author index—, but all make sense in a walk through statistical classics. Let me however regret the absence therein of major actors like, e.g., D. Blackwell, C.R. Rao, or G. Wahba (except in a stylistic example p199), two of whom were awarded the International Prize in Statistics.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Books Review section in CHANCE.]

adjustment of bias and coverage for confidence intervals

Posted in Statistics on October 18, 2012 by xi'an

Menéndez, Fan, Garthwaite, and Sisson—whom I heard in Adelaide on that subject—posted yesterday a paper on arXiv about correcting the frequentist coverage of default intervals toward their nominal level. Given such an interval [L(x),U(x)], the correction for proper frequentist coverage is done by parametric bootstrap, i.e. by simulating n replicas of the original sample from the plug-in density f(.|θ*) and deriving the empirical cdfs of L(y)-θ* and of U(y)-θ*. Under the assumption of consistency of the estimate θ*, this ensures convergence (in the original sample size) of the corrected bounds.
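
To fix ideas, here is a loose illustration of the parametric-bootstrap ingredient (my own toy version with a normal mean and a crude recalibration of the nominal level, not the authors' exact correction of the bounds):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

def default_interval(y, alpha):
    # hypothetical default interval: normal approximation for a mean
    m = y.mean()
    s = y.std(ddof=1) / np.sqrt(len(y))
    z = norm.ppf(1 - alpha / 2)
    return m - z * s, m + z * s

def estimated_coverage(theta_star, n, alpha, n_boot=2000):
    # simulate replicas from the plug-in density f(.|theta*) and count hits
    hits = 0
    for _ in range(n_boot):
        y = rng.normal(theta_star, 1.0, size=n)
        lo, hi = default_interval(y, alpha)
        hits += lo <= theta_star <= hi
    return hits / n_boot

# crude recalibration: shrink alpha until the estimated coverage reaches 95%
theta_star, n = 1.3, 15
alpha = 0.05
while estimated_coverage(theta_star, n, alpha) < 0.95:
    alpha *= 0.9
print("calibrated alpha:", alpha)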

Since ABC is based on the idea that pseudo data can be simulated from f(.|θ) for any value of θ, the concept "naturally" applies to ABC outcomes, as illustrated in the paper by an MA(1) model with g-and-k noise. (As noted by the authors, there always is some uncertainty about the consistency of the ABC estimator.) However, there are a few caveats (a toy ABC rejection sketch follows the list below):

  • ABC usually aims at approximating the posterior distribution (given the summary statistics), of which the credible intervals are an inherent constituent. Hence, attempts at recovering a frequentist coverage seem to contradict the original purpose of the method. Obviously, if ABC is instead seen as an inference method per se, like indirect inference, this objection does not hold.
  • Then, once the (umbilical) link with Bayesian inference is partly severed, there is no particular reason to stick to credible sets for [L(x),U(x)]. A more standard parametric bootstrap approach, based on the bootstrap distribution of θ*, should work as well. This means that a comparison with other frequentist methods like indirect inference could be relevant.
  • Lastly, as also noted by the authors, the method may prove extremely expensive. If the bounds L(x) and U(x) are obtained empirically from an ABC sample, a new ABC computation must be associated with each one of the n replicas of the original sample. It would be interesting to compare the actual coverages of this ABC-corrected method with a more direct parametric bootstrap approach.
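
For readers unfamiliar with the basic mechanism, here is a toy ABC rejection sampler (my own sketch with a made-up normal model and summary, unrelated to the paper's g-and-k MA(1) example), producing the kind of default credible interval the correction would then be applied to:

import numpy as np

rng = np.random.default_rng(5)

def abc_rejection(x_obs, n_keep=500, n_prop=200_000):
    """Toy ABC rejection sampler for a normal mean with a uniform prior.

    Keeps the proposed theta values whose pseudo data have the summary
    (here the sample mean) closest to the observed one.
    """
    s_obs = x_obs.mean()
    theta = rng.uniform(-10, 10, size=n_prop)                 # prior draws
    pseudo = rng.normal(theta[:, None], 1.0, size=(n_prop, len(x_obs)))
    dist = np.abs(pseudo.mean(axis=1) - s_obs)                # distance between summaries
    keep = np.argsort(dist)[:n_keep]
    return theta[keep]

x_obs = rng.normal(2.0, 1.0, size=20)
sample = abc_rejection(x_obs)
L, U = np.quantile(sample, [0.025, 0.975])                    # a default ABC credible interval
print(L, U)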

a paradox in decision-theoretic interval estimation (solved)

Posted in pictures, Statistics, Travel, University life on October 4, 2012 by xi'an

In 1993, we wrote a paper [with George Casella and Gene/Jiunn Hwang] on the paradoxical consequences of using the loss function

\text{length}(C) - k \mathbb{I}_C(\theta)

(published in Statistica Sinica, 3, 141-155) since it led to the following property: for the standard normal mean estimation problem, the regular confidence interval is dominated by the modified confidence interval equal to the empty set when σ is too large… This was first pointed out by Jim Berger and the most natural culprit is the artificial loss function, where the first part is unbounded while the second part is bounded by k. Recently, Paul Kabaila—whom I met both in Adelaide, where he quite appropriately commented on the abnormal talk at the conference!, and in Melbourne, where we met with his students after my seminar at the University of Melbourne—published a paper (first on arXiv, then in Statistics and Probability Letters) where he demonstrates that the mere modification of the above loss into

\dfrac{\text{length}(C)}{\sigma} - k \mathbb{I}_C(\theta)

solves the paradox! For Jeffreys' non-informative prior, the Bayes (optimal) estimate is the regular confidence interval. Besides doing the trick, this nice resolution explains the earlier paradox as being linked to a lack of invariance in the (earlier) loss function. This is somehow satisfactory since Jeffreys' prior also is the invariant prior in this case.
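
To make the original paradox concrete, here is a quick back-of-the-envelope check of mine (with known σ and the usual 95% interval C(x)=[x-1.96\sigma,x+1.96\sigma], not taken from either paper): under the first loss, the risk of C is

3.92\,\sigma - 0.95\,k

whatever θ, while the empty set always incurs a zero loss, so the empty set dominates as soon as σ exceeds 0.95k/3.92, whatever the chosen k. Under the rescaled loss, the risk of C becomes 3.92 - 0.95k, no longer depends on σ, and turns negative (hence is no longer dominated by the empty set) as soon as k exceeds 3.92/0.95 ≈ 4.13.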

ASC 2012 (#3, also available by mind-reading)

Posted in Running, Statistics, University life on July 13, 2012 by xi'an

This final morning at the ASC 2012 conference in Adelaide, I attended a keynote lecture by Sophia Rabe-Hesketh on GLMs that I particularly appreciated, as I am quite fond of those polymorphous and highly adaptable models (witness the rich variety of applications at the INLA conference in Trondheim last month). I then gave my talk on ABC model choice, trying to cover the three episodes in the series within the allocated 40 minutes (and got from Terry Speed the trivia that Renfrey Potts, father of the Potts model, spent most of his life in Adelaide, where he died in 2005! Terry added that Potts, a dedicated marathon runner, used to run along the Torrens river. This makes Adelaide the death place of both R.A. Fisher and R. Potts.)

Later in the morning, Christl Donnelly gave a fascinating talk on her experiences with government bodies during the BSE and foot-and-mouth epidemics in Britain in the past decades. It was followed by a frankly puzzling [keynote Ozcots] talk delivered by Jessica Utts on the issue of parapsychology tests, i.e. the analysis of experiments testing for "psychic powers". Nothing less. Actually, I first thought this was a pedagogical trick to capture the attention of students and debunk the claims, but Utts' focus on exhibiting such "powers" was definitely dead serious and she concluded that "psychic functioning appears to be a real effect". So it came as a shock that she truly believes in psychic paranormal abilities! I had been under the wrong impression that her 2005 Statistical Science paper demonstrated the opposite, but it clearly belongs to the tradition of controversial Statistical Science papers that started with the Bible code paper… I also found it flabbergasting to learn that the U.S. Army is/was funding research in this area and is/was actually employing "psychics", as well as that the University of Edinburgh has a parapsychology unit within the department of psychology. (But, after all, UK universities have long had schools of Divinity, so the irrational was let in a while ago!)

ASC 2012 (#2)

Posted in pictures, Running, Statistics, Travel, University life on July 12, 2012 by xi'an

This morning, after a nice and cool run along the river Torrens amidst almost unceasing bird songs, I attended another Bayesian ASC 2012 session, with Scott Sisson presenting a simulation method aimed at correcting biased confidence intervals and Robert Kohn giving the same talk as in Kyoto. Scott's proposal, which is rather similar to parametric bootstrap bias correction, is actually more frequentist than Bayesian, as the bias is defined in terms of the correct frequentist coverage of a given confidence (or credible) interval. (Thus making the connection with Roderick Little's calibrated Bayes talk of yesterday.) This perspective thus perceives ABC as a particular inferential method, instead of a computational approximation to the genuine Bayesian object. (We will certainly discuss the issue with Scott next week in Sydney.)

Then Peter Donnelly gave a particularly exciting and well-attended talk on the geographic classification of humans, in particular of the (early 1900's) population of the British Isles, based on a clever clustering idea derived from an earlier paper of Na Li and Matthew Stephens: using genetic sequences from a group of individuals, each individual was paired with the rest of the sample as if it descended from this population. Using an HMM, this led to clustering the sample into about 50 groups, with a remarkable geographic homogeneity: for instance, Cornwall and Devon made two distinct groups, an English-speaking pocket of Wales (Little England) was identified as a specific group, and so on, while central, eastern and southern England constituted a homogeneous group of its own…