Archive for Thames

THAMES for mixtures, a reply from the authors

Posted in Books, pictures, R, Statistics, University life on June 23, 2025 by xi'an

[Here is a reply to my comments on THAMES sent by the first author of the paper, Martin Metodiev. The reproduction above of the cover of Rivers of London is obviously unrelated to the reply or to the original post, beyond presenting a fantasy map of the Thames!]

Thank you for your review of our article! Adapting your previous work in this field has been a pleasure. Before I respond to your comments, I would like to emphasize that the simplicity of our estimator lies in its analytic expression: a truncated harmonic mean of reciprocal unnormalized posterior density values. Indeed, our package “thamesmix” (recently submitted to CRAN!) has a function to compute the marginal likelihood of any mixture model. This function requires only two arguments: the unnormalized log-posterior function (the log prior plus the log likelihood) and the MCMC simulations from the posterior.
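[For concreteness, here is a minimal R sketch of the two ingredients just described, on a toy two-component Gaussian mixture with unit variances. The final call is hypothetical, as the actual function and argument names in thamesmix may differ:]

## toy data from a two-component Gaussian mixture
set.seed(42)
x <- c(rnorm(60, -2), rnorm(40, 3))

## ingredient 1: unnormalized log-posterior (log prior + log likelihood),
## parameterised by (logit of the weight, mean 1, mean 2)
log_post <- function(par) {
  w  <- plogis(par[1])
  mu <- par[2:3]
  lp <- dnorm(par[1], 0, 2, log = TRUE) + sum(dnorm(mu, 0, 5, log = TRUE))
  ll <- sum(log(w * dnorm(x, mu[1]) + (1 - w) * dnorm(x, mu[2])))
  lp + ll
}

## ingredient 2: simulations from the posterior, one row per draw
## (stand-ins here; in practice, the output of an MCMC run)
draws <- cbind(rnorm(1e3, 0.4, 0.1), rnorm(1e3, -2, 0.15), rnorm(1e3, 3, 0.2))

## hypothetical call mirroring the two-argument interface described above
# library(thamesmix)
# thames_mixtures(log_post, draws)  # log marginal likelihood estimate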

Regarding your main comments:

1. “the evacuation of earlier methods as not simple or not universal enough is rather disingenuous. For instance, software that does not return (latent) allocation vectors can easily be post-processed.”

I could not find an example of post-processing MCMC outputs to recover allocations for the purpose of computing these estimators. It sounds really interesting, and I would be happy to cite it. Is there a reference you can recommend?

In any case, the point still stands. Most of the estimators we cite on this point require not only allocation samplers, but also analytic expressions for the distribution of the allocation vectors, or for the distribution of the data conditional on these vectors. I do not think a closed form of this distribution is available in general.

2. “the handling of the label switching issue—the reason why Larry Wasserman saw mixtures at the same magnitude of evil as tequila!—is problematic for several reasons.”

The fact that our estimator is invariant to label switching is indeed at the core of our method. The simple Gibbs sampler gets stuck in one mode, and this is why the classical version of bridge sampling is biased by a factor of G! in the simulation setting. As you point out, this is successfully resolved by using fully symmetric bridge sampling in the experiment section. However, the computational cost of this fully symmetric estimator rises super-exponentially with G, so I do not see how it could be evaluated for G=15, where the number of symmetric modes is equal to 15! (over one trillion); a minimal sketch of this naive symmetrisation appears at the end of this reply. One of the main points of our article is that the symmetric THAMES can be evaluated in a feasible amount of time, even in such a high-dimensional multivariate setting.

3. “the (legitimate) purpose of using marginal likelihoods for selecting the number G of components is weakened by the intrusion of alternate proposals to assess G from the data”

I would like to point out that these alternate proposals do not in any way impact the definition of the THAMES, which remains the simple one given in Equation (5). They are only used to speed up the computation.

4. “several mentions are made of the other estimators being biased, which is indeed the case for bridge sampling (if not necessarily for importance sampling), but not necessarily a central issue”

The problem that we see with the classical, non-symmetric bridge sampling method in the setting of mixture models is not simply that it is biased. The problem is that the bias is persistent, and often roughly equal to the factor G! when the MCMC sampler fails to switch between modes. We have not had this experience with the THAMES: it converged even when the MCMC sampler was stuck.
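[To make the combinatorial point in 2. above concrete, here is a minimal R sketch (mine, not the paper's implementation) of the naive symmetrisation that averages an unnormalized posterior over all G! relabellings; the enumeration of permutations is exactly what becomes infeasible at G=15:]

## all permutations of 1:G, built recursively (there are factorial(G) of them)
perms <- function(G) {
  if (G == 1) return(matrix(1, 1, 1))
  do.call(rbind, lapply(1:G, function(k) {
    sub <- perms(G - 1)
    cbind(k, matrix((1:G)[-k][sub], nrow(sub)))
  }))
}

## unnormalized log-posterior of a toy G-component Gaussian mixture
set.seed(1)
x <- c(rnorm(60, -2), rnorm(40, 3))
G <- 2
log_post <- function(w, mu) {
  dens <- sapply(1:G, function(g) w[g] * dnorm(x, mu[g]))
  sum(dnorm(mu, 0, 5, log = TRUE)) + sum(log(rowSums(dens)))
}

## symmetrised version: log of the average over all G! relabellings
log_post_sym <- function(w, mu) {
  vals <- apply(perms(G), 1, function(s) log_post(w[s], mu[s]))
  max(vals) + log(mean(exp(vals - max(vals))))  # stable log-mean-exp
}

log_post_sym(c(0.6, 0.4), c(-2, 3))  # cost grows like factorial(G)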

easily computed marginal likelihoods for multivariate mixture models using the THAMES estimator

Posted in Books, Statistics, University life on May 25, 2025 by xi'an

Martin Metodiev and his coauthors have produced another paper on the THAMES Monte Carlo method, specifically targeting marginal likelihoods for mixture models. Since this problem has long been a central interest of mine and since the method is closely connected with the harmonic mean solution we developed with Darren Wraith in 2009 (also included in our 2009 survey of evidence approximations with Jean-Michel Marin, published in Frontiers of Statistical Decision Making and Bayesian Analysis for Jim Berger’s 60th birthday), I quickly went through the paper. The core purpose of this paper is to adapt THAMES to a multimodal setting, since using an ellipsoidal region as the support of the Uniform reciprocal importance sampling distribution does not make sense for a multimodal target. After reading it a few times, and while some computational aspects remain obscure to me, I am not convinced it brings an adequate answer to the challenge. Indeed, while the approach borrows directly from Berkhof et al. (2003), which inspired the resolution Jeong (Kate) Lee and myself proposed, the issues I have with the current proposal are that

1. the evacuation of earlier methods as not simple or not universal enough is rather disingenuous. For instance, software that does not return (latent) allocation vectors can easily be post-processed (a minimal sketch of such post-processing appears after this list). And the current method uses allocation probabilities just the same (in Section 3.3). Similarly, the random shuffling answer to label (lack of) switching proposed by Sylvia Frühwirth-Schnatter—which again can be achieved by post-processing—cannot be rejected on the sole basis that the component means (based on the MCMC sample) are all similar. It is furthermore debatable that the current proposal is simple, when it involves relabelling à la Stephens, averaging over permutations, selecting among said permutations by constructing a graph over components (section 3.2.1), running a quadratic discriminant analysis (section 3.2.2) on the posterior sample, based on an arbitrary Normal representation of the distributions of the clusters, and finally defining a new ordering constraint (section 3.2.3). The computing efforts required by the respective methods do not appear in the main text.

2. the handling of the label switching issue—the reason why Larry Wasserman saw mixtures at the same magnitude of evil as tequila!—is problematic for several reasons. My position (since at least 2000!) on the matter is that the proper posterior sample must exhibit label switching and come close to symmetry among the “components”. The label switching problem (section 3.1) arises rather when the MCMC sample does not “switch labels”. The relabelling approach (e.g., à la Stephens) allows for a differentiation between components, to some extent, which helps with computing basic posterior moments for point estimation or for calibrating the support of the Uniform reciprocal importance sampling distribution, but any relabelling procedure tampers with the original MCMC sample and is thus bound to impact the distribution of the resulting relabelled sample. Furthermore, relabelling depends on the value of G, whereas the actual number of (significant) modes in the posterior is also connected with the (partial) fit of the data to the model, meaning there may be more modes than those linked with relabelling, especially when the model is misspecified. Incidentally, the symmetrised version of THAMES (5) does not require relabelling. Neither does the Bayes factor. In addition, the experiment section (4.1.2) mentions that bridge sampling is biased by a factor of G!, which comes as a surprise to me since I associated this factor with the call to Sid Chib’s formula in the absence of label switching, i.e., when the MCMC sample was stuck on a mode, as exposed by Radford Neal in 1999. Is it because bridge sampling is applied to the relabelled sample? It is also surprising that the gap appears in the simulated datasets (Fig. 3) and not in the real ones (Fig. 5).

3. the (legitimate) purpose of using marginal likelihoods for selecting the number G of components is weakened by the intrusion of alternate proposals to assess G from the data, like the criterion of overlap (section 3.2.1), which instead aims at the number of clusters, with an elimination of “empty components” that should either remain a possibility (within a regular mixture model) or be evacuated with a different modelling (à la Diebolt & Robert, or à la Wasserman). This overlap criterion is further used in the discriminant analysis that only applies to “non-overlapping components” of the mixture (section 3.2.3)—at which point I got lost in the reordering and simplification of the computation of THAMES (but was reminded of the results of Agostino Nobile in the 2000s, with whom I used to discuss these matters during my yearly visits to the University of Glasgow).

4. several mentions are made of the other estimators being biased, which is indeed the case for bridge sampling (if not necessarily for importance sampling), but not necessarily a central issue, while the original generalised harmonic mean proposal by Gelfand and Dey (1994), and thus THAMES, produces an unbiased estimator of the inverse of the evidence (thus of neither the evidence nor the log-evidence). However, in the paper, the volume of the support of the Uniform reciprocal importance sampling distribution is estimated by a basic Monte Carlo coverage probability in (3), which induces the same type of bias as the other methods.
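To recall in one line where this unbiasedness comes from: for any probability density φ whose support is contained in that of the posterior, writing p̃ for the unnormalized posterior (likelihood times prior) and Z for the evidence,

\mathbb{E}_{\theta\sim \tilde p/Z}\left[\frac{\varphi(\theta)}{\tilde p(\theta)}\right]=\int \frac{\varphi(\theta)}{\tilde p(\theta)}\,\frac{\tilde p(\theta)}{Z}\,\text{d}\theta=\frac{1}{Z}

so the empirical average of the φ(θ_t)/p̃(θ_t)'s over posterior draws is unbiased for 1/Z, while its reciprocal and its negative logarithm are, by Jensen's inequality, biased (if consistent) estimators of Z and of log Z.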
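And to make the post-processing invoked in point 1 concrete, here is a minimal R sketch (my own illustration, not taken from any of the packages cited in the paper) of simulating latent allocations after the fact from one posterior draw of the mixture parameters, plus the random-shuffling fix for the lack of label switching:

## toy data and one MCMC draw of (weights w, means mu) for a G-component
## Gaussian mixture with unit variances
set.seed(1)
x  <- c(rnorm(60, -2), rnorm(40, 3))
G  <- 2
w  <- c(0.55, 0.45)
mu <- c(-2.1, 3.2)

## conditional allocation probabilities P(z_i = g | x_i, w, mu)
p <- sapply(1:G, function(g) w[g] * dnorm(x, mu[g]))
p <- p / rowSums(p)

## one simulated allocation vector; repeating this over all MCMC draws
## post-processes an output lacking allocations
z <- apply(p, 1, function(pr) sample.int(G, 1, prob = pr))

## random shuffling à la Frühwirth-Schnatter, as post-processing: permute
## each draw's component labels by a uniformly chosen permutation
s <- sample(G)
w_s <- w[s]; mu_s <- mu[s]; z_s <- match(z, s)

table(z_s)  # allocation counts after shuffling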

a day for social good in London

Posted in pictures, Travel, University life on September 16, 2023 by xi'an


On Thursday, I went (from Paris) to London for the final day of Warwick DSSGx UK 2023 (Data Science for Social Good), which took place in The Shard, the tallest building in London (and certainly not the prettiest!), where the Warwick Business School has offices, and, on the way, I stopped by the nearby Tate Modern museum

where I spotted a few interesting pieces of art (but not that many!)

before heading to the 17th floor of the Shard building, and celebrating the great work done by the 16 Fellows (with the support of their fantastic technical mentors and project managers). Unfortunately, two of them were denied student visas and had to work remotely. The four projects were about predicting deforestation (in the Brazilian Amazon), failure to join a sixth form college (in the UK), the risk of turning NEET (in the UK), and greenwashing (from social media).

We are now starting to plan the next cohort, hence looking for candidates, projects, and financial support. Contact dssg@wbs.ac.uk for expressions of interest.

reciprocal importance sampling

Posted in Books, pictures, Statistics on May 30, 2023 by xi'an

In a recent arXival, Metodiev et al. (including my friend Adrian Raftery, who is spending the academic year in Paris) proposed a new version of reciprocal importance sampling, expanding on the proposal we made with Darren Wraith (2009) of using a Uniform over an HPD region. It is called THAMES, for truncated harmonic mean estimator, hence the picture (of London, not Paris!).

“…[Robert and Wraith (2009)] method has not yet been fully developed for realistic, higher-dimensional situations. For example, we know of no simple way to compute the volume of the convex hull of a set of points in higher dimensions.”

They suggest replacing the convex hull of the HPD points with an ellipsoid ϒ derived from a Normal distribution centred at the highest of the HPD points, whose covariance matrix is estimated from the whole (?) posterior sample. This is somewhat surprising in that the ellipsoid may well include low probability regions when the posterior is multimodal. For instance, the estimator is biased when the posterior vanishes on parts of ϒ. And the finiteness of its variance is unclear, depending on how fast the posterior gets to zero on these parts.

The central feature of the paper is selecting the radius of the ellipsoid that minimises the variance of the (counter) evidence, under asymptotic normality of the posterior. This radius roughly corresponds to our HPD region, in that 50% of the sample stands within it. The authors also notice that separate samples should be used to estimate the ellipsoid and to estimate the evidence, and that a correction is necessary when the posterior support is restricted. (Examples do not include multimodal targets, apparently.)
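For the record, here is a toy R sketch of the above (my own two-dimensional illustration, not the authors' code): posterior draws split in two halves, an ellipsoid calibrated on the first half to contain about 50% of the draws, and the truncated harmonic mean computed on the second half. The stand-in “posterior” is an exact bivariate Normal whose unnormalized density is taken to be the density itself, so the true log-evidence is zero.

## stand-in posterior sample and unnormalized log-posterior
set.seed(1)
library(MASS)
S  <- matrix(c(1, 0.5, 0.5, 1), 2)
th <- mvrnorm(1e4, c(0, 0), S)
log_post_un <- function(t)
  -log(2 * pi) - 0.5 * log(det(S)) - 0.5 * c(t %*% solve(S) %*% t)

half1 <- th[1:5000, ]; half2 <- th[5001:10000, ]

## ellipsoid from the first half: centred at the highest-density draw,
## sample covariance, radius catching about 50% of the draws
lp1 <- apply(half1, 1, log_post_un)
ctr <- half1[which.max(lp1), ]
Sig <- cov(half1)
c2  <- quantile(mahalanobis(half1, ctr, Sig), 0.5)
d   <- 2
vol <- pi^(d / 2) / gamma(d / 2 + 1) * c2^(d / 2) * sqrt(det(Sig))

## truncated harmonic mean on the second half, on the log scale
inA <- mahalanobis(half2, ctr, Sig) <= c2
lp2 <- apply(half2[inA, ], 1, log_post_un)
lse <- function(v) max(v) + log(sum(exp(v - max(v))))
-(lse(-lp2) - log(nrow(half2)) - log(vol))  # estimate of log Z, truth = 0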

la belle sauvage [book review]

Posted in Statistics on February 25, 2018 by xi'an

Another book I brought back from Austin. And another deeply enjoyable one, although not the end of a trilogy of trilogies this time. This book, La Belle Sauvage, is the first in a new trilogy by Philip Pullman, which goes back to the early infancy of Lyra, the heroine of His Dark Materials. Later volumes will take place after the first trilogy.

This is very much a novel about Oxford, to the point it sometimes seems written only for people with an Oxonian connection. After all, the author is living in Oxford… (Having the boat of the two characters passing by the [unnamed] department of Statistics at St. Giles carried away by the flood was a special sentence for me!)

Also, in continuation of His Dark Materials, a great steampunk universe, with a very oppressive Church and so far a limited use of magics! Limited to the daemons, again in continuation with past volumes…

Now, some passages of the book remind me of Ishiguro’s The Buried Giant, in the sense that the characters meeting myths from other stories may “really” meet them or instead dream them. This is for instance the case when they dock at a property where an otherworldly party is taking place and no one notices them. Or when they meet a true giant who is a river deity, albeit not in the spirit of Ben Aaronovitch’s Rivers of London novels.

The story is written in the time-honoured setup of teenage discovery travels, although with not so much to discover, as the whole country is covered by water. And the travel gets a wee bit boring after a while, with a wee bit too many coincidences, the inexplicable death (?) of a villain, and a hurried finale, where the reverse trip of the main characters takes a page rather than a whole book…

Trivia: La Belle Sauvage was also the name of the pub on Ludgate Hill where Pocahontas and her brother Tomocomo stayed when they first arrived in London. And The Trout is a real local pub, on the other side of Port Meadow [although I never managed to run that far in that direction while staying at St. Hugh’s, Oxford, last time, the meadow being flooded!].

Looking forward to the second volume (already written, so no risk of The Name of the Wind or Game of Thrones quagmires, i.e., an endless wait for the next volume!), hoping the author keeps up the good work and the right tension in the story, and avoids by all means parallel universes, which were so annoying in the first trilogy! (I do remember losing interest in the story during the second book and having trouble finishing the third one. I am not sure my son [who started before me] ever completed the trilogy…)