importance sampling and independent Metropolis–Hastings with unbounded weights

George Deligiannidis, Pierre E. Jacob, El Mahdi Khribch, and Guanyang Wang just arXived a paper on the respective behaviours of importance sampling and independent Metropolis–Hastings (IMH) under the same proposal, when the importance weight is unbounded but enjoys a p-th moment with p≥2. The two algorithms share a lot, with importance sampling appearing as a rough Rao-Blackwellisation of Metropolis–Hastings, and its asymptotic variance being smaller than that of Metropolis–Hastings. I was unable to check whether or not their conditions encompass the highly interesting case when the integrand f is integrable under the target π but not in L²(π). (Theorem 2.3 does not seem to include this case.)
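For concreteness, here is a minimal Python sketch of the two estimators under a shared proposal, with a toy Gaussian pair of my own choosing rather than anything from the paper: target N(0, 4/3) and proposal N(0, 1), for which the weight is unbounded but has finite p-th moments for all p<4, so the p≥2 condition holds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (my choice, not the paper's): target pi = N(0, 4/3),
# proposal q = N(0, 1).  The weight w = pi/q is unbounded but
# E_q[w^p] < infinity exactly when p < 4.
s2 = 4.0 / 3.0

def log_w(x):
    # log importance weight log(pi(x)/q(x))
    return -0.5 * x**2 / s2 + 0.5 * x**2 - 0.5 * np.log(s2)

def snis(f, N):
    """Self-normalised importance sampling estimate of E_pi[f]."""
    x = rng.standard_normal(N)
    w = np.exp(log_w(x))
    return np.sum(w * f(x)) / np.sum(w)

def imh(f, T):
    """Independent Metropolis-Hastings with the same proposal q."""
    x = rng.standard_normal()
    lw = log_w(x)
    out = np.empty(T)
    for t in range(T):
        y = rng.standard_normal()
        lwy = log_w(y)
        # accept with probability min(1, w(y)/w(x))
        if np.log(rng.uniform()) < lwy - lw:
            x, lw = y, lwy
        out[t] = f(x)
    return out.mean()

f = lambda x: x**2                   # E_pi[f] = 4/3
print(snis(f, 10_000), imh(f, 10_000))
```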

They consider a particular (!) version of IMH where N iid proposed values are drawn at once and accepted or rejected (again at once) with an acceptance ratio based on the average of the weights. Although this version is already found in a 2010 paper by Christophe Andrieu and co-authors, and stems from an unbiased importance sampler, I was not aware of it. My initial feeling was (predictably) pessimistic but, thinking about it, using the average weight brings into the sample simulations with small weights that would otherwise be discarded. Of course, a rejection proves N times more costly. But this is truly a form of Rao-Blackwellisation in the sense that it removes the weight variability to some extent (see p.5) and it turns the outcome into an unbiased estimator, despite the self-normalising behaviour! They also conclude that the rejection probability is at least c/√N on average (Remark 4.1).
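A hedged sketch of this blocked version, as I read the description above (the block-wise acceptance rule and the within-block estimate are my reading, not code from the paper), reusing log_w and rng from the previous snippet:

```python
def block_imh(f, N, T):
    """N-proposals-at-once IMH: draw N iid proposals, accept or reject
    the whole block with ratio of average weights (a sketch of the
    scheme described in the post, not the authors' code)."""
    x = rng.standard_normal(N)
    w = np.exp(log_w(x))
    est = np.empty(T)
    for t in range(T):
        y = rng.standard_normal(N)
        wy = np.exp(log_w(y))
        # accept the whole block with probability min(1, mean(w_y)/mean(w_x))
        if rng.uniform() < wy.mean() / w.mean():
            x, w = y, wy
        # within-block self-normalised average: all N draws contribute,
        # including those with small weights
        est[t] = np.sum(w * f(x)) / np.sum(w)
    return est.mean()
```

Note how the acceptance decision only sees the average weight, while the within-block weighting recycles every proposed value, which is where the Rao-Blackwellisation flavour comes from.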

“We show that the bias of self-normalized importance sampling is of order N^{-1}, and we obtain new bounds on the moments of the error in importance sampling. We then consider IMH, and show that the common random numbers coupling is optimal. Using this coupling, we show that the total variation distance between IMH at iteration t and π decays as t^{1-p}.”
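The common random numbers coupling in the quote amounts to feeding two IMH chains the same proposal and the same uniform at every step, so that they coalesce as soon as both accept; the distribution of that meeting time is what drives total variation bounds. A toy demonstration (again reusing log_w and rng, with arbitrary starting points of my choosing):

```python
def crn_coupled_imh(T):
    """Two IMH chains driven by common random numbers: each step both
    chains see the same proposal y and the same uniform u, so once
    both accept they coalesce and stay together forever."""
    x1, x2 = rng.standard_normal(), 5.0   # arbitrary distinct starts
    for t in range(1, T + 1):
        y = rng.standard_normal()
        log_u = np.log(rng.uniform())
        if log_u < log_w(y) - log_w(x1):
            x1 = y
        if log_u < log_w(y) - log_w(x2):
            x2 = y
        if x1 == x2:
            return t                       # meeting time
    return None                            # no meeting within T steps
```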

They also compare the biases of sampling importance resampling and independent Metropolis–Hastings, with the latter getting the upper hand, but I do not see the justification for resampling when computing an integral, since it does not produce a sample from the target, especially when the weights are unbounded, and it adds to the variability of the estimator. They further propose a (telescopic) unbiased modification of the self-normalised importance sampling estimator, with an inefficiency twice as high. But a neat Rao-Blackwellisation trick brings it back to the same level!
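I do not know the authors' exact telescopic construction, but a generic single-term Rhee–Glynn debiasing conveys the flavour: write the limit of the SNIS estimator as E[S_{N₀}] plus a telescoping sum of corrections at doubled sample sizes, and pay for unbiasedness with a randomised truncation. A sketch under these assumptions, reusing snis, log_w and rng from above:

```python
def unbiased_snis_sketch(f, N0=128, q_geom=0.25):
    """Single-term Rhee-Glynn telescoping applied to SNIS (a generic
    stand-in for, not a copy of, the paper's construction).
    With N_k = N0 * 2**k, Delta_0 = S_{N_0}, and
    Delta_k = S_{N_k} - S_{N_{k-1}}, the estimator Delta_K / P(K=k)
    has expectation sum_k E[Delta_k] = E_pi[f], since the SNIS bias
    is O(1/N); q_geom < 1/2 keeps the variance summable."""
    K = int(rng.geometric(q_geom)) - 1        # K in {0, 1, 2, ...}
    pK = q_geom * (1.0 - q_geom) ** K         # P(K = k)
    if K == 0:
        return snis(f, N0) / pK
    N_hi, N_lo = N0 * 2**K, N0 * 2**(K - 1)
    x = rng.standard_normal(N_hi)             # common draws across levels
    w = np.exp(log_w(x))
    S_hi = np.sum(w * f(x)) / np.sum(w)
    S_lo = np.sum(w[:N_lo] * f(x[:N_lo])) / np.sum(w[:N_lo])
    return (S_hi - S_lo) / pK
```

Sharing the draws between the two levels keeps each correction term small, which is what makes the randomised truncation affordable.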
