
As I had been meaning to mention this news for quite a while, David Frazier, Ryan Kelly, Christopher Drovandi, and David Warne arXived last November a paper that parallels, for neural posterior approximations, our paper (with David and Gael) on ABC consistency and some earlier papers of theirs on synthetic likelihood, under similar conditions (see, e.g., Assumptions 1 and 2) and with a potentially reduced computational cost in some situations.
“NLE requires additional MCMC steps to produce a posterior approximation, whereas NPE produces a posterior approximation directly and does not require any additional sampling”
Convergence is achieved when the size of the neural learning sample grows fast enough with the sample size, and when the tolerance decreases fast enough with respect to the convergence rate of the summary statistic. Two options are possible, namely either approximating the likelihood and then exploiting this approximation within an MCMC algorithm (NLE), or directly approximating the posterior distribution as a function of the summary statistic Sn (rather than solely at the observed value S⁰n), with arguments favouring the second option (NPE).
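To fix ideas, here is a minimal (and entirely toy!) sketch of the second option, fitting a conditional Gaussian approximation q(θ|Sn) by a small network on simulated pairs (θ, Sn) and reading off the approximation at the observed summary, with the simulator, the summary statistic, and the Gaussian family being my own choices rather than the paper's:

```python
# A toy NPE sketch, assuming a conditional Gaussian approximating family
# q(theta | s) = N(mu(s), sigma(s)^2) fitted by a small neural network;
# the paper's experiments rely on richer families (e.g. normalizing flows),
# this only illustrates the "directly approximate the posterior" route.
import torch
import torch.nn as nn

def simulator(theta, n=100):
    # toy simulator: data with mean theta and unit variance,
    # summarised by the sample mean (a stand-in for Sn)
    x = theta + torch.randn(theta.shape[0], n)
    return x.mean(dim=1, keepdim=True)

N = 10_000                                   # size of the learning sample
theta = torch.randn(N, 1) * 2                # draws from a N(0, 4) prior
s = simulator(theta)                         # simulated summaries Sn

net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 2))        # outputs (mu(s), log sigma(s))

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2_000):
    mu, log_sig = net(s).chunk(2, dim=1)
    # maximise the conditional log-density of theta given s
    loss = -torch.distributions.Normal(mu, log_sig.exp()).log_prob(theta).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# the NPE approximation is then read off at the observed summary,
# with no further MCMC step (contrary to NLE)
s_obs = torch.tensor([[0.3]])
mu, log_sig = net(s_obs).chunk(2, dim=1)
print(mu.item(), log_sig.exp().item())
```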
“if the intractable posterior Π(· | Sn) is asymptotically Gaussian and calibrated, then so long as νnγN = o(1), the NPE is also asymptotically Gaussian and calibrated”
where γN denotes the rate at which the neural approximation of the posterior converges to the ideal posterior (in Kullback-Leibler divergence) as the size N of the learning sample grows, and νn the rate of convergence of the statistic Sn to its asymptotic mean. The convergence result does not make explicit assumptions on the class of neural posteriors, but it requires that the observed statistic fit within the range of the simulated values (a requirement illustrated in the paper with an MA(2) model already used in several of our papers, as I noticed when giving an ABC masterclass in Warwick this very week).
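As a crude illustration of this support requirement (with my own choices of prior, sample size, and summaries, and checking only componentwise ranges rather than anything more refined):

```python
# A crude check, assuming the MA(2) model x_t = e_t + t1*e_{t-1} + t2*e_{t-2}
# with the first two sample autocovariances as summaries: the observed summary
# should fall within the range of the simulated ones.
import numpy as np
rng = np.random.default_rng(0)

def ma2_summary(t1, t2, n=200):
    e = rng.standard_normal(n + 2)
    x = e[2:] + t1 * e[1:-1] + t2 * e[:-2]
    return np.array([np.mean(x[:-1] * x[1:]), np.mean(x[:-2] * x[2:])])

# prior draws over the MA(2) identifiability triangle (rejection sampling)
draws = []
while len(draws) < 5_000:
    t1, t2 = rng.uniform(-2, 2), rng.uniform(-1, 1)
    if t2 + t1 > -1 and t2 - t1 > -1:
        draws.append(ma2_summary(t1, t2))
sims = np.array(draws)

s_obs = ma2_summary(0.6, 0.2)          # pseudo-observed summary
inside = np.all((s_obs >= sims.min(0)) & (s_obs <= sims.max(0)))
print("observed summary within simulated range:", inside)
```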
“While neural methods and normalizing flows are common choices for the approximating class Q, the diversity of such methods, along with their complicated tuning and training regimes, makes establishing theoretical results on the rate of convergence, γN, difficult”
Under stronger and hard-to-check assumptions, namely on the minimaxity of the posterior density estimator within the class of locally β-Hölder functions, they recover a closed-form γN, which dictates how N should be chosen, with a surprising addition of the dimensions of the parameter θ and of the summary Sn, and a resulting explosion in the theoretical minimal value of N one should use. (And decent performances of the method with smaller values of N!) Concerning minimaxity, I have no intuition as to how this impacts the sparseness (or lack thereof) of the neural networks that can be used.
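For intuition about that rate, and at the risk of oversimplifying (this is the classical minimax rate for estimating a β-Hölder conditional density in dimension dθ+ds, not necessarily the exact expression in the paper), one may think of γN ≍ (log N/N)^β/(2β+dθ+ds), in which case νnγN = o(1) forces N to grow faster than νn^(2β+dθ+ds)/β (up to logarithmic terms), hence the explosion with the dimensions of θ and Sn.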
I am wondering about strategies to remove superfluous statistics, since their dimension matters so much, and about ways to detect or evaluate misspecification (or its complement, compatibility, as discussed on page 31). But all in all this paper represents a massive addition to the consistency results for approximate Bayesian inference methods!


