Archive for spacings

uniform spacings

Posted in Books, Kids, R with tags , , , , on June 8, 2023 by xi'an

A riddle on uniform spacings!, namely when considering eight iid Uniform (0,1) variates as visiting times and three further iid Uniform (0,1) variates as server availability times, with unit service time, the question being the probability a server is available for a ninth visiting time, T⁹. Which can be decomposed into four cases:

  1. at least one server becomes available between the previous visiting time and T⁹
  2. at least two servers become available between the penultimate visiting time and the previous visiting time
  3. two servers become available between the antepenultimate visiting time and penultimate visiting time and one server becomes available between the penultimate visiting time and the previous visiting time
  4. three servers become available between the antepenultimate visiting time and the penultimate visiting time

with respective probabilities (using Devroye’s Theorem 2.1)

1/4, 8 9!3!/12!, 7 8!3!/12!, 7 8!3!/12!,

resulting in a total probability of 0.2934, compatible with a simulation assessment.

a chain of collapses

Posted in Books, Kids, R with tags , , , on June 20, 2018 by xi'an

A quick riddler resolution during a committee meeting (!) of a short riddle: 36 houses stand in a row and collapse at times t=1,2,..,36. In addition, once a house collapses, the neighbours if still standing collapse at the next time unit. What are the shortest and longest lifespans of this row?

Since a house with index i would collapse on its own by time i, the longest lifespan is 36, which can be achieved with the extra rule when the collapsing times are perfectly ordered. For the shortest lifespan, I ran a short R code implementing the rules and monitoring its minimum. Which found 7 as the minimal number for 10⁵ draws. However, with an optimal ordering, one house plus one or two neighbours of the most recently collapsed, leading to a maximal number of collapsed houses after k time units being

1+2(k-1)+1+2(k-2)+….=k+k(k-1)=k²

which happens to be equal to 36 for k=6. (Which was also obtained in 10⁶ draws!) This also gives the solution for any value of k.

maximal spacing around order statistics [#2]

Posted in Books, R, Statistics, University life with tags , , , , , , , , on June 8, 2018 by xi'an

The proposed solution of the riddle from the Riddler discussed here a few weeks ago is rather approximative, in that the distribution of

\Delta_n=\max_i\,\min_j\,|X_{i}-X_{j}|

when the n-sample is made of iid Normal variates is (a) replaced with the distribution of one arbitrary minimum and (b) the distribution of the minimum is based on an assumption of independence between the absolute differences. Which does not hold, as shown by the above correlation matrix (plotted via corrplot) for N=11 and 10⁴ simulations. One could think that this correlation decreases with N, but it remains essentially 0.2 for larger values of N. (On the other hand, the minima are essentially independent.)

maximal spacing around order statistics

Posted in Books, R, Statistics, University life with tags , , , , , , , on May 17, 2018 by xi'an

The riddle from the Riddler for the coming weeks is extremely simple to express in mathematical terms, as it summarises into characterising the distribution of

\Delta_n=\max_i\,\min_j\,|X_{i}-X_{j}|

when the n-sample is made of iid Normal variates. I however had a hard time finding a result connected with this quantity since most available characterisations are for either Uniform or Exponential variates. I eventually found a 2017 arXival by Nagaraya et al.  covering the issue. Since the Normal distribution belongs to the Gumbel domain of attraction, the extreme spacings, that is the spacings between the most extreme orders statistics [rescaled by nφ(Φ⁻¹{1-n⁻¹})] are asymptotically independent and asymptotically distributed as (Theorem 5, p.15, after correcting a typo):

(\xi_1,\xi_2/2,...)

where the ξ’s are Exp(1) variates. A crude approximation is thus to consider that the above Δ is distributed as the maximum of two standard and independent exponential distributions, modulo the rescaling by  nφ(Φ⁻¹{1-n⁻¹})… But a more adequate result was pointed out to me by Gérard Biau, namely a 1986 Annals of Probability paper by Paul Deheuvels, my former head at ISUP, Université Pierre and Marie Curie. In this paper, Paul Deheuvels establishes that the largest spacing in a normal sample, M¹, satisfies

\mathbb{P}(\sqrt{2\log\,n}\,M^1\le x) \to \prod_{i=1}^{\infty} (1-e^{-ix})^2

from which a conservative upper bound on the value of n required for a given bound x⁰ can be derived. The simulation below compares the limiting cdf (in red) with the empirical cdf of the above Δ based on 10⁴ samples of size n=10³.The limiting cdf is the cdf of the maximum of an infinite sequence of independent exponentials with scales 1,½,…. Which connects with the above result, in fine. For a practical application, the 99% quantile of this distribution is 4.71. To achieve a maximum spacing of, say 0.1, with probability 0.99, one would need 2 log(n) > 5.29²/0.1², i.e., log(n)>1402, which is a pretty large number…

 

spacings on a torus

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , on March 22, 2018 by xi'an

While in Brussels last week I noticed an interesting question on X validated that I considered in the train back home and then more over the weekend. This is a question about spacings, namely how long on average does it take to cover an interval of length L when drawing unit intervals at random (with a torus handling of the endpoints)? Which immediately reminded me of Wilfrid Kendall (Warwick) famous gif animation of coupling from the past via leaves covering a square region, from the top (forward) and from the bottom (backward)…

The problem is rather easily expressed in terms of uniform spacings, more specifically on the maximum spacing being less than 1 (or 1/L depending on the parameterisation). Except for the additional constraint at the boundary, which is not independent of the other spacings. Replacing this extra event with an independent spacing, there exists a direct formula for the expected stopping time, which can be checked rather easily by simulation. But the exact case appears to be add a few more steps to the draws, 3/2 apparently. The following graph displays the regression of the Monte Carlo number of steps over 10⁴ replicas against the exact values: