Archive for PRNG

handbook of sharing confidential data [book review]

Posted in Statistics on March 12, 2025 by xi'an

A new Chapman & Hall handbook has appeared on the most current issues of confidentiality and privacy, edited by Jörg Drechsler, Daniel Kifer, Jerome Reiter, and Aleksandra Slavković. The forty authors of its 18 chapters are mostly from the U.S., with a few outliers from Edinburgh (involved in two chapters, on protecting the Scottish Longitudinal Study and the U.S. IRS tax data) and Tallinn (for a chapter on secure multi-party computation applications). This means a more U.S.-centric focus for realistic implementations, as, e.g., with the Census Bureau (which employs 25% of the authors), than one implied by EU regulations, for instance.

Overall, I enjoyed reading these chapters and would certainly use the book as a first entry to a graduate course on privacy (as opposed to some books I recently reviewed). The first two chapters are 100% formula-free and thus more surveys than informative entries to the field, imho. The following Part II on formal privacy techniques covers the expected standards of differential privacy, local vs. global design, single vs. multiple queries, and the consequences for machine learning and statistical procedures. Concerning Bayesian aspects, Chapter 7 on private machine learning has two paragraphs on the privacy properties of MCMC algorithms, albeit without exposing clearly enough that privacy vanishes as the number of iterations grows to infinity. Chapter 8 concentrates on statistical differential privacy, much along my own perception of the requirements for a genuine statistical approach, with Bayesian aspects not sidelined. If less critical of differential privacy than I am. Chapter 9 focusses on system issues, investing a dozen pages into the specifics of pseudo-random generators. Part III is about synthetic data, with some overlap between its first two chapters. (I would deem that DP need not be re-introduced in Chapter 12.) I find this section rather superficial, mostly formula-free, and lacking in statistical impact.

As an aside, I am disappointed by the poor rendering of the (mathematical) equations, which makes me wonder which type of LaTeX, if any, was used. There are even genuine typos that seem to result from cut-and-paste encoding errors (see, e.g., the final accented ć of Slavković). The reference lists are plentiful, see e.g. the 164 entries for Chapter 7, to the point that it would have made more sense to regroup them into a single bibliography. (The predictable reply being that chapters are sold separately and need their respective reference lists.)

[Disclaimer about potential self-plagiarism: this post or an edited version of it could possibly appear in my Books Review section in CHANCE.]

truly and uniquely pseudo-random?

Posted in Books, pictures, Statistics, Travel, University life on July 10, 2024 by xi'an

I came across a Web paper entitled The Impact of Google's Random Number Generator on Accurate Results that I found most puzzling, in that it fails to explain the specific nature of this random generator and includes gems such as the one below

“PRNGs (…) outputs are inherently predictable given a specific seed value, thus deviating from true randomness”

or stating that linear congruential generators suffer from “discernible patterns”, or yet advancing a bonus in using Google's generator for “leveraging Gaussian distributions”… Its referring to a specific and possibly obscure arXival proposing to construct (yet another) PRNG by reinforcement learning, with an objective function based on randomness tests, reminded me of Marsaglia's Die Hard. And of “true randomness”. Plus the piece is repetitive and obsequious. Until I found it was “written”, with no typo, by Quthor, an AI “writer” powered by Quick Creator. I should have paid more attention to the level of nonsense…

On the same occasion, I also read a recent Scientific American paper on randomness tests for pseudo-random number generators by Christopher Lutsko. Which does not start that well, since it reproduces Laplace's démon argument that a die roll outcome is not random. Also repeating the above pleonasm that PRNGs are not random since they are deterministic. And not failing to mention lava lamp generators! The core of the article is however about recent papers by the author and Niclas Technau on the only examples of sequences that provably pass extremely strong pseudo-randomness tests. They focus on tests based on the gap distribution and the pair correlation.

“The [gap distribution] measures the size of the gaps between the points, and the [pair correlation] measures the clustering of the points—how much they group up or stay apart (…) If these agree with what we would expect from random data, we say that the gap distribution or pair correlation is “Poissonian”.” Christopher Lutsko

Along with Athanasios Sourmelidis, they proved that the exponential sequences {α exp(θ log[n]) mod 1} have Poissonian pair correlation when θ<⅓, for any value of α. And higher-order Poissonian correlations for θ “smaller and smaller“. (This does not come out of the blue, as the proof relates to Van der Corput's method of exponential sums, with links to the leading Austrian random generator community.) This is definitely an achievement. However, when θ gets “smaller and smaller“, the terms in the sequence grow more and more slowly, which calls for larger and larger values of α to make the generator useful. And the pseudo-randomness test based on Poissonian correlations is only one of many in the toolbox, so the debate seems far from over.
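For a rough numerical illustration of these notions (in Python rather than R, with arbitrary choices of α=√2, θ=0.3, and sample size, and keeping in mind that the Poissonian statements above are asymptotic), one can look at the rescaled gaps of the sequence, which should resemble Exp(1) draws when the gap distribution is Poissonian:

```python
import numpy as np

def scaled_gaps(alpha=2**0.5, theta=0.3, n=100_000):
    """Rescaled gaps of the sequence {alpha * k**theta mod 1},
    k = 1..n (note exp(theta*log k) = k**theta).  A 'Poissonian'
    gap distribution means these behave like Exp(1) draws."""
    k = np.arange(1, n + 1)
    x = np.sort((alpha * k**theta) % 1.0)
    gaps = np.diff(x, append=x[0] + 1.0)  # wrap around the unit interval
    return n * gaps  # n gaps summing to n, hence mean exactly 1

g = scaled_gaps()
print(g.mean())  # exactly 1 by construction
print(g.var())   # to compare with 1, the Exp(1) variance
```

The mean carries no information (the gaps always sum to one); the variance, or a full histogram against the Exp(1) density, is where Poissonian behaviour would or would not show at a given sample size.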

 

 

laser sharp random number generator

Posted in Books, pictures, Statistics, University life on April 1, 2021 by xi'an

Caught the headline of Science News on a super-fast random number generator based on a dysfunctional laser, producing “254 trillion random digits per second”!

“…when the laser is shined on a surface, its light contains a constantly changing pattern of tiny pinpricks that brighten and dim randomly. The brightness at each spot in the pattern over time can be translated by a computer into a random series of ones and zeros.”

I presume this is covered in the original Science paper [which I cannot access], but the parallel series of 0's and 1's should be checked to produce independent Bernoulli B(½) variates before being turned into a genuine random number generator.
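A first such check is the classical frequency (monobit) test; here is a minimal Python sketch on simulated bits (nothing below is specific to the laser device, and the 51% bias is an arbitrary choice for illustration):

```python
import numpy as np

def monobit_z(bits):
    """Frequency (monobit) test: z-score of the number of ones.
    Under i.i.d. Bernoulli(1/2) bits, z is approximately N(0,1),
    so |z| much larger than 3 flags a biased stream."""
    bits = np.asarray(bits)
    n = bits.size
    return (bits.sum() - n / 2) / np.sqrt(n / 4)

rng = np.random.default_rng(0)
fair = rng.integers(0, 2, size=100_000)            # unbiased bits
biased = (rng.random(100_000) < 0.51).astype(int)  # 51% ones
print(monobit_z(fair))    # typically |z| < 3
print(monobit_z(biased))  # around 6 here: bias detected
```

Of course independence, and not just the marginal B(½) frequency, would require further tests, e.g. on runs or serial correlations.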

inverse Gaussian trick [or treat?]

Posted in Books, Kids, R, Statistics, University life on October 29, 2020 by xi'an

When preparing my mid-term exam for my undergrad mathematical statistics course, I wanted to use the inverse Gaussian distribution IG(μ,λ) as an example of an exponential family and to include a random generator question. As shown above in the Fortran code of Michael, Schucany and Haas, a simple version can be based on simulating a χ²(1) variate v and solving in x the following second-degree polynomial equation

\dfrac{\lambda(x-\mu)^2}{\mu^2 x} = v

since the left-hand side transform is distributed as a χ²(1) random variable. The smallest root x¹, less than μ, is then chosen with probability μ/(μ+x¹), and the largest one, x²=μ²/x¹, with probability x¹/(μ+x¹). A relatively easy question then, except when one considers asking for a proof of the χ²(1) result, which proved itself a harder cookie than expected! The paper usually referred to for the result, Schuster (1968), is quite cryptic on the matter, essentially stating that the above can be expressed as the (bijective) transform of Y=min(X,μ²/X) and that V~χ²(1) follows immediately. I eventually worked out a proof by the “law of the unconscious statistician” [a name I do not find particularly amusing!], but did not include the question in the exam. Still, I found it fairly interesting that the inverse Gaussian can be generated by “inverting” the above equation, i.e., going from a (squared) Gaussian variate V to the inverse Gaussian variate X. (Even though the name actually stems from the two cumulant generating functions being inverses of one another.)
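In Python (as a transcription of the scheme above rather than of the original Fortran), the generator reads:

```python
import numpy as np

def rinvgauss(n, mu, lam, rng=None):
    """IG(mu, lambda) generator of Michael, Schucany and Haas:
    simulate v ~ chi^2(1), solve lambda*(x-mu)^2/(mu^2*x) = v for
    its smaller root x1, then return x1 with probability
    mu/(mu+x1) and mu^2/x1 otherwise."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.standard_normal(n) ** 2  # chi^2(1) variates
    # smaller root of lambda*x^2 - (2*lambda*mu + mu^2*v)*x + lambda*mu^2 = 0
    x1 = mu + mu**2 * v / (2 * lam) \
         - (mu / (2 * lam)) * np.sqrt(4 * mu * lam * v + mu**2 * v**2)
    u = rng.random(n)
    return np.where(u <= mu / (mu + x1), x1, mu**2 / x1)

x = rinvgauss(100_000, mu=1.5, lam=2.0, rng=np.random.default_rng(1))
print(x.mean())        # close to E[X] = mu = 1.5
print((1 / x).mean())  # close to E[1/X] = 1/mu + 1/lambda
```

Since the two roots of the quadratic multiply to μ², computing the smaller root x¹ immediately gives the larger as μ²/x¹.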

random generators produce ties

Posted in Books, R, Statistics on April 21, 2020 by xi'an

“…an essential part of understanding how many ties these RNGs produce is to understand how many ties one expects in 32-bit integer arithmetic.”

A sort of birthday-problem paper about random generators, by Markus Hofert on arXiv, on why they produce ties. As shown for instance by the following R code (inspired by the paper):

sum(duplicated(runif(1e6)))

returning values around 100, which is indeed unexpected until one thinks a wee bit about it… With no change when moving to an alternative to the Mersenne twister generator. Indeed, assuming the R random generators produce integers over 2³² possible values, the expected number of ties is actually 116 for 10⁶ simulations. Moving to 2⁶⁴ values, the probability of a tie becomes negligible, around 10⁻⁸. A side remark of further interest in the paper is that, due to the different effective gaps between 0 and the smallest positive normal number, of order 10⁻³⁰⁸, and between 1 and the smallest normal number greater than 1, of order 10⁻¹⁶, “the grid of representable double numbers is not equidistant”. Justifying the need for special functions such as expm1 and log1p, corresponding to more accurate computations of exp(x)-1 and log(1+x).
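The birthday-problem computation behind both figures can be reproduced (in Python here), fittingly through the very expm1 and log1p functions just mentioned, since a naïve (1-1/d)**n rounds to 1 in the 2⁶⁴ case:

```python
from math import expm1, log1p

def expected_ties(n, bits):
    """Expected number of duplicates among n uniform draws over a
    grid of d = 2**bits equally likely values: n - E[#distinct],
    where E[#distinct] = d * (1 - (1 - 1/d)**n), computed through
    expm1/log1p to survive 1/d below machine precision."""
    d = 2.0 ** bits
    return n - d * -expm1(n * log1p(-1.0 / d))

print(round(expected_ties(1e6, 32)))  # 116, matching the R simulation
print(expected_ties(1e6, 64))         # about 2.7e-8: no ties in practice
```

The leading-order approximation n²/(2d) gives the same 10¹²/2³³ ≈ 116 for the 32-bit grid.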