I was asked this question in an interview.
Given a random number generator to generate a number between [0,N), how to prove this number is uniform distributed.
I am not sure how to approach this problem, any suggestion?
To be precise, the random numbers are uniformly distributed, which means that all numbers in the range are equally probable, and greater than or equal to zero and less than one. It doesn't matter whether the numbers are true hardware-generated random values or a pseudorandom sequence generated by a computer.
Use rand to generate 1000 random numbers from the uniform distribution on the interval (0,1). rng('default') % For reproducibility u = rand(1000,1); The inversion method relies on the principle that continuous cumulative distribution functions (cdfs) range uniformly over the open interval (0,1).
The frequency test is a test of uniformity. Two different methods available, Kolmogorov-Smirnov test and the chi-square test. Both tests measure the agreement between the distribution of a sample of generated random numbers and the theoretical uniform distribution.
The random number generators we have considered simulate the uniform distribution on a discrete set of integers between 0 and some large value L. If the integer x is generated as the discrete uniform variable, then U = x/L could be the delivered continuous uniform variable.
To prove it, you need to know the algorithm being used and show in graph terms that the set of all states constitutes a cycle, that there are no subcycles, and that the cardinality of the state space modulo N is zero so that there is no set of states that occur more/less frequently than others. This is how we know that Mersenne Twister, for instance, is uniformly distributed even though the 64 bit version has a cycle length of 219937-1 and could never be enumerated within the lifetime of the universe.
Otherwise you use statistical tests to test the hypothesis of uniformity. Statistics can't prove a result, it fails to disprove the hypothesis. The larger your sample size is, the more compelling the failure to disprove a hypothesis is, but it is never proof. (This perspective causes more communications problems with non-statisticians/non-scientists than anything else I know.) There are many tests for uniformity, including chi-square tests, Anderson-Darling, and Kolmogorov-Smirnov to name just a few.
All of the uniformity tests will pass sequences of values such as 0,1,2,...,N-1,0,1,... so uniformity is not sufficient to say you have a good generator. You should also be testing for serial correlation with tests such as spacings tests, runs-up/runs-down, runs above/below the mean, "birthday" tests, and so on.
A pretty comprehensive suite of tests for uniformity and serial correlation was created by George Marsaglia over the course of his career, and published in 1995 as what he jokingly called the "Diehard tests" (because it's a heavy duty battery of tests).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With