How will you test if the random number generator is generating actual random numbers?
My Approach: Firstly build a hash of size M, where M is the prime number. Then take the number generated by random number generator, and take mod with M. and see it fills in all the hash or just in some part. That's my approach. Can we prove it with visualization?
Since I have very less knowledge about testing. Can you suggest me a thorough approach of this question? Thanks in advance
You should be aware that you cannot guarantee the random number generator is working properly. Note that even a perfect uniform distribution in range [1,10] - there is a 10-10 chance of getting 10 times 10 in a random sampling of 10 numbers.
Is it likely? Of course not.
So - what can we do?
We can statistically prove that the combination (10,10,....,10) is unlikely if the random number generator is indeed uniformly distributed. This concept is called Hypothesis testing. With this approach we can say "with certainty level of x% - we can reject the hypothesis that the data is taken from a uniform distribution".
A common way to do it, is using Pearson's Chi-Squared test, The idea is similar to yours - you fill in a table - check what is the observed (generated) number of numbers for each cell, and what is the expected number of numbers for each cell under the null hypothesis (in your case, the expected is k/M
- where M is the range's size, and k is the total number of numbers taken).
You then do some manipulation on the data (see the wikipedia article for more info what this manipulation is exactly) - and get a number (the test statistic). You then check if this number is likely to be taken from a Chi-Square Distribution. If it is - you cannot reject the null hypothesis, if it is not - you can be certain with x% certainty that the data is not taken from a uniform random generator.
EDIT: example:
You have a cube, and you want to check if it is "fair" (uniformly distributed in [1,6]
). Throw it 200 times (for example) and create the following table:
number: 1 2 3 4 5 6
empirical occurances: 37 41 30 27 32 33
expected occurances: 33.3 33.3 33.3 33.3 33.3 33.3
Now, according to Pearson's test, the statistic is:
X = ((37-33.3)^2)/33.3 + ((41-33.3)^2)/33.3 + ... + ((33-33.3)^2)/33.3
X = (18.49 + 59.29 + 10.89 + 39.69 + 1.69 + 0.09) / 33.3
X = 3.9
For a random C~ChiSquare(5)
, the probability of being higher then 3.9
is ~0.45
(which is not improbable)1.
So we cannot reject the null hypothesis, and we can conclude that the data is probably uniformly distributed in [1,6]
(1) We usually reject the null hypothesis if the value is smaller then 0.05, but this is very case dependent.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With