Consider an event that occurs with probability p. This program checks how many failed trials it takes before the event occurs and keeps a histogram of the totals. e.g.: If p were 0.5, then this would be like asking how many times in a row a coin comes up tails before it comes up heads? With smaller values of p, we would expect many failures before we get a success.
The implementation being tested is essentially: while (!(rand.NextDouble() < p)) count++;
Here is a plot of the histogram of outcomes for count.
Immediately obvious is the irregularity at x=54. For some reason, it is approximately half as likely as it should be for a series of exactly 54 random numbers greater than or equal to p to be generated in a row.
The actual p I'm checking in this test is 1/32. (Doesn't really matter, as long as it's small enough to get some measurable number of 54's as an outcome.) And I'm counting 10000000 total successes. (Also doesn't seem to matter.) It also doesn't matter what random seed I use.
Obviously this is a quirk of the pseudo-random number generator being used by the Random.NextDouble
function in .NET. But I'd like to know why does this otherwise uniform data have such a striking single spike in such an oddly specific and consistent place? What is it about this particular pseudo-random number generator that makes generating exactly 54 large numbers in a row followed by a small number half as likely as any other sequence length?
I would have expected many more non-uniform anomalies as it degenerates, not just this one spike.
Here is the code that generated this data set:
using System;
namespace RandomTest
{
class Program
{
static void Main(string[] args)
{
Random rand = new Random(1);
int numTrials = 10000000;
int[] hist = new int[512];
double p = 1.0 / 32.0;
for (int i = 0; i < numTrials; ++i) {
int count = 0;
while (!(rand.NextDouble() < p)) {
count++;
}
if (count > hist.Length - 1) {
count = hist.Length - 1;
}
hist[count]++;
}
for (int i = 0; i < hist.Length; ++i) {
Console.WriteLine("{0},{1}", i, hist[i]);
}
}
}
}
In case it's relevant, this is .Net Framework 4.7.2 on Windows x86.
I ran your code on framework 4.8 and I got point 28 to be the outlier:
Then, I ran it again without any changes, and 58 was the outlier:
My Guess as to the cause of the issue you perceive is that the random generator is random.
Running the code yields different results each time, and it appears to be random where the outlier is.
Since, we know that the outlier is random, we can conclude that it is not a fault in a specific line of code. Because of this we can assume that the random outlier could be caused by mere chance that the generator picked a number significantly less than others. The randomness of randomness.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With