
Most probable bits in random integer

Tags: c, c#, random

I ran an experiment: I generated 10 million random numbers in both C and C#, and then counted how many times each of the low 15 bits of the random integer was set. (I chose 15 bits because C's rand() is only guaranteed to produce values up to 0x7fff.)

Here's what I got: [chart: per-bit set counts for the C and C# runs]
I have two questions:

  1. Why are there 3 most probable bits? In the C case, bits 8, 10 and 12 are the most probable. In C#, bits 6, 8 and 11 are the most probable.

  2. It also seems that the C# most probable bits are mostly shifted by 2 positions compared to the C most probable bits. Why is this? Is it because C# uses a different RAND_MAX constant, or something else?


My test code for C:

#include <stdio.h>
#include <stdlib.h>

void accumulateResults(int random, int bitSet[15]) {
    int i;
    int isBitSet;
    for (i = 0; i < 15; i++) {
        isBitSet = ((random & (1 << i)) != 0);
        bitSet[i] += isBitSet;
    }
}

int main() {
    int i;
    int bitSet[15] = {0};
    int times = 10000000;
    srand(0);

    for (i=0; i < times; i++) {
        accumulateResults(rand(), bitSet);
    }

    for (i=0; i < 15; i++) {
        printf("%d : %d\n", i , bitSet[i]);
    }

    system("pause");
    return 0;
}

And test code for C#:

static void accumulateResults(int random, int[] bitSet)
{
    int i;
    int isBitSet;
    for (i = 0; i < 15; i++)
    {
        isBitSet = ((random & (1 << i)) != 0) ? 1 : 0;
        bitSet[i] += isBitSet;
    }
}

static void Main(string[] args)
{
    int i;
    int[] bitSet = new int[15];
    int times = 10000000;
    Random r = new Random();

    for (i = 0; i < times; i++)
    {
        accumulateResults(r.Next(), bitSet);
    }

    for (i = 0; i < 15; i++)
    {
        Console.WriteLine("{0} : {1}", i, bitSet[i]);
    }

    Console.ReadKey();
}

Many thanks! By the way, the OS is Windows 7 (64-bit) and the compiler is Visual Studio 2010.

EDIT
Many thanks to @David Heffernan. I made several mistakes here:

  1. The seed in the C and C# programs was different (C used zero, C# used the current time).
  2. I didn't try the experiment with different values of the times variable to check the reproducibility of the results.

Here's what I got when I analyzed how the probability that the first bit is set depends on the number of times random() was called: [chart: probability of bit 0 being set vs. number of samples]
So, as many noticed, the results are not reproducible and shouldn't be taken seriously. (Except as some form of confirmation that the C/C# PRNGs are good enough :-) )

Agnius Vasiliauskas, asked May 23 '12


3 Answers

This is just common or garden sampling variation.

Imagine an experiment where you toss a coin ten times, repeatedly. You would not expect to get five heads every single time. That's down to sampling variation.

In just the same way, your experiment will be subject to sampling variation. Each bit follows the same statistical distribution. But sampling variation means that you would not expect an exact 50/50 split between 0 and 1.

Now, your plot is misleading you into thinking the variation is somehow significant or carries meaning. You'd get a much better understanding of this if you plotted the Y axis of the graph starting at 0. That graph looks like this:

[chart: the same data plotted with the Y axis starting at 0; the lines are visually flat]

If the RNG behaves as it should, then each bit will follow the binomial distribution with probability 0.5. This distribution has variance np(1 − p). For your experiment this gives a variance of 2.5 million. Take the square root to get a standard deviation of around 1,580. So you can see simply from inspecting your results that the variation you see is not obviously out of the ordinary. You have 15 samples and none are more than 1.6 standard deviations from the true mean. That's nothing to worry about.

You have attempted to discern trends in the results. You have said that there are "3 most probable bits". That's only your particular interpretation of this sample. Try running your programs again with different seeds for your RNGs and you will have graphs that look a little different. They will still have the same quality to them. Some bits are set more than others. But there won't be any discernible patterns, and when you plot them on a graph that includes 0, you will see horizontal lines.

For example, here's what your C program outputs for a random seed of 98723498734.

[chart: C program output for seed 98723498734, showing a different set of "favoured" bits]

I think this should be enough to persuade you to run some more trials. When you do so you will see that there are no special bits that are given favoured treatment.

David Heffernan, answered Oct 12 '22


Note that the deviation is about 2,500 against an expected count of 5,000,000 per bit, which comes down to 0.05%.

CodeCaster, answered Oct 12 '22


Note that the frequency of each bit varies by only about 0.08% (−0.03% to +0.05%). I don't think I would consider that significant. If every bit were exactly equally probable, I would find the PRNG very questionable instead of just somewhat questionable. You should expect some level of variance in processes that are supposed to be more or less modelling randomness...

twalberg, answered Oct 12 '22