
Is PHP mt_rand really random or possibly biased?

I did two basic A-B-C tests on my website with something like

if(mt_rand(0,2) == 0){
//THROW IN RE HERE 
}elseif(mt_rand(0,2) == 1){
//THROW IN LR HERE
}else{
//THROW IN LB HERE
}

I was expecting the three conditions to occur equally often (33.3% of all pageviews). However, the impressions (as measured by Google Adsense) show very different distributions. Interestingly, both tests (two charts below) show similar patterns: LB occurs most, then RE and then LR.

The sample sizes are in the many thousands, so the probability that this pattern arose by chance is effectively zero.

Am I misunderstanding mt_rand()? Does anybody know whether it's been properly tested? How else could these odd patterns show up?

[Image: charts of the AdSense impression shares for the two tests]

RubenGeert asked Mar 12 '23 07:03


2 Answers

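You're calling mt_rand() twice, so the two checks are made against two different random numbers. The first call picks from 0, 1 and 2; if it returns 0, you show RE (probability 1/3). Otherwise (it was 1 or 2), the elseif draws a fresh number, again from 0, 1 and 2, and shows LR only if that one is 1, which gives LR a probability of 2/3 × 1/3 = 2/9 (about 22%). Everything else falls through to LB: 2/3 × 2/3 = 4/9 (about 44%). That matches the ordering you observed: LB most, then RE, then LR. Draw the number once, store it, and branch on the stored value: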

    $number = mt_rand(0,2);
    switch ($number){
     case 0:
       //do re
       break;
     case 1:
       //do lr
       break;
     case 2:
       //do lb
       break;
    }

Alternatively, keep the two calls but narrow the range of the second one, since RE has already been ruled out by then:

if(mt_rand(0,2) == 0){
//THROW IN RE HERE 
}elseif(mt_rand(0,1) == 1){ //we've stripped RE out, no longer deciding from 3 options
//THROW IN LR HERE
}else{
//THROW IN LB HERE
}
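
To see how large the bias in the original two-call version is, here is a minimal sketch that enumerates all nine equally likely combinations of the two mt_rand(0,2) results instead of sampling them. The branchFor() helper is a made-up name for illustration; it simply mirrors the if/elseif chain from the question.

```php
<?php
// Hypothetical helper mirroring the question's if/elseif chain:
// $first is the result of the first mt_rand(0,2) call,
// $second is the result of the independent second call.
function branchFor(int $first, int $second): string
{
    if ($first == 0) {
        return 'RE';
    } elseif ($second == 1) {
        return 'LR';
    }
    return 'LB';
}

// Count how many of the 3 x 3 equally likely pairs land in each branch.
$counts = ['RE' => 0, 'LR' => 0, 'LB' => 0];
for ($first = 0; $first <= 2; $first++) {
    for ($second = 0; $second <= 2; $second++) {
        $counts[branchFor($first, $second)]++;
    }
}

foreach ($counts as $branch => $n) {
    printf("%s: %d/9 = %.1f%%\n", $branch, $n, 100 * $n / 9);
}
```

This gives RE 3/9, LR 2/9 and LB 4/9. When the first call returns 0 the second call never actually runs, but counting those pairs anyway does not change the totals, because RE wins regardless of the second value.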
Honza answered Mar 17 '23 20:03


I'm not sure exactly how you're collecting the data through Google AdSense. Are you relying on Google Analytics by passing in a custom variable? If so, there could well be other factors causing the bias that have nothing to do with PHP.

To check that mt_rand() produces a uniform distribution, we can run a test like this in PHP.

$test = [0,0,0];
for($i = 0; $i < 100000; $i++) {
    $rand = mt_rand(0,2);
    $test[$rand]++;
}
var_dump($test);

Which should give you results like this...

array(3) {
  [0]=>
  int(33288)
  [1]=>
  int(33394)
  [2]=>
  int(33318)
}

Each value comes up about 33.3% of the time over 100K iterations, which is the uniform distribution you're looking for.
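
If you want something firmer than eyeballing the counts, one option is a Pearson chi-square goodness-of-fit check against the uniform expectation. The sketch below is only an illustration: chiSquareUniform() is a made-up helper name, and 5.991 is the standard 5% critical value for 2 degrees of freedom (three categories).

```php
<?php
// Hypothetical helper: Pearson chi-square statistic for observed counts
// against the assumption that every category is equally likely.
function chiSquareUniform(array $counts): float
{
    $expected = array_sum($counts) / count($counts);
    $stat = 0.0;
    foreach ($counts as $observed) {
        $stat += ($observed - $expected) ** 2 / $expected;
    }
    return $stat;
}

// The counts from the sample run above.
$observed = [33288, 33394, 33318];
$stat = chiSquareUniform($observed);

// 5.991 = 5% critical value of the chi-square distribution with 2 degrees of freedom.
$verdict = $stat < 5.991 ? 'not rejected' : 'rejected';
printf("chi-square = %.3f, uniformity %s at the 5%% level\n", $stat, $verdict);
```

For the counts shown above the statistic is roughly 0.18, far below the critical value, so there is no evidence against uniformity. Counts split 1/3, 2/9, 4/9 at a comparable sample size would give a statistic in the hundreds or more.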

It's important to note that mt_rand() is a PRNG (pseudorandom number generator), not a CSPRNG (cryptographically secure pseudorandom number generator). That means it's not suited for cryptographic purposes but works fine for other PRNG needs. It's based on the Mersenne Twister, which is faster than libc rand(). Either way, I don't think the issues you're seeing in your data are likely to be a direct result of PHP's implementation of mt_rand().
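
For completeness: if unpredictability mattered and not just uniformity, PHP 7 and later provide random_int(), which draws from the system's CSPRNG. A minimal sketch, assuming PHP 7+; pickVariant() is a hypothetical helper and the labels are just the RE/LR/LB names from the question.

```php
<?php
// Hypothetical helper: choose one of the three variants with a single
// draw from PHP's cryptographically secure generator (PHP 7.0+).
function pickVariant(): string
{
    $variants = ['RE', 'LR', 'LB'];
    return $variants[random_int(0, count($variants) - 1)];
}

echo pickVariant(), "\n";
```

For an A-B-C test this is not required; a single mt_rand(0,2) call stored in a variable is enough.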

Sherif answered Mar 17 '23 20:03
