Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PHP - Improve random number sequence generation

I guess here comes my first question on SO.

I'm currently working on a website and I have to generate 6 numbers between 1 and 29 (one of each max) for a lottery. As they can be in any order, I simply sort them afterwards.

If I'm not mistaken, it should mean that there's (29*28*27*26*25*24) / 6! = 475020 different possible combinations.

I've tried different ways of genereting the sequences, either using mt_rand or random_int (from random_compat) but when I test it with something like 10k iterations, I always get around 100 duplicates, even though they are like 465k combinations still available.

Here are the code samples I've been trying :

// Using an array and mt_rand (or random_int, giving same results)
// Also tried shuffling the array instead of simply reindexing it, not better

$values = range(1, 29);

while(count($values) > 6) {
    unset($values[mt_rand(0, count($values) - 1)]);
    $values = array_values($values);
}



// Creating the array from random numbers (same results using random_int)

$values = array();
while (count($values) < 6) {
    $r = mt_rand(1, 29);
    if (in_array($r, $values)) {
        continue;
    } else {
        $values[] = $r;
    }
}

So well... My questions are :

  • Is there a way to improve what I'm currently doing ?
  • Is that the way it's gonna be and I have to deal with it ?
  • Am I simply doing it wrong ?

Thanks !

Lyn.

PS : Looked through many questions, but didn't find anything filling my needs, sorry if I didn't look well enough !

Just to make a few things clear: Using random_int (which makes use of /dev/urandom or openssl_random_pseudo_bytes) doesn't improve anything, which I thought would. And I don't want to use any external API (like random.org) if possible.

like image 340
Lynesth Avatar asked Nov 10 '22 07:11

Lynesth


1 Answers

Using random_int (which makes use of /dev/urandom or openssl_random_pseudo_bytes) doesn't improve anything, which I thought would.

Sure it does, it's just not something you can identify visually. mt_rand() and rand() only allow about 232 possible seeds and 232 possible outputs and, most importantly, have deterministic sequences: if you know a few outputs, you can predict the rest until it's reseeded.

Your operating system's CSPRNG doesn't have any such limits. Knowing a handful of random_int() outputs (which in PHP are limited to 232 possible values on 32-bit systems, 264 on 64-bit systems) will not give you any information on future outputs.

I'm currently working on a website and I have to generate 6 numbers between 1 and 29 (one of each max) for a lottery. As they can be in any order, I simply sort them afterwards.

Okay, that's a neat idea. You definitely want a CSPRNG here.

when I test it with something like 10k iterations, I always get around 100 duplicates, even though they are like 465k combinations still available.

As others have pointed out, this is the birthday problem/paradox at play.

If you need a solution, try something like this:

function random_unique_select($num, array $possible_values)
{
    $sizeof = count($possible_values);
    if ($num > $sizeof) {
        throw new InvalidArgumentException('$num is too large');
    }
    $selected = [];
    for ($i = 0; $i < $num; ++$i) {
        // Grab a random int [0, ... N - 1]
        $r = random_int(0, $sizeof - 1);
        // Copy the selected value into $selected
        $selected[] = $possible_values[$r];
        // Delete it from the range of possible values
        unset($possible_values[$r]);
        // N has grown smaller by 1
        --$sizeof;
        // Reset keys; we want this to be zero-indexed.
        $possible_values = array_values($possible_values);
    }
    return $selected;
}

$lottery = random_unique_select(6, range(1,29));

Demos:

  • http://3v4l.org/Hb97A (single-pass; usesopenssl_random_pseudo_bytes())
  • http://3v4l.org/E7cb5 (100,000 iterations -> very small numbers of duplicates)
like image 87
Scott Arciszewski Avatar answered Nov 15 '22 10:11

Scott Arciszewski