Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Verify Knuth shuffle algorithm is as unbiased as possible

I'm implementing a Knuth shuffle for a C++ project I'm working on. I'm trying to get the most unbiased results from my shuffle (and I'm not an expert on (pseudo)random number generation). I just want to make sure this is the most unbiased shuffle implementation.

draw_t is a byte type (typedef'd to unsigned char). items is the count of items in the list. I've included the code for random::get( draw_t max ) below.

for( draw_t pull_index = (items - 1); pull_index > 1; pull_index-- )
{
    draw_t push_index = random::get( pull_index );

    draw_t push_item = this->_list[push_index];
    draw_t pull_item = this->_list[pull_index];

    this->_list[push_index] = pull_item;
    this->_list[pull_index] = push_item;
}

The random function I'm using has been modified to eliminate modulo bias. RAND_MAX is assigned to random::_internal_max.

draw_t random::get( draw_t max )
{
    if( random::_is_seeded == false )
    {
        random::seed( );
    }

    int rand_value = random::_internal_max;
    int max_rand_value = random::_internal_max - ( max - ( random::_internal_max % max ) );

    do
    {
        rand_value = ::rand( );
    } while( rand_value >= max_rand_value );

    return static_cast< draw_t >( rand_value % max );
}
like image 987
Adam Maras Avatar asked Nov 29 '22 19:11

Adam Maras


2 Answers

Well, one thing you could do as a black-box test is take some relatively small array size, perform a large number of shuffles on it, count how many times you observe each permutation, and then perform Pearson's Chi-square test to determine whether the results are uniformly distributed over the permutation space.

On the other hand, the Knuth shuffle, AKA the Fisher-Yates shuffle, is proven to be unbiased as long as the random number generator that the indices are coming from is unbiased.

like image 182
dsimcha Avatar answered Dec 10 '22 02:12

dsimcha


If I see that right, your random::get (max) doesn't include max.

This line:

draw_t push_index = random::get( pull_index );

then produces a "classical" off-by-one error, as your pull_index and push_index erroneously can never be the same. This produces a subtle bias that you can never have an item where it was before the shuffle. In an extreme example, two-item lists under this "shuffle" would always be reversed.

like image 41
Svante Avatar answered Dec 10 '22 02:12

Svante