
Similar random number generation in python and c++ but getting different output

I have two functions, in c++ and python, that determine how many times an event with a certain probability will occur over a number of rolls.

Python version:

import random

def get_loot(rolls):
    drops = 0

    for i in range(rolls):
        # getting a random float with 2 decimal places
        roll = random.randint(0, 10000) / 100
        if roll < 0.04:
            drops += 1

    return drops

for i in range(0, 10):
    print(get_loot(1000000))

Python output:

371
396
392
406
384
392
380
411
393
434

c++ version:

#include <cstdlib>
#include <ctime>
#include <iostream>

using namespace std;

int get_drops(int rolls){
    int drops = 0;
    for(int i = 0; i < rolls; i++){
        // getting a random float with 2 decimal places
        float roll = (rand() % 10000)/100.0f;
        if (roll < 0.04){
            drops++;
        }
    }
    return drops;
}

int main()
{
    srand(time(NULL));
    for (int i = 0; i <= 10; i++){
        cout << get_drops(1000000) << "\n";
    }
}

c++ output:

602
626
579
589
567
620
603
608
594
610
626

The code looks identical (at least to me). Both functions simulate the occurrence of an event with a probability of 0.04% over 1,000,000 rolls. However, the output of the Python version is about 30% lower than that of the C++ version. How are these two versions different, and why do they produce different outputs?

asked Jul 10 '21 22:07 by DisplayName01


2 Answers

In C++ rand() "Returns a pseudo-random integral number in the range between 0 and RAND_MAX."

RAND_MAX "is library-dependent, but is guaranteed to be at least 32767 on any standard library implementation."

Let's assume RAND_MAX is 32,767.

When calculating [0, 32767] % 10000, the random number generation is skewed.

The values 0-2,767 each occur 4 times in the range (% 10000):

Value    Calculation      Result
1        1 % 10000        1
10001    10001 % 10000    1
20001    20001 % 10000    1
30001    30001 % 10000    1

Whereas the values 2,768-9,999 each occur only 3 times in the range (% 10000):

Value    Calculation      Result
2768     2768 % 10000     2768
12768    12768 % 10000    2768
22768    22768 % 10000    2768

This makes the values 0-2,767 about 33% more likely to occur than the values 2,768-9,999, since each is hit 4 times instead of 3 (assuming rand() does, in fact, produce an even distribution between 0 and RAND_MAX).
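The counting argument above can be checked by brute force. The following is a minimal sketch, assuming RAND_MAX is 32,767 (the standard's minimum; many implementations use a much larger value) and that rand() is uniform over [0, RAND_MAX]. The helper name residue_counts is made up for illustration.

```python
from collections import Counter

RAND_MAX = 32767  # assumption: the minimum value the C standard guarantees


def residue_counts(rand_max=RAND_MAX, modulus=10000):
    """Count how many values in [0, rand_max] map to each residue of % modulus."""
    return Counter(value % modulus for value in range(rand_max + 1))


counts = residue_counts()

# Low residues are hit 4 times, high residues only 3 times.
print(counts[1], counts[2767], counts[2768], counts[9999])

# Probability that rand() % 10000 lands on one of the residues 0..3,
# compared with the 4/10000 an unbiased generator would give.
p_low = sum(counts[r] for r in range(4)) / (RAND_MAX + 1)
print(p_low, 4 / 10000)
```

Under this assumption each low residue has probability 4/32768 rather than 1/10000, so the small values that the drop check cares about are over-represented.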


Python, on the other hand, produces an even distribution between start and end with randint, as randint is an "Alias for randrange(a, b+1)".

And randrange (in python 3.2 and newer) will produce evenly distributed values:

Changed in version 3.2: randrange() is more sophisticated about producing equally distributed values. Formerly it used a style like int(random()*n) which could produce slightly uneven distributions.
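Given an even distribution, the expected Python result can be worked out by enumeration instead of simulation. This is a rough sketch that assumes randint(0, 10000) is exactly uniform over its 10,001 possible values; the variable names are illustrative only.

```python
from fractions import Fraction

# Every value randint(0, 10000) can return, filtered by the same test
# the question's get_loot() applies to each roll.
passing = [v for v in range(0, 10001) if v / 100 < 0.04]
print(passing)

# Each value is equally likely, so the drop probability is a simple ratio.
p = Fraction(len(passing), 10001)
expected = float(p * 1_000_000)
print(round(expected, 2))  # expected drops per 1,000,000 rolls
```

The result, just under 400 drops per million rolls, is in line with the Python output shown in the question.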


There are several approaches to generating random numbers in C++. Perhaps the most similar to Python is a Mersenne Twister engine (the same family of generator that Python's random module uses, though with some differences).

Via uniform_int_distribution with mt19937:

#include <iostream>
#include <random>
#include <chrono>


int get_drops(int rolls) {
    std::mt19937 e{
            static_cast<unsigned int> (
                    std::chrono::steady_clock::now().time_since_epoch().count()
            )
    };
    std::uniform_int_distribution<int> d{0, 9999};
    int drops = 0;
    for (int i = 0; i < rolls; i++) {
        float roll = d(e) / 100.0f;
        if (roll < 0.04) {
            drops++;
        }
    }
    return drops;
}

int main() {
    for (int i = 0; i <= 10; i++) {
        std::cout << get_drops(1000000) << "\n";
    }
}

Note that the underlying implementations of the two engines, as well as their seeding and distributions, are all slightly different. Even so, this will be much closer to Python.
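One further difference may be worth checking, although it is not part of the original answer: the C++ versions store the roll in a float and compare it with the double literal 0.04, while Python does everything in double precision. The sketch below emulates single precision in Python with struct, assuming IEEE 754 floats and no extended intermediate precision. The helpers to_float32 and passes_as_float are hypothetical names for illustration.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def passes_as_float(n):
    """Emulate the C++ test `float roll = n / 100.0f; roll < 0.04`."""
    roll = to_float32(to_float32(n) / to_float32(100.0))
    return roll < 0.04  # the literal 0.04 is a double, as in the C++ code


# Integer values in [0, 9999] that pass the check in single precision.
passing = [n for n in range(10000) if passes_as_float(n)]
print(passing)
```

In this emulation the value 4 also passes, because the nearest float to 0.04 is slightly below the double 0.04. If that holds on the target platform, the float-based check admits five values rather than Python's four, so the uniform_int_distribution version would be expected to land near 500 drops per million rather than 400. Under the RAND_MAX = 32,767 assumption, five residues at 4/32768 each would give roughly 610 per million, which is in the neighborhood of the C++ output in the question. Comparing against 0.04f, or using double throughout, should remove this effect.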


Alternatively, as Matthias Fripp suggests, scale up rand() and divide by RAND_MAX:

int get_drops(int rolls) {
    int drops = 0;
    for (int i = 0; i < rolls; i++) {
        // 10000LL forces 64-bit arithmetic: 10000 * rand() can overflow int when RAND_MAX is large
        float roll = (10000LL * rand() / RAND_MAX) / 100.0f;
        if (roll < 0.04) {
            drops++;
        }
    }
    return drops;
}

This is also much closer to the Python output (again, with some differences in the way the underlying implementations generate random numbers).

answered Oct 19 '22 15:10 by Henry Ecker


The results are skewed because rand() % 10000 is not the correct way to achieve a uniform distribution. (See also "rand() Considered Harmful" by Stephan T. Lavavej.) In modern C++, prefer the pseudo-random number generation library provided in the header <random>. For example:

#include <iostream>
#include <random>

int get_drops(int rolls)
{
    std::random_device rd;
    std::mt19937 gen{ rd() };
    std::uniform_real_distribution<> dis{ 0.0, 100.0 };
    int drops{ 0 };
    for(int roll{ 0 }; roll < rolls; ++roll)
    {
        if (dis(gen) < 0.04)
        {
            ++drops;
        }
    }

    return drops;
}

int main()
{
    for (int i{ 0 }; i <= 10; ++i)
    {
        std::cout << get_drops(1000000) << '\n';
    }
}

answered Oct 19 '22 15:10 by heap underrun