
Seeded Python RNG showing non-deterministic behavior with sets

I'm seeing non-deterministic behavior when trying to select a pseudo-random element from sets, even though the RNG is seeded (example code shown below). Why is this happening, and should I expect other Python data types to show similar behavior?

Notes: I've only tested this on Python 2.7, but it's been reproducible on two different Windows computers.

Similar Issue: The issue at "Python random seed not working with Genetic Programming example code" may be similar. Based on my testing, my hypothesis is that run-to-run memory allocation differences within the sets are leading to different elements getting picked for the same RNG state.

So far I haven't found any mention of this kind of caveat/issue in the Python docs for set or random.

Example Code (randTest produces different output run-to-run):

import random

# Class contains a large set of pseudo-random numbers.
class bigSet:
    def __init__(self):
        self.a = set()
        for n in range(2000):
            self.a.add(random.random())


# Main test function.
def randTest():
    # Seed the PRNG.
    random.seed(0)

    # Create a set of bigSet instances, presumably many memory allocations.
    b = set()
    for n in range(2000):
        b.add(bigSet())

    # Pick a random value from a random bigSet. Would have expected this to be deterministic.
    c = random.sample(b, 1)[0]
    print('randVal: ' + str(random.random()))             # This value is always the same
    print('setSample: ' + str(random.sample(c.a, 1)[0]))  # This value can change run-to-run
Amac26629 asked Mar 30 '16 19:03




1 Answer

OrderedSet is the ideal choice.

Neither set nor frozenset should be used here, since nowhere is it specified that either of them is ordered. The fact that another answer works is just an accident of implementation. Sets are unordered, and relying on their iteration order couples your results to the Python version (and possibly the machine).
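To make the cause concrete, here is a minimal sketch (Python 3 syntax; `Plain` is a hypothetical stand-in for the question's `bigSet`, not code from the question). It relies on the fact that a class defining neither `__hash__` nor `__eq__` inherits identity-based hashing from `object`. In CPython, as far as I know, that hash is derived from the object's memory address, which can differ from run to run.

```python
class Plain:
    """Hypothetical stand-in for bigSet: defines no __hash__ or __eq__."""


x, y = Plain(), Plain()

# Instances fall back to object.__hash__, which is tied to object identity
# (in CPython, derived from the memory address), not to the contents.
identity_based = hash(x) == object.__hash__(x)

# Floats, by contrast, hash by value, so equal floats always hash equally.
value_based = hash(0.25) == hash(0.25)

print(identity_based, value_based)
```

If that is what happens in the question, the outer set of bigSet instances may iterate in a different order on each run, so sampling by position can land on a different instance even though the RNG state is identical.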

I get a different order from Roland's answer in Python 3.8.6 (although the order between two runs happens to be the same). This is in spite of the fact that the random numbers generated are the same.

To preserve the order, and therefore determinism based on a random seed, you must use an ordered data structure such as OrderedSet (a third-party type; the standard library does not provide one).

If you do not have OrderedSet available, or if profiling your code shows OrderedSet is slow, you can use an OrderedDict and ignore its values.

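If you have Python >= 3.6, then even regular dicts preserve insertion order. In CPython 3.6 this is an implementation detail that came out of a performance optimization; from Python 3.7 onward it is guaranteed by the language.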
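As a rough sketch of the dict-based approach (assuming Python 3.7+; the names `BigSet`, `rand_test`, `size` and `count` are mine, not from the question), the example can be restructured so that every container has a defined order. Dict keys serve as an insertion-ordered set, and a list holds the instances. On older versions, `collections.OrderedDict.fromkeys` should work the same way.

```python
import random


class BigSet:
    """Holds unique pseudo-random floats in insertion order."""

    def __init__(self, rng, size):
        # dict keys act as an ordered set; the None values are ignored.
        self.a = dict.fromkeys(rng.random() for _ in range(size))


def rand_test(seed=0, count=2000, size=2000):
    # A private Random instance avoids interference from other code
    # that may use the module-level generator.
    rng = random.Random(seed)

    # A list has a defined order, unlike a set of identity-hashed objects.
    b = [BigSet(rng, size) for _ in range(count)]

    # Both choices index into ordered sequences, so the result depends
    # only on the seed and the sizes.
    c = rng.choice(b)
    return rng.choice(list(c.a))


if __name__ == "__main__":
    print(rand_test(count=50, size=100))
```

With this layout, two calls with the same arguments should return the same value, since nothing depends on hash or memory layout any more.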

danuker answered Oct 01 '22 07:10
