I'm seeing non-deterministic behavior when trying to select a pseudo-random element from sets, even though the RNG is seeded (example code shown below). Why is this happening, and should I expect other Python data types to show similar behavior?
Notes: I've only tested this on Python 2.7, but it's been reproducible on two different Windows computers.
Similar Issue: The question "Python random seed not working with Genetic Programming example code" may be related. Based on my testing, my hypothesis is that run-to-run differences in memory allocation for the sets lead to different elements being picked for the same RNG state.
So far I haven't found any mention of this kind of caveat/issue in the Python docs for set or random.
Example Code (randTest produces different output run-to-run):
import random

class bigSet:
    """Class contains a large set of pseudo-random numbers."""
    def __init__(self):
        self.a = set()
        for n in range(2000):
            self.a.add(random.random())

def randTest():
    """Main test function."""
    # Seed the PRNG.
    random.seed(0)
    # Create sets of bigSet elements, presumably many memory allocations.
    b = set()
    for n in range(2000):
        b.add(bigSet())
    # Pick a random value from a random bigSet. Would have expected this to be deterministic.
    c = random.sample(b, 1)[0]
    print('randVal: ' + str(random.random()))  # This value is always the same
    print('setSample: ' + str(random.sample(c.a, 1)[0]))  # This value can change run-to-run
The seed() method initializes the random number generator. The generator needs a starting number (a seed value) from which to produce its pseudo-random sequence. If no seed is given, Python's random module seeds itself from an operating-system randomness source when one is available, and otherwise from the current system time.
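As a minimal sketch of what seeding does, the snippet below draws a few numbers twice with the same seed and once with a different one. The helper name `draw` is made up for this illustration.

```python
import random

def draw(seed, count=5):
    """Seed the module-level generator, then return `count` floats from it."""
    random.seed(seed)
    return [random.random() for _ in range(count)]

first = draw(0)
second = draw(0)
different = draw(1)

print(first == second)     # same seed, so the same sequence
print(first == different)  # a different seed gives a different sequence
```

Note that this only fixes the stream of numbers coming out of the generator. What those numbers are used to select still depends on the order of the container being sampled.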
In NumPy, the random seed is a numerical value that initializes the pseudo-random number generator and so determines the sequence it produces. If we call the seed function with the value 1 before each run, the generator produces the same random numbers every time.
In case you're curious about the generation itself, Python's random module uses the Mersenne Twister random number generator, which is a completely deterministic algorithm.
The seed function initializes the state of the random number generator, so that it generates the same random numbers on multiple executions of the code, on the same machine or on different machines (for a specific seed value). The seed is the starting value for the generator, not a value the generator has previously produced.
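Seeding restarts the sequence from a known starting point. If the goal is to capture the generator partway through a run and return to that exact point later, the standard library's `random.getstate()` and `random.setstate()` can be used. A small sketch:

```python
import random

random.seed(42)
random.random()                  # advance the generator a little

saved = random.getstate()        # snapshot of the internal state
expected = [random.random() for _ in range(3)]

random.setstate(saved)           # rewind to the snapshot
replayed = [random.random() for _ in range(3)]

print(expected == replayed)
```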
OrderedSet is the ideal choice. Neither set nor frozenset should be used here, since nowhere is it specified that either of them is ordered. The fact that another answer works is just an accident of implementation. Sets are unordered, and relying on their order results in coupling to the Python version (and possibly machine).
I get a different order from Roland's answer in Python 3.8.6 (although the order between two runs happens to be the same). This is in spite of the fact that the random numbers generated are the same.
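One likely explanation for the behavior in the question: in CPython, objects that do not define `__hash__` are hashed by identity, which is derived from their memory address, so a set of such objects can iterate in a different order on each run even when the seed is fixed. The sketch below uses a hypothetical `Item` class as a stand-in for `bigSet` and removes the dependence on set order by sorting on a stable key before choosing.

```python
import random

class Item(object):
    """Hypothetical stand-in for bigSet; relies on the default identity hash."""
    def __init__(self, label):
        self.label = label

def pick_from_set(seed, count=100):
    random.seed(seed)
    items = set(Item(i) for i in range(count))
    # Iterating over `items` directly may give a different order per run.
    # Sorting by a stable key makes the choice depend only on the seed.
    ordered = sorted(items, key=lambda item: item.label)
    return random.choice(ordered).label

print(pick_from_set(0) == pick_from_set(0))
```

Sorting costs O(n log n) per pick, so for repeated sampling it may be cheaper to keep the elements in an ordered container from the start.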
To preserve the order, and therefore determinism based on a random seed, you must use an ordered data structure such as OrderedSet.
If you do not have OrderedSet available, or if profiling your code shows OrderedSet is slow, you can use an OrderedDict and ignore its values.
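A minimal sketch of the OrderedDict approach, using only the standard library. The helper names `ordered_unique` and `seeded_pick` are made up for this example, and every value is simply `None`.

```python
import random
from collections import OrderedDict

def ordered_unique(values):
    """Hypothetical helper: keep each value once, in first-seen order."""
    return OrderedDict((value, None) for value in values)

def seeded_pick(seed, size=2000):
    random.seed(seed)
    members = ordered_unique(random.random() for _ in range(size))
    # list(members) yields the keys in insertion order, so the element
    # chosen depends only on the seed.
    return random.choice(list(members))

print(list(ordered_unique([3, 1, 3, 2])))
print(seeded_pick(0) == seeded_pick(0))
```

Membership tests and removal stay O(1) on average, as with a set. Only the random pick needs a list of the keys.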
If you have Python >= 3.6, then even regular dicts preserve insertion order. In CPython 3.6 this is a side effect of a performance optimization; from Python 3.7 on it is guaranteed by the language.
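Applied to the code in the question, this could look roughly like the sketch below, assuming Python 3.7 or later. The names `BigSet` and `rand_test`, and the smaller sizes, are illustrative choices and not the original author's code.

```python
import random

class BigSet:
    """Variant of the question's bigSet that stores its values as dict keys."""
    def __init__(self, size=200):
        # dict.fromkeys keeps insertion order (guaranteed from Python 3.7).
        self.a = dict.fromkeys(random.random() for _ in range(size))

def rand_test(seed=0, count=200):
    random.seed(seed)
    # A dict keyed by index gives the outer collection a stable order,
    # unlike a set of objects hashed by identity.
    b = {index: BigSet() for index in range(count)}
    chosen = b[random.choice(list(b))]
    return random.choice(list(chosen.a))

print(rand_test() == rand_test())
```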