I am developing an agent-based model in which I use different types of agent classes whose instances are assigned to different types of objects such as schools, companies, homes, etc. The problem is that I cannot make runs reproducible when debugging, which makes the task very hard given the model's complexity. After a long investigation, I realised that the problem is linked to the iteration order of sets (the built-in random and NumPy random seeds are of course set). Even when I set PYTHONHASHSEED=0, the order of the sets changes from run to run. This makes each run of my model different when agents move.
Of course I know that sets are not meant to have an order. I use them to make the model as light and fast as possible when removing agents from objects. I want them to behave randomly, except when I need to debug a specific run that raises an exception.
I have added the following code so that my claims can be verified. I always set PYTHONHASHSEED from the command line via export before launching the code, and I print the PYTHONHASHSEED value from the code to check that it has indeed been updated:
import os
import random
import numpy as np
print('PYTHON HASH SEED IS', os.environ['PYTHONHASHSEED'])
random.seed(1)
np.random.seed(2)
class S:
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __repr__(self):
        return "".join([type(self).__name__, "_{0.a!r}_",
                        "School", "_{0.b!r}"]).format(self)
list1 = np.random.randint(1, 100, size=40)
list2 = np.random.randint(1, 10, size=40)
d1 = dict()
s1 = set()
d1['students'] = s1
# assign students to d1
for s_id, sch_id in zip(list1, list2):
    d1['students'].add(S(s_id, sch_id))
print(s1)
The strange thing is that if I use integers instead of class instances as set members, I cannot detect the randomness. Does the problem have to do with the fact that the set members are class instances? Why?
Of course I could remodel the way agents are assigned to model objects and replace sets with lists, but if possible I would like to understand the problem. I am using Python 3.5.4.
The short answer is no.
AFAIK, Python sets are implemented as hash tables. The order in which the items appear depends on the items' hash values. Within the same run of the program those hash values do not change, hence you get the same order.
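The "same run" caveat can be checked directly. Hashes that PYTHONHASHSEED controls (those of str and bytes) change between interpreter runs started with different seeds, while int hashes do not. A minimal sketch, assuming CPython 3.5+ (for subprocess.run); the helper hash_in_fresh_interpreter is hypothetical:

```python
import os
import subprocess
import sys

def hash_in_fresh_interpreter(expr, hash_seed):
    """Return hash(expr) computed in a new Python process with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", "print(hash({}))".format(expr)],
        env=env, stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return int(result.stdout)

for seed in (1, 2):
    # The str hash follows the seed; the int hash is the same for every seed.
    print(seed,
          hash_in_fresh_interpreter("'school'", seed),
          hash_in_fresh_interpreter("12345", seed))
```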
Sets are unordered, and their elements are unique: duplicates are not allowed. A set itself may be modified, but the elements it contains must be hashable (in practice often immutable types such as numbers, strings and tuples).
A set is an unordered, iterable, mutable collection with no duplicate elements. Its major advantage over a list is a highly optimized way of checking whether a specific element is contained in the set (an average O(1) hash lookup instead of a linear scan).
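A short illustration of these properties, as a sketch assuming CPython 3 (the names schools, Agent and agents are made up for the example). It also shows that the real requirement for members is hashability rather than immutability: instances of a plain user-defined class are mutable, yet they can be set members because their default hash is identity-based.

```python
# Duplicates collapse, and membership tests are hash lookups.
schools = {"north", "south", "north"}  # the duplicate "north" is stored once
print(len(schools))          # 2
print("south" in schools)    # True

# Members must be hashable, not necessarily immutable.
class Agent:
    pass

agent = Agent()
agents = {agent}
agent.home = "home_1"        # mutating a member is fine: its default hash ignores attributes
print(agent in agents)       # True

try:
    {[1, 2]}                 # a list is unhashable, so it cannot be a set member
    list_accepted = True
except TypeError:
    list_accepted = False
print(list_accepted)         # False
```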
The objects you're storing (of type S) come from a class that doesn't override __eq__ or __hash__, so they use the default implementation, which is based on object identity:
User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).
Object identity is (as a CPython implementation detail) equivalent to the memory address at which the object was allocated (the raw pointer value), and the default hash is derived from it. The allocator is likely to return different addresses on each run, so the ordering will differ each time. PYTHONHASHSEED doesn't help here, because it only seeds the hashing of str and bytes objects, not these address-based hashes. ints don't have this problem because they have value-based rather than identity-based equality and hashing; they hash based on value, so the precise memory address is irrelevant.
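The difference between the two kinds of equality can be seen in a few lines. This is a sketch assuming CPython 3; the class below is a stripped-down copy of the question's S with no __eq__/__hash__ override.

```python
# Default (identity-based) equality versus int's value-based equality.
class S:
    """Same shape as the question's class, without __eq__/__hash__."""
    def __init__(self, a, b):
        self.a = a
        self.b = b

x, y = S(1, 2), S(1, 2)
print(x == y)              # False: the inherited __eq__ compares identity
print(len({x, y}))         # 2: both instances are kept as distinct members
print(hash(x) == hash(x))  # True: the hash is stable for the object's lifetime

n1, n2 = int("12345"), int("12345")  # equal values, possibly distinct objects
print(n1 == n2)            # True: ints compare by value
print(len({n1, n2}))       # 1: equal values collapse into one member
print(hash(12345))         # 12345 in CPython: this int hashes to its own value
```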
To get consistent ordering for your custom class with a fixed seed, you'd need to define the special equality and hashing methods, e.g.:
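class S:
    # ... __init__ and __repr__ as in the question ...
    def __hash__(self):
        # Hash on the same fields that __eq__ compares, so the hash is value-based
        return hash((self.a, self.b))
    def __eq__(self, other):
        if not isinstance(other, S):
            return NotImplemented
        return self.a == other.a and self.b == other.b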
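One way to check the claim across separate runs is to execute the same script in two fresh interpreters and compare what they print. This is a minimal sketch under stated assumptions: the stdlib random module stands in for NumPy so there is no third-party dependency, and SCRIPT and run_script are hypothetical names. Tuples of ints hash independently of PYTHONHASHSEED, so the outputs match even with different seeds; removing the two methods from SCRIPT often makes them differ, although that is not guaranteed. Note also that value-based equality changes the semantics: two students with the same (a, b) now count as one member, so 40 additions may leave fewer than 40 elements.

```python
import os
import subprocess
import sys

# The question's set, rebuilt with the value-based __hash__/__eq__.
# Assumption: random replaces numpy.random to keep the sketch stdlib-only.
SCRIPT = """
import random

class S:
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __repr__(self):
        return "S_{0.a!r}_School_{0.b!r}".format(self)
    def __hash__(self):
        return hash((self.a, self.b))
    def __eq__(self, other):
        if not isinstance(other, S):
            return NotImplemented
        return self.a == other.a and self.b == other.b

random.seed(1)
students = set()
for _ in range(40):
    students.add(S(random.randint(1, 99), random.randint(1, 9)))
print(students)
"""

def run_script(hash_seed):
    """Run SCRIPT in a fresh interpreter and return what it printed."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run([sys.executable, "-c", SCRIPT], env=env,
                            stdout=subprocess.PIPE, universal_newlines=True,
                            check=True)
    return result.stdout

first, second = run_script(0), run_script(123)
print(first == second)  # True: the set order no longer depends on memory addresses
```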