Why is the order of Python sets not deterministic even when PYTHONHASHSEED=0?

I am developing an agent-based model in which I use different types of agent classes whose instances are assigned to different types of objects such as schools, companies, homes, etc. The problem is that I cannot enforce reproducibility of runs when debugging, which makes the task very hard given the model's complexity. After a long investigation, I realised that the problem is linked to the order of sets (seeds for the built-in random module and for numpy.random are of course set). Even when I set PYTHONHASHSEED=0, the order of sets is different at each run. This makes each run of my model different when agents move.

Of course I know that sets are not meant to have an order. I use them to make the model as light and fast as possible when removing agents from objects. I want them to behave randomly, except when I need to debug a specific run that raises an exception.

I am adding the following code so that my claims can be verified. I always set PYTHONHASHSEED from the command line via export before launching the code, and I print the PYTHONHASHSEED value from the code to check that it has indeed been updated:

import os
import random
import numpy as np

print('PYTHON HASH SEED IS', os.environ['PYTHONHASHSEED'])

random.seed(1)
np.random.seed(2)

class S:
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __repr__(self):
        return "".join([type(self).__name__, "_{0.a!r}_",
                        "School", "_{0.b!r}" ]).format(self)

list1 = np.random.randint(1, 100,size=40)
list2 = np.random.randint(1, 10,size=40)
d1 = dict()
s1 = set()
d1['students'] = s1
# assign students to d1
for s_id, sch_id in zip(list1, list2):
    d1['students'].add(S(s_id, sch_id))

print(s1)

The strange thing is that if I use integers as set members instead of class instances, I cannot detect the randomness. Does the problem have to do with the fact that the set members are class instances? Why?

Of course I could remodel the way agents are assigned to model objects and replace sets with lists, but if possible I would like to understand the problem. I am using Python 3.5.4.

Paolo Gervasoni Vila asked Aug 21 '18 12:08

1 Answer

The objects you're storing (of type S) come from a class that does not override __eq__ and __hash__, so they use the default implementation, which is based on object identity:

User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).
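A minimal sketch of the quoted rule, using a hypothetical class `Plain` that, like `S` in the question, defines neither `__eq__` nor `__hash__`. Two instances with identical attributes compare unequal and get different hashes, while an int hashes by its value. The actual hash numbers are a CPython implementation detail, so the sketch only compares them rather than printing them.

```python
class Plain:
    # No __eq__/__hash__ overrides, so object's identity-based defaults apply.
    def __init__(self, a, b):
        self.a = a
        self.b = b

p1 = Plain(1, 2)
p2 = Plain(1, 2)

print(p1 == p2)              # False: equality falls back to identity
print(p1 == p1)              # True: an object is equal to itself
print(hash(p1) == hash(p2))  # False: each hash comes from that object's own identity
print(hash(7) == 7)          # True in CPython: small ints hash to their own value
```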

Object identity is (as an implementation detail of CPython) equivalent to the memory address at which the object was allocated (the raw pointer value), and the default hash is derived from that address. The allocator is free to (and in practice usually does) return different addresses on each run, so the ordering will differ each time. ints don't have this problem because they have non-identity-based equality and hashing: they hash based on value, not identity, so the precise memory address is irrelevant. PYTHONHASHSEED does not help here, because it only controls the salting of str and bytes hashes.
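The cross-run part of this explanation can be checked with a sketch that launches two fresh interpreters with PYTHONHASHSEED=0 and compares what each one reports. The `CHILD` one-liner and the `run_child` helper are illustrative names, not part of the question's code. With the seed fixed, the hash of a str should match between the two runs; the default hash of a fresh `object()` comes from its address, so whether it matches depends on the allocator and the OS (address space layout randomization usually makes it differ).

```python
import os
import subprocess
import sys

# Child script: report the hash-randomization flag, the hash of a str,
# and the (address-derived) default hash of a fresh object instance.
CHILD = "import sys; print(sys.flags.hash_randomization, hash('school'), hash(object()))"

def run_child():
    # The seed is read once at interpreter startup, so it has to be in the
    # child's environment; assigning os.environ inside a running script is too late.
    env = dict(os.environ, PYTHONHASHSEED="0")
    result = subprocess.run([sys.executable, "-c", CHILD], env=env,
                            stdout=subprocess.PIPE, universal_newlines=True,
                            check=True)
    flag, str_hash, obj_hash = result.stdout.split()
    return int(flag), int(str_hash), int(obj_hash)

first, second = run_child(), run_child()
print("hash_randomization flag:", first[0])          # 0: randomization disabled
print("str hash stable:", first[1] == second[1])     # True with a fixed seed
print("object hash stable:", first[2] == second[2])  # allocator-dependent; often False
```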

To get consistent ordering for your custom class with a fixed seed, you'd need to define the special equality and hashing methods, e.g.:

class S:
    # __init__ and __repr__ as in the question

    def __hash__(self):
        return hash((self.a, self.b))

    def __eq__(self, other):
        if not isinstance(other, S):
            return NotImplemented
        return self.a == other.a and self.b == other.b
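Putting these methods into a complete, runnable version of the question's class gives a sketch like the following. To avoid a NumPy dependency it draws the same ranges from the standard library's `random.Random` instead of `np.random.randint`, and the `build` helper is hypothetical. Because every hash is computed from ints, building the set twice from the same seed yields the same iteration order, even inside one process where the two builds create different objects at different addresses.

```python
import random

class S:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __repr__(self):
        return "S_{0.a!r}_School_{0.b!r}".format(self)

    def __hash__(self):
        # Value-based hash: tuples of ints hash the same way in every run.
        return hash((self.a, self.b))

    def __eq__(self, other):
        if not isinstance(other, S):
            return NotImplemented
        return self.a == other.a and self.b == other.b

def build(seed):
    # Hypothetical helper: same value ranges as the question's NumPy calls.
    rng = random.Random(seed)
    students = set()
    for _ in range(40):
        students.add(S(rng.randint(1, 99), rng.randint(1, 9)))
    return students

order1 = [repr(s) for s in build(1)]
order2 = [repr(s) for s in build(1)]
print(order1 == order2)  # True: same values, same hashes, same insertion sequence
```

One consequence of value-based equality is that two agents with the same `(a, b)` pair collapse into a single set member. If distinct agents can share those attributes, hashing on a unique per-agent identifier instead (for example a hypothetical `uid` assigned from `itertools.count()` at creation time) keeps them distinct while staying reproducible, as long as agents are created in the same order.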
ShadowRanger answered Oct 18 '22 09:10