Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is dictionary ordering non-deterministic?

I recently switched from Python 2.7 to Python 3.3, and it seems that while in Python 2 the ordering of dictionary keys was arbitrary but consistent, in Python 3 the ordering of the keys of a dictionary obtained with e.g. vars() appears non-deterministic.

If I run:

class Test(object): pass parameters = vars(Test) print(list(parameters.keys())) 

in both Python 2.7 and Python 3.3, then:

  • Python 2.7 consistently gives me

    ['__dict__', '__module__', '__weakref__', '__doc__'] 
  • With Python 3.3, I can get any random order – for example:

    ['__weakref__', '__module__', '__qualname__', '__doc__', '__dict__'] ['__doc__', '__dict__', '__qualname__', '__module__', '__weakref__'] ['__dict__', '__module__', '__qualname__', '__weakref__', '__doc__'] ['__weakref__', '__doc__', '__qualname__', '__dict__', '__module__'] 

Where does this non-determinism come from? And why is something like

list({str(i): i for i in range(10)}.keys()) 

… consistent between runs, always giving

['3', '2', '1', '0', '7', '6', '5', '4', '9', '8'] 

… ?

like image 265
Anaphory Avatar asked Feb 19 '13 11:02

Anaphory


People also ask

Why is Python not deterministic?

Python has never guaranteed iteration order of keys in a dict or set, and applications are advised to never rely on it. Historically, dict iteration order has not changed very often across releases and has always remained consistent between successive executions of Python.

Are dict items deterministic?

Description. list(dict. keys()) is not deterministic and the order can be different in Python 2.7 and Python 3.7. For Python 3.7+ the insertion order is respected and part of the Python spec.

Is dictionary is ordered or unordered?

A dictionary is a collection which is ordered*, changeable and do not allow duplicates. As of Python version 3.7, dictionaries are ordered. In Python 3.6 and earlier, dictionaries are unordered.

Can a dictionary be ordered?

It is not possible to sort a dictionary, only to get a representation of a dictionary that is sorted. Dictionaries are inherently orderless, but other types, such as lists and tuples, are not. So you need an ordered data type to represent sorted values, which will be a list—probably a list of tuples.


2 Answers


Update: In Python 3.6, dict has a new implementation which preserves insertion order. From Python 3.7, this order-preserving behaviour is guaranteed:

the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec.


This is the result of a security fix from 2012, which was enabled by default in Python 3.3 (scroll down to "Security improvements").

From the announcement:

Hash randomization causes the iteration order of dicts and sets to be unpredictable and differ across Python runs. Python has never guaranteed iteration order of keys in a dict or set, and applications are advised to never rely on it. Historically, dict iteration order has not changed very often across releases and has always remained consistent between successive executions of Python. Thus, some existing applications may be relying on dict or set ordering. Because of this and the fact that many Python applications which don't accept untrusted input are not vulnerable to this attack, in all stable Python releases mentioned here, HASH RANDOMIZATION IS DISABLED BY DEFAULT.

As noted above, the last, capitalized bit is no longer true in Python 3.3.

See also: object.__hash__() documentation ("Note" sidebar).

If absolutely necessary, you can disable hash randomization in versions of Python affected by this behaviour by setting the PYTHONHASHSEED environment variable to 0.


Your counterexample:

list({str(i): i for i in range(10)}.keys()) 

… does not in fact always give the same result in Python 3.3, although the number of different orderings is limited due to the way hash collisions are handled:

$ for x in {0..999} > do >   python3.3 -c "print(list({str(i): i for i in range(10)}.keys()))" > done | sort | uniq -c      61 ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']      73 ['1', '0', '3', '2', '5', '4', '7', '6', '9', '8']      62 ['2', '3', '0', '1', '6', '7', '4', '5', '8', '9']      59 ['3', '2', '1', '0', '7', '6', '5', '4', '9', '8']      58 ['4', '5', '6', '7', '0', '1', '2', '3', '8', '9']      55 ['5', '4', '7', '6', '1', '0', '3', '2', '9', '8']      62 ['6', '7', '4', '5', '2', '3', '0', '1', '8', '9']      63 ['7', '6', '5', '4', '3', '2', '1', '0', '9', '8']      60 ['8', '9', '0', '1', '2', '3', '4', '5', '6', '7']      66 ['8', '9', '2', '3', '0', '1', '6', '7', '4', '5']      65 ['8', '9', '4', '5', '6', '7', '0', '1', '2', '3']      53 ['8', '9', '6', '7', '4', '5', '2', '3', '0', '1']      62 ['9', '8', '1', '0', '3', '2', '5', '4', '7', '6']      52 ['9', '8', '3', '2', '1', '0', '7', '6', '5', '4']      73 ['9', '8', '5', '4', '7', '6', '1', '0', '3', '2']      76 ['9', '8', '7', '6', '5', '4', '3', '2', '1', '0'] 

As noted at the beginning of this answer, that's no longer the case in Python 3.6:

$ for x in {0..999} > do >   python3.6 -c "print(list({str(i): i for i in range(10)}.keys()))" > done | sort | uniq -c    1000 ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'] 
like image 87
Zero Piraeus Avatar answered Sep 19 '22 14:09

Zero Piraeus


Note that Python 3.7 still has non-deterministic sets however. dicts preserve insertion order but sets do not. Sets can exhibit the same random behavior.

python3 -c "print({str(i) for i in range(9)})"

still gives different results from one run to the next.

like image 44
Pete Cacioppi Avatar answered Sep 21 '22 14:09

Pete Cacioppi