Dict/Set Parsing Order Consistency

Tags: python, dictionary, python-internals, set

Containers that take hashable objects (such as dict keys or set items) treat objects that compare equal and have the same hash as a single entry. As such, a dictionary can only have one key with the value 1, 1.0 or True etc. (Note: this is simplified somewhat; hash collisions are permitted, but these values are considered equal.)
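To make the premise concrete, here is a minimal check (an illustrative sketch, not part of the original question) that the four values used in the examples compare equal and share one hash, which is why a dict or set keeps only one of them. The names `values`, `all_equal` and `same_hash` are made up for this example.

```python
# The four values from the question.
values = [True, 1, 1.0, (1 + 0j)]

# Every pair compares equal...
all_equal = all(a == b for a in values for b in values)
# ...and all four produce the same hash.
same_hash = len({hash(v) for v in values}) == 1

print(all_equal)  # True
print(same_hash)  # True
```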

My question is: is the parsing order well-defined, and is the resulting object predictable across implementations? For example, Python 2.7.11 and 3.5.1 on OS X interpret a dict like so:

>>> { True: 'a', 1: 'b', 1.0: 'c', (1+0j): 'd' }
{True: 'd'}

In this case, it appears that the first key and the last value are preserved.

Similarly, in the case of set:

>>> { True, 1, 1.0, (1+0j) }
set([(1+0j)])

Here it appears that the last item is preserved.

But (as mentioned in comments):

>>> set([True, 1, 1.0])
set([True])

Now the first item in the iterable is preserved.

The documentation notes that the order of items (for example in dict.items) is undefined; however, my question refers to the result of constructing dict or set objects.


asked Jan 06 '16 00:01

Hamish

1 Answer

The bug is now fixed in recent versions of Python, as explained in @jsf's answer.

From the language reference on dictionary displays:

If a comma-separated sequence of key/datum pairs is given, they are evaluated from left to right to define the entries of the dictionary: each key object is used as a key into the dictionary to store the corresponding datum. This means that you can specify the same key multiple times in the key/datum list, and the final dictionary’s value for that key will be the last one given.

A dict comprehension, in contrast to list and set comprehensions, needs two expressions separated with a colon followed by the usual “for” and “if” clauses. When the comprehension is run, the resulting key and value elements are inserted in the new dictionary in the order they are produced.
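The quoted rule can be observed directly. The sketch below is an illustration under Python 3, not taken from the documentation: it wraps each key in a hypothetical helper `key()` that records the order of evaluation, then shows that the first key object and the last value survive.

```python
evaluated = []

def key(k):
    """Record that this key expression was evaluated, then return it."""
    evaluated.append(k)
    return k

# Three keys that are equal to each other, given left to right.
d = {key(True): "a", key(1): "b", key(1.0): "c"}

print([type(k).__name__ for k in evaluated])  # ['bool', 'int', 'float']
print(d)                                      # {True: 'c'}
```

Each later entry finds the existing key, leaves the stored key object alone and only replaces the value.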

From the language reference on set displays:

A set display yields a new mutable set object, the contents being specified by either a sequence of expressions or a comprehension. When a comma-separated list of expressions is supplied, its elements are evaluated from left to right and added to the set object. When a comprehension is supplied, the set is constructed from the elements resulting from the comprehension.

There is a difference between calling the set constructor or using a comprehension on the one hand, and using the plain literal on the other.

import dis


def f1():
    return {x for x in [True, 1]}


def f2():
    return set([True, 1])


def f3():
    return {True, 1}


print(f1())
print(f2())
print(f3())

print("f1")
dis.dis(f1)

print("f2")
dis.dis(f2)

print("f3")
dis.dis(f3)

Output:

{True}
{True}
{1}

How they are created influences the outcome:

f1
605           0 LOAD_CONST               1 (<code object <setcomp> at 0x7fd17dc9a270, file "/home/padraic/Dropbox/python/test.py", line 605>)
              3 LOAD_CONST               2 ('f1.<locals>.<setcomp>')
              6 MAKE_FUNCTION            0
              9 LOAD_CONST               3 (True)
             12 LOAD_CONST               4 (1)
             15 BUILD_LIST               2
             18 GET_ITER
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 RETURN_VALUE
f2
608           0 LOAD_GLOBAL              0 (set)
              3 LOAD_CONST               1 (True)
              6 LOAD_CONST               2 (1)
              9 BUILD_LIST               2
             12 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             15 RETURN_VALUE
f3
611           0 LOAD_CONST               1 (True)
              3 LOAD_CONST               2 (1)
              6 BUILD_SET                2
              9 RETURN_VALUE
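Rather than reading the disassembly by eye, the same comparison can be made programmatically. This is a rough sketch assuming CPython; opcode names are an implementation detail and vary between versions, and `from_constructor`, `from_display` and `opnames` are names invented for the example.

```python
import dis


def from_constructor():
    return set([True, 1])


def from_display():
    return {True, 1}


def opnames(func):
    """Return the opcode names found in func's own bytecode."""
    return [ins.opname for ins in dis.get_instructions(func)]


# Only the display is expected to be compiled to a BUILD_SET instruction.
print("BUILD_SET" in opnames(from_constructor))
print("BUILD_SET" in opnames(from_display))
```

The comprehension case is left out on purpose: where its set-building instructions appear (a nested code object or the function itself) depends on the interpreter version.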

Python only runs the BUILD_SET bytecode when you pass a plain display of comma-separated expressions, as per the documentation:

"When a comma-separated list of expressions is supplied, its elements are evaluated from left to right and added to the set object."

The comprehension is covered by a separate sentence:

"When a comprehension is supplied, the set is constructed from the elements resulting from the comprehension."

Thanks to Hamish filing a bug report, it does indeed come down to the BUILD_SET opcode. As Raymond Hettinger commented in the linked report: "The culprit is the BUILD_SET opcode in Python/ceval.c which unnecessarily loops backwards". The implementation is below:

TARGET(BUILD_SET) {
    PyObject *set = PySet_New(NULL);
    int err = 0;
    if (set == NULL)
        goto error;
    /* Items are popped from the top of the stack, so the last
       element of the display is added to the set first. */
    while (--oparg >= 0) {
        PyObject *item = POP();
        if (err == 0)
            err = PySet_Add(set, item);
        Py_DECREF(item);
    }
    if (err != 0) {
        Py_DECREF(set);
        goto error;
    }
    PUSH(set);
    DISPATCH();
}
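The effect of the backwards loop can be imitated in pure Python. The following is a simplified model, not the interpreter's code: a list stands in for the value stack, and `build_set_backwards` and `build_set_forwards` are hypothetical helpers. The forwards variant only illustrates the documented left-to-right behaviour and is not claimed to be the actual patch.

```python
def build_set_backwards(stack, oparg):
    """Mimic the C loop: pop `oparg` items off the stack, adding each one."""
    result = set()
    for _ in range(oparg):
        # The top of the stack holds the last element of the display.
        result.add(stack.pop())
    return result


def build_set_forwards(stack, oparg):
    """Take the same `oparg` items but add them in source order."""
    start = len(stack) - oparg
    items = stack[start:]
    del stack[start:]
    return set(items)


# The operands of {True, 1} are pushed left to right before BUILD_SET runs.
print(build_set_backwards([True, 1], 2))  # {1}
print(build_set_forwards([True, 1], 2))   # {True}
```

Because a set keeps the element that was added first when a later one compares equal, reversing the insertion order is enough to change which object survives.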


answered Nov 11 '22 23:11

Padraic Cunningham
