When constructing a dictionary as follows:
dict = { True: 'yes', 1: 'No'}
When I run it in the interactive Python interpreter the dict is represented this way:
dict = {True: 'No'}
I understand that the values True
and 1
are equal due to the type coercion because when comparing numeric types, the narrowed type is widened to the other type (boolean is a child of integer). So as I understood from the documentation, when we enter True == 1
Python converts True
to 1
and compares them.
What I don't understand is why the True
is selected as a key instead of 1
.
I am missing something?
Python List cmp() Method. The compare method cmp() is used in Python to compare values and keys of two dictionaries. If method returns 0 if both dictionaries are equal, 1 if dic1 > dict2 and -1 if dict1 < dict2.
The values of a dictionary can be of any type, but the keys must be of an immutable data type such as strings, numbers, or tuples.
Dictionaries are implemented as hash tables and there are two important concepts when adding keys/values here: hashing and equality.
To insert a particular key/value, Python first computes the hash value of the key. This hash value is used to determine the row of the table where Python should first attempt to put the key/value.
If the row of the hash table is empty, great: the new key/value can inserted into the dictionary, filling the empty row.
However, if there's already something in that row, Python needs to test the keys for equality. If the keys are equal (using ==
) then they're deemed to be the same key and Python just needs to update the corresponding value on that row.
(If the keys are not equal Python looks at other rows in the table until it finds the key or reaches an empty row, but that's not relevant for this question.)
When you write {True: 'yes', 1: 'No'}
, you are telling Python to create a new dictionary and then fill it with two key/value pairs. These are processed left to right: True: 'yes'
then 1: 'No'
.
We have hash(True)
equals 1. The key True
goes in at row 1 in the hash table and the string 'yes'
is its value.
For the next pair, Python sees that hash(1)
is also 1 and so looks at row 1 of the table. There's something already there, so now Python checks the keys for equality. We have 1 == True
so 1
is deemed to be the same key as True
and so its corresponding value is changed to the string 'No'
.
This results in a dictionary with one entry: {True: 'No'}
.
If you want to peer at the guts of CPython 3.5 to see what creating a dictionary looks below the surface-Python level, here's more detail.
The Python code {True: 'yes', 1: 'No'}
is parsed into tokens and given to the compiler. Given the syntax, Python knows that a dictionary must be created using the values inside the braces. Byte code to load the four values onto the virtual machine's stack (LOAD_CONST
) and then build the dictionary (BUILD_MAP
) is queued up.
The four constant values are pushed onto the top of the stack in the order that they're seen:
'No'
1
'yes'
True
The opcode BUILD_MAP
is then called with the argument 2
(Python counted two key/value pairs). This opcode is responsible for actually creating dictionary from the items on the stack. It looks like this:
TARGET(BUILD_MAP) {
int i;
PyObject *map = _PyDict_NewPresized((Py_ssize_t)oparg);
if (map == NULL)
goto error;
for (i = oparg; i > 0; i--) {
int err;
PyObject *key = PEEK(2*i);
PyObject *value = PEEK(2*i - 1);
err = PyDict_SetItem(map, key, value);
if (err != 0) {
Py_DECREF(map);
goto error;
}
}
while (oparg--) {
Py_DECREF(POP());
Py_DECREF(POP());
}
PUSH(map);
DISPATCH();
}
The three key steps here are as follows:
An empty hashtable is created using _PyDict_NewPresized
. Small dictionaries (of just a few items, like 2 in this case) need a table with eight rows.
The for
loop is entered, starting at 2 (in this case) and counting down to 0. PEEK(n)
is a macro that points to the nth item down the stack. Therefore on the first iteration of the loop, we'll have
PyObject *key = PEEK(2*2); /* item 4 down the stack */
PyObject *value = PEEK(2*2 - 1); /* item 3 down the stack */
This means that *key
will be True
and *value
will be 'yes'
on the first loop through. On the second it will be 1
and 'No'
.
PyDict_SetItem
is called in each loop to put the current *key
and *value
into the dictionary. This is the same function that is called when you write dictionary[key] = value
. It computes the hash of the key to work out where to look first in the hash table and then, if needed, compare the key to any existing key on that row (as discussed above).Basic premise is - True
and 1
have same hashes and are equal to each other - that's why they cannot be separate keys in hash table (technically inequal object with same hashes may - but hash collisions decreases performance).
>>> True == 1
True
>>> hash(1)
1
>>> hash(True)
1
Now, let's consider a bytecode:
import dis
dis.dis("Dic = { True: 'yes', 1: 'No'}")
This prints:
0 LOAD_CONST 0 (True)
3 LOAD_CONST 1 ('yes')
6 LOAD_CONST 2 (1)
9 LOAD_CONST 3 ('No')
12 BUILD_MAP 2
15 STORE_NAME 0 (Dic)
18 LOAD_CONST 4 (None)
21 RETURN_VALUE
Basically what happens is that dict literal is tokenized to keys and values, and they are pushed to stack. After that BUILD_MAP 2
coverts two pairs of (keys, values) to dictionary.
Most likely order of data on stack (which seems to be determined by order of keys in dict literal) and implementation details of BUILD_MAP
decides on resulting dict keys and values.
It seems like key-value assignments are done in order defined in dict literal.
Assignment behaves same as d[key] = value
operation, so it's basically:
key
is not in dict (by equality): add key
do dictvalue
under key
Given {True: 'yes', 1: 'No'}
:
{}
Add (True, 'yes')
(True, hash(True))
== (True, 1)
is new key in dict1
to 'yes'
Add (1, 'no')
1
is in dict (1 == True
) -> there is no need for new key in dictionary1
(True
) with value 'no'
Result: {True: 'No'}
As I commented, I do not know if this is guarranted by Python specs or if it is just CPython implementation-defined behavior, it may differ in other interpreter implementations.
True
and 1
are different objects, but they both have the same value:
>>> True is 1
False
>>> True == 1
True
This is similar to two strings that may have the same value, but are stored in different memory locations:
>>> x = str(12345)
>>> y = str(12345)
>>> x == y
True
>>> x is y
False
First one item is added to the dictionary; then when the other is added, that value already exists as a key. So the key is updated, key values are unique.
>>> {x: 1, y: 2}
{"12345": 2}
If the key is already present in the dictionary, it does not override the key only the value associated.
I believe that writing x = {True:"a", 1:"b"}
is along the lines of:
x = {}
x[True] = "a"
x[1] = "b"
and by the time it reaches x[1] = "b"
the key True
is already in the dict so why change it? why not just override the associated value.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With