What is the time complexity of checking membership in dict.items()?
According to the documentation:
Keys views are set-like since their entries are unique and hashable. If all values are hashable, so that (key, value) pairs are unique and hashable, then the items view is also set-like. (Values views are not treated as set-like since the entries are generally not unique.) For set-like views, all of the operations defined for the abstract base class collections.abc.Set are available (for example, ==, <, or ^).
So I did some testing with the following code:
from timeit import timeit
def membership(val, container):
val in container
r = range(100000)
s = set(r)
d = dict.fromkeys(r, 1)
d2 = {k: [1] for k in r}
items_list = list(d2.items())
print('set'.ljust(12), end='')
print(timeit(lambda: membership(-1, s), number=1000))
print('dict'.ljust(12), end='')
print(timeit(lambda: membership(-1, d), number=1000))
print('d_keys'.ljust(12), end='')
print(timeit(lambda: membership(-1, d.keys()), number=1000))
print('d_values'.ljust(12), end='')
print(timeit(lambda: membership(-1, d.values()), number=1000))
print('\n*With hashable dict.values')
print('d_items'.ljust(12), end='')
print(timeit(lambda: membership((-1, 1), d.items()), number=1000))
print('*With unhashable dict.values')
print('d_items'.ljust(12), end='')
print(timeit(lambda: membership((-1, 1), d2.items()), number=1000))
print('d_items'.ljust(12), end='')
print(timeit(lambda: membership((-1, [1]), d2.items()), number=1000))
print('\nitems_list'.ljust(12), end='')
print(timeit(lambda: membership((-1, [1]), items_list), number=1000))
With the output:
set 0.00034419999999998896
dict 0.0003307000000000171
d_keys 0.0004200000000000037
d_values 2.4773092
*With hashable dict.values
d_items 0.0004413000000003109
*With unhashable dict.values
d_items 0.00042879999999989593
d_items 0.0005549000000000248
items_list 3.5529328
As you can see, when the dict.values
are all hashable (int
),
the execution time for the membership is similar to that of a set
or d_keys
,
because items view is set-like.
The last two examples are on the dict.values
with unhashable objects (list
).
So I assumed the execution time would be similar to that of a list
.
However, they are still similar to that of a set
.
Does this mean that even though dict.values
are unhashable objects,
the implementation of items view is still very efficient,
resulting O(1) time complexity for checking the membership?
Am I missing something here?
EDITED
per @chepner's comment: dict.fromkeys(r, [1])
-> {k: [1] for k in r}
EDITED
per @MarkRansom's comment: another test case list(d2.items())
Complexity for a membership test is identical to that of a list: O(n).
If we explain the difference by Big O concepts, dictionaries have constant time complexity, O(1) while lists have linear time complexity, O(n).
Python Dictionary items() Method The items() method returns a view object. The view object contains the key-value pairs of the dictionary, as tuples in a list. The view object will reflect any changes done to the dictionary, see example below.
The in operator for dict has average case time-complexity of O(1).
The (amortized) time complexity is constant (O(1)) in the size of the dictionary. Python dictionaries are implemented as hash tables in python.
Dictionaries are hash tables. As you probably know, look ups with hash tables have amortized time complexity of O (1), but worst case complexity of O (n), where n is the number of elements in the hash table.
The average time complexity is of course O (1). The best method would be to check and take a look at the hashs of the objects you are using. The CPython Dict uses int PyObject_Hash (PyObject *o) which is the equivalent of hash (o).
Python's dictionary implementation reduces the average complexity of dictionary lookups to O (1) by requiring that key objects provide a "hash" function. Such a hash function takes the information in a key object and uses it to produce an integer, called a hash value.
The time complexity of membership testing in item views is O(1)
.
This is how the membership testing works:
def dictitems_contains(dictview, key_value_pair):
d = dictview.mapping
k, v = key_value_pair
try:
return d[k] == v
except KeyError:
return False
Here's the C source code:
static int
dictitems_contains(_PyDictViewObject *dv, PyObject *obj)
{
int result;
PyObject *key, *value, *found;
if (dv->dv_dict == NULL)
return 0;
if (!PyTuple_Check(obj) || PyTuple_GET_SIZE(obj) != 2)
return 0;
key = PyTuple_GET_ITEM(obj, 0);
value = PyTuple_GET_ITEM(obj, 1);
found = PyDict_GetItemWithError((PyObject *)dv->dv_dict, key);
if (found == NULL) {
if (PyErr_Occurred())
return -1;
return 0;
}
Py_INCREF(found);
result = PyObject_RichCompareBool(found, value, Py_EQ);
Py_DECREF(found);
return result;
}
We get the same constant lookup time regardless of the dictionary size (in these cases: 100, 1,000, and 10,000).
$ python3.8 -m timeit -s 'd = dict.fromkeys(range(100))' '(99, None) in d.items()'
5000000 loops, best of 5: 92 nsec per loop
$ python3.8 -m timeit -s 'd = dict.fromkeys(range(1_000))' '(99, None) in d.items()'
5000000 loops, best of 5: 92.2 nsec per loop
$ python3.8 -m timeit -s 'd = dict.fromkeys(range(10_000))' '(99, None) in d.items()'
5000000 loops, best of 5: 92.1 nsec per loop
We can monitor hash calls by patching _hash_():
class Int(int):
def __hash__(self):
print('Hash called')
return hash(int(self))
Applying the monitoring tool show that hashing occurs when the dictionary is created and again when doing membership testing on the items view:
>>> d = {Int(1): 'one'}
Hash called
>>> (Int(1), 'one') in d.items()
Hash called
True
Lookup in an instance of dict_items
is an O(1) operation (though one with an arbitrarily large constant, related to the complexity of comparing values.)
dictitems_contains
doesn't simply try to hash the tuple and look it up in a set-like collection of key/value pairs.
(Note: all of the following links are just to different lines of dictitems_contain
, if you don't want to click on them individually.)
To evaluate
(-1, [1]) in d2.items()
it first extracts the key from the tuple, then tries to find that key in the underlying dict
. If that lookup fails, it immediately returns false. Only if the key is found does it then compare the value from the tuple to the value mapped to the key in the dict.
At no point does dictitems_contains
need to hash the second element of the tuple.
It's not clear in what ways an instance of dict_items
is not set-like when the values are non-hashable, as mentioned in the documentation.
A simplified, pure-Python implementation of dict_items.__contains__
might look something like
class DictItems:
def __init__(self, d):
self.d = d
def __contains__(self, t):
key = t[0]
value = t[1]
try:
dict_value = self.d[key] # O(1) lookup
except KeyError:
return False
return value == dict_value # Arbitrarily expensive comparison
...
where d.items()
returns DictItems(d)
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With