Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the time complexity of checking membership in dict.items()?

What is the time complexity of checking membership in dict.items()?

According to the documentation:

Keys views are set-like since their entries are unique and hashable. If all values are hashable, so that (key, value) pairs are unique and hashable, then the items view is also set-like. (Values views are not treated as set-like since the entries are generally not unique.) For set-like views, all of the operations defined for the abstract base class collections.abc.Set are available (for example, ==, <, or ^).

So I did some testing with the following code:

from timeit import timeit

def membership(val, container):
    val in container

r = range(100000)
s = set(r)
d = dict.fromkeys(r, 1)
d2 = {k: [1] for k in r}
items_list = list(d2.items())

print('set'.ljust(12), end='')
print(timeit(lambda: membership(-1, s), number=1000))
print('dict'.ljust(12), end='')
print(timeit(lambda: membership(-1, d), number=1000))
print('d_keys'.ljust(12), end='')
print(timeit(lambda: membership(-1, d.keys()), number=1000))
print('d_values'.ljust(12), end='')
print(timeit(lambda: membership(-1, d.values()), number=1000))
print('\n*With hashable dict.values')
print('d_items'.ljust(12), end='')
print(timeit(lambda: membership((-1, 1), d.items()), number=1000))
print('*With unhashable dict.values')
print('d_items'.ljust(12), end='')
print(timeit(lambda: membership((-1, 1), d2.items()), number=1000))
print('d_items'.ljust(12), end='')
print(timeit(lambda: membership((-1, [1]), d2.items()), number=1000))
print('\nitems_list'.ljust(12), end='')
print(timeit(lambda: membership((-1, [1]), items_list), number=1000))

With the output:

set         0.00034419999999998896
dict        0.0003307000000000171
d_keys      0.0004200000000000037
d_values    2.4773092

*With hashable dict.values
d_items     0.0004413000000003109
*With unhashable dict.values
d_items     0.00042879999999989593
d_items     0.0005549000000000248

items_list  3.5529328

As you can see, when the dict.values are all hashable (int),
the execution time for the membership is similar to that of a set or d_keys,
because items view is set-like.
The last two examples are on the dict.values with unhashable objects (list).
So I assumed the execution time would be similar to that of a list.
However, they are still similar to that of a set.

Does this mean that even though dict.values are unhashable objects,
the implementation of items view is still very efficient,
resulting O(1) time complexity for checking the membership?

Am I missing something here?

EDITED per @chepner's comment: dict.fromkeys(r, [1]) -> {k: [1] for k in r}
EDITED per @MarkRansom's comment: another test case list(d2.items())

like image 451
ywbaek Avatar asked Jun 25 '20 15:06

ywbaek


People also ask

What is the time complexity of checking membership of an item in a list of size n in Python?

Complexity for a membership test is identical to that of a list: O(n).

What is the time complexity of dict in Python?

If we explain the difference by Big O concepts, dictionaries have constant time complexity, O(1) while lists have linear time complexity, O(n).

What does dict items () do in Python?

Python Dictionary items() Method The items() method returns a view object. The view object contains the key-value pairs of the dictionary, as tuples in a list. The view object will reflect any changes done to the dictionary, see example below.

What is the time complexity of checking if a key is in a dictionary in Python?

The in operator for dict has average case time-complexity of O(1).

What is the (amortized) time complexity of Python dictionary?

The (amortized) time complexity is constant (O(1)) in the size of the dictionary. Python dictionaries are implemented as hash tables in python.

What is the worst case complexity of a lookup Dictionary?

Dictionaries are hash tables. As you probably know, look ups with hash tables have amortized time complexity of O (1), but worst case complexity of O (n), where n is the number of elements in the hash table.

What is the average time complexity of Python Dict?

The average time complexity is of course O (1). The best method would be to check and take a look at the hashs of the objects you are using. The CPython Dict uses int PyObject_Hash (PyObject *o) which is the equivalent of hash (o).

How does Python's dictionary implementation reduce the complexity of lookups?

Python's dictionary implementation reduces the average complexity of dictionary lookups to O (1) by requiring that key objects provide a "hash" function. Such a hash function takes the information in a key object and uses it to produce an integer, called a hash value.


Video Answer


2 Answers

Short-answer

The time complexity of membership testing in item views is O(1).

Psuedo-code for lookup

This is how the membership testing works:

def dictitems_contains(dictview, key_value_pair):
    d = dictview.mapping
    k, v = key_value_pair
    try:
        return d[k] == v
    except KeyError:
        return False

Actual Code

Here's the C source code:

static int
dictitems_contains(_PyDictViewObject *dv, PyObject *obj)
{
    int result;
    PyObject *key, *value, *found;
    if (dv->dv_dict == NULL)
        return 0;
    if (!PyTuple_Check(obj) || PyTuple_GET_SIZE(obj) != 2)
        return 0;
    key = PyTuple_GET_ITEM(obj, 0);
    value = PyTuple_GET_ITEM(obj, 1);
    found = PyDict_GetItemWithError((PyObject *)dv->dv_dict, key);
    if (found == NULL) {
        if (PyErr_Occurred())
            return -1;
        return 0;
    }
    Py_INCREF(found);
    result = PyObject_RichCompareBool(found, value, Py_EQ);
    Py_DECREF(found);
    return result;
}

Timing evidence for O(1) complexity

We get the same constant lookup time regardless of the dictionary size (in these cases: 100, 1,000, and 10,000).

$ python3.8 -m timeit -s 'd = dict.fromkeys(range(100))'  '(99, None) in d.items()'
5000000 loops, best of 5: 92 nsec per loop

$ python3.8 -m timeit -s 'd = dict.fromkeys(range(1_000))'  '(99, None) in d.items()'
5000000 loops, best of 5: 92.2 nsec per loop

$ python3.8 -m timeit -s 'd = dict.fromkeys(range(10_000))'  '(99, None) in d.items()'
5000000 loops, best of 5: 92.1 nsec per loop

Evidence that lookup calls hash()

We can monitor hash calls by patching _hash_():

class Int(int):
    def __hash__(self):
        print('Hash called')
        return hash(int(self))

Applying the monitoring tool show that hashing occurs when the dictionary is created and again when doing membership testing on the items view:

>>> d = {Int(1): 'one'}
Hash called
>>> (Int(1), 'one') in d.items()
Hash called
True
like image 96
Raymond Hettinger Avatar answered Oct 07 '22 07:10

Raymond Hettinger


Lookup in an instance of dict_items is an O(1) operation (though one with an arbitrarily large constant, related to the complexity of comparing values.)


dictitems_contains doesn't simply try to hash the tuple and look it up in a set-like collection of key/value pairs.

(Note: all of the following links are just to different lines of dictitems_contain, if you don't want to click on them individually.)

To evaluate

(-1, [1]) in d2.items()

it first extracts the key from the tuple, then tries to find that key in the underlying dict. If that lookup fails, it immediately returns false. Only if the key is found does it then compare the value from the tuple to the value mapped to the key in the dict.

At no point does dictitems_contains need to hash the second element of the tuple.

It's not clear in what ways an instance of dict_items is not set-like when the values are non-hashable, as mentioned in the documentation.


A simplified, pure-Python implementation of dict_items.__contains__ might look something like

class DictItems:
    def __init__(self, d):
        self.d = d

    def __contains__(self, t):
        key = t[0]
        value = t[1]
        try:
            dict_value = self.d[key]  # O(1) lookup
        except KeyError:
            return False
    
        return value == dict_value  # Arbitrarily expensive comparison

    ...

where d.items() returns DictItems(d).

like image 22
chepner Avatar answered Oct 07 '22 06:10

chepner