What is the time complexity of checking membership in dict.items()?

Tags:

What is the time complexity of checking membership in dict.items()?

According to the documentation:

Keys views are set-like since their entries are unique and hashable. If all values are hashable, so that (key, value) pairs are unique and hashable, then the items view is also set-like. (Values views are not treated as set-like since the entries are generally not unique.) For set-like views, all of the operations defined for the abstract base class collections.abc.Set are available (for example, ==, <, or ^).

So I did some testing with the following code:

from timeit import timeit

def membership(val, container):
    val in container

r = range(100000)
s = set(r)
d = dict.fromkeys(r, 1)
d2 = {k: [1] for k in r}
items_list = list(d2.items())

print('set'.ljust(12), end='')
print(timeit(lambda: membership(-1, s), number=1000))
print('dict'.ljust(12), end='')
print(timeit(lambda: membership(-1, d), number=1000))
print('d_keys'.ljust(12), end='')
print(timeit(lambda: membership(-1, d.keys()), number=1000))
print('d_values'.ljust(12), end='')
print(timeit(lambda: membership(-1, d.values()), number=1000))
print('\n*With hashable dict.values')
print('d_items'.ljust(12), end='')
print(timeit(lambda: membership((-1, 1), d.items()), number=1000))
print('*With unhashable dict.values')
print('d_items'.ljust(12), end='')
print(timeit(lambda: membership((-1, 1), d2.items()), number=1000))
print('d_items'.ljust(12), end='')
print(timeit(lambda: membership((-1, [1]), d2.items()), number=1000))
print('\nitems_list'.ljust(12), end='')
print(timeit(lambda: membership((-1, [1]), items_list), number=1000))

With the output:

set         0.00034419999999998896
dict        0.0003307000000000171
d_keys      0.0004200000000000037
d_values    2.4773092

*With hashable dict.values
d_items     0.0004413000000003109
*With unhashable dict.values
d_items     0.00042879999999989593
d_items     0.0005549000000000248

items_list  3.5529328

As you can see, when the dict.values are all hashable (int),
the execution time for the membership is similar to that of a set or d_keys,
because items view is set-like.
The last two examples are on the dict.values with unhashable objects (list).
So I assumed the execution time would be similar to that of a list.
However, they are still similar to that of a set.

Does this mean that even though dict.values are unhashable objects,
the implementation of items view is still very efficient,
resulting O(1) time complexity for checking the membership?

Am I missing something here?

EDITED per @chepner's comment: dict.fromkeys(r, [1]) -> {k: [1] for k in r}
EDITED per @MarkRansom's comment: another test case list(d2.items())

451

asked Jun 25 '20 15:06

ywbaek

Video Answer

2 Answers

Short-answer

The time complexity of membership testing in item views is O(1).

Psuedo-code for lookup

This is how the membership testing works:

def dictitems_contains(dictview, key_value_pair):
    d = dictview.mapping
    k, v = key_value_pair
    try:
        return d[k] == v
    except KeyError:
        return False

Actual Code

Here's the C source code:

static int
dictitems_contains(_PyDictViewObject *dv, PyObject *obj)
{
    int result;
    PyObject *key, *value, *found;
    if (dv->dv_dict == NULL)
        return 0;
    if (!PyTuple_Check(obj) || PyTuple_GET_SIZE(obj) != 2)
        return 0;
    key = PyTuple_GET_ITEM(obj, 0);
    value = PyTuple_GET_ITEM(obj, 1);
    found = PyDict_GetItemWithError((PyObject *)dv->dv_dict, key);
    if (found == NULL) {
        if (PyErr_Occurred())
            return -1;
        return 0;
    }
    Py_INCREF(found);
    result = PyObject_RichCompareBool(found, value, Py_EQ);
    Py_DECREF(found);
    return result;
}

Timing evidence for O(1) complexity

We get the same constant lookup time regardless of the dictionary size (in these cases: 100, 1,000, and 10,000).

$ python3.8 -m timeit -s 'd = dict.fromkeys(range(100))'  '(99, None) in d.items()'
5000000 loops, best of 5: 92 nsec per loop

$ python3.8 -m timeit -s 'd = dict.fromkeys(range(1_000))'  '(99, None) in d.items()'
5000000 loops, best of 5: 92.2 nsec per loop

$ python3.8 -m timeit -s 'd = dict.fromkeys(range(10_000))'  '(99, None) in d.items()'
5000000 loops, best of 5: 92.1 nsec per loop

Evidence that lookup calls hash()

We can monitor hash calls by patching _hash_():

class Int(int):
    def __hash__(self):
        print('Hash called')
        return hash(int(self))

Applying the monitoring tool show that hashing occurs when the dictionary is created and again when doing membership testing on the items view:

>>> d = {Int(1): 'one'}
Hash called
>>> (Int(1), 'one') in d.items()
Hash called
True

answered Oct 07 '22 07:10

Raymond Hettinger

Lookup in an instance of dict_items is an O(1) operation (though one with an arbitrarily large constant, related to the complexity of comparing values.)

dictitems_contains doesn't simply try to hash the tuple and look it up in a set-like collection of key/value pairs.

(Note: all of the following links are just to different lines of dictitems_contain, if you don't want to click on them individually.)

To evaluate

(-1, [1]) in d2.items()

it first extracts the key from the tuple, then tries to find that key in the underlying dict. If that lookup fails, it immediately returns false. Only if the key is found does it then compare the value from the tuple to the value mapped to the key in the dict.

At no point does dictitems_contains need to hash the second element of the tuple.

It's not clear in what ways an instance of dict_items is not set-like when the values are non-hashable, as mentioned in the documentation.

A simplified, pure-Python implementation of dict_items.__contains__ might look something like

class DictItems:
    def __init__(self, d):
        self.d = d

    def __contains__(self, t):
        key = t[0]
        value = t[1]
        try:
            dict_value = self.d[key]  # O(1) lookup
        except KeyError:
            return False
    
        return value == dict_value  # Arbitrarily expensive comparison

    ...

where d.items() returns DictItems(d).

answered Oct 07 '22 06:10

chepner

Related questions
                            
                                React Testing Library: When to use userEvent.click and when to use fireEvent
                            
                                Advantages of using MSBuild or NAnt versus running DevEnv.exe from command-line
                            
                                Seeking clarifications about structuring code to reduce cyclomatic complexity
                            
                                How do I enumerate all the defined variables in a groovy script
                            
                                Returning a 301 Redirect from a Controller Action
                            
                                How can I control the order in which event handlers are fired?
                            
                                Which C++ Library for CGI Programming?
                            
                                Is ASP.NET multithreaded (how does it execute requests)
                            
                                How do I focus a foreign window?
                            
                                Compile to java bytecode (without using Java)
                            
                                Bookmarklet wait until Javascript is loaded
                            
                                SOAP using C# [closed]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What is the time complexity of checking membership in dict.items()?

Tags:

python

dictionary

time-complexity

python-3.x

ywbaek

People also ask

Video Answer

2 Answers

Short-answer

Psuedo-code for lookup

Actual Code

Timing evidence for O(1) complexity

Evidence that lookup calls hash()

Raymond Hettinger

chepner

Recent Activity

Donate For Us