Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is key in dict() faster than dict.get(key) in Python3?

I want to check whether a key exists in a dictionary. The most appropriate way, as per my knowledge is: if d_.get(s):. But, while attempting a question on Leetcode, there was a TLE error when I was using this method. However, when I tried if s in d_, TLE was gone. I want to know why in is faster than get().

I tried going through some questions, found this one where there is an explanation for d_.get() v/s d_[s]. None of the questions addressed d_.get() v/s s in d_.

Just in case, some context:

The code that failed with if self.memo.get(s)::

from typing import List


class Solution:
    def __init__(self):
        self.word_dict = {}
        self.memo = {}

    def word_break(self, s):
        if not s:
            return True
        if self.memo.get(s):
            return self.memo[s]
        res = False
        for word in self.word_dict.keys():
            if len(word) <= len(s) and s[:len(word)] == word:
                res = res or self.word_break(s[len(word):])
                self.memo[s] = res
        return res

    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
        for word in wordDict:
            self.word_dict[word] = 1
        return(self.word_break(s))

The code than got accepted with if s in self.memo:

from typing import List


class Solution:
    def __init__(self):
        self.word_dict = {}
        self.memo = {}

    def word_break(self, s):
        if not s:
            return True
        if s in self.memo:
            return self.memo[s]
        res = False
        for word in self.word_dict.keys():
            if len(word) <= len(s) and s[:len(word)] == word:
                res = res or self.word_break(s[len(word):])
                self.memo[s] = res
        return res

    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
        for word in wordDict:
            self.word_dict[word] = 1
        return(self.word_break(s))

I always presumed that in would be slower than fetching attributes(here, get()).

like image 671
Aviral Srivastava Avatar asked Oct 31 '19 19:10

Aviral Srivastava


People also ask

What does dict keys () do in Python?

Python Dictionary keys() The keys() method extracts the keys of the dictionary and returns the list of keys as a view object.

Should I use dict () or {}?

With CPython 2.7, using dict() to create dictionaries takes up to 6 times longer and involves more memory allocation operations than the literal syntax. Use {} to create dictionaries, especially if you are pre-populating them, unless the literal syntax does not work for your case.

Is dict get faster?

get() with a default is actually slower than using if-else expressions.

Are dict keys slow?

Using dictionary. keys() is slower because it does more work: It adds an attribute lookup; dictionary.

How to iterate through the keys of a dictionary in Python?

The following is the syntax: It returns a dict_keys object containing the keys of the dictionary. This object is iterable, that is, you can use it to iterate through the keys in the dictionary. You can use the list () function to convert this object into a list. Let’s look at some examples.

Can I index Dict_keys in Python 3?

Note: The second approach would not work because dict_keys in Python 3 does not support indexing. Writing code in comment? Please use ide.geeksforgeeks.org , generate link and share the link here.

What is dictionary keys () method in Python?

Python Dictionary keys () method Difficulty Level : Basic Last Updated : 31 May, 2021 Dictionary in Python is a collection of data values which only maintains the order of insertion, used to store data values like a map, which, unlike other Data Types that hold only a single value as an element, Dictionary holds key: value pair.

How to convert a dictionary to a list in Python?

Now, you can also convert this object to a list using the Python built-in list () function. We now have the keys in the dictionary employees as a list. Alternatively, you can iterate through the dictionary items and append the key in each iteration to a result list. We get the keys in the dictionary as a list.


3 Answers

Using the dis.dis method from the linked question:

>>> import dis
>>> dis.dis(compile('d.get(key)', '', 'eval'))
  1           0 LOAD_NAME                0 (d)
              2 LOAD_METHOD              1 (get)
              4 LOAD_NAME                2 (key)
              6 CALL_METHOD              1
              8 RETURN_VALUE
>>> dis.dis(compile('key in d', '', 'eval'))
  1           0 LOAD_NAME                0 (key)
              2 LOAD_NAME                1 (d)
              4 COMPARE_OP               6 (in)
              6 RETURN_VALUE

we can clearly see that d.get(key) has to run one more step: the LOAD_METHOD step. Additionally, d.get must deal with more information: it has to:

  1. check for the presence
  2. if it was found, return the value
  3. otherwise, return the specified default value (or None if no default was specified).

Also, from looking at the C code for in and the C code for .get, we can see that they are very similar.

int                                                           static PyObject * 
PyDict_Contains(PyObject *op, PyObject *key)                  dict_get_impl(PyDictObject *self, PyObject *key, PyObject *default_value)
{                                                             {
    Py_hash_t hash;                                               PyObject *val = NULL;
    Py_ssize_t ix;                                                Py_hash_t hash;
    PyDictObject *mp = (PyDictObject *)op;                        Py_ssize_t ix;                       
    PyObject *value;                                           

    if (!PyUnicode_CheckExact(key) ||                             if (!PyUnicode_CheckExact(key) ||                  
        (hash = ((PyASCIIObject *) key)->hash) == -1) {               (hash = ((PyASCIIObject *) key)->hash) == -1) {                             
        hash = PyObject_Hash(key);                                    hash = PyObject_Hash(key);        
        if (hash == -1)                                               if (hash == -1)
            return -1;                                                    return NULL;
    }                                                             }
    ix = (mp->ma_keys->dk_lookup)(mp, key, hash, &value);         ix = (self->ma_keys->dk_lookup) (self, key, hash, &val);                                         
    if (ix == DKIX_ERROR)                                         if (ix == DKIX_ERROR) 
        return -1;                                                    return NULL;
    return (ix != DKIX_EMPTY && value != NULL);                   if (ix == DKIX_EMPTY || val == NULL) {                        
}                                                                     val = default_value;
                                                                  }
                                                                  Py_INCREF(val);
                                                                  return val;
                                                              }

In fact, they are almost the same, but .get has more overhead and must return a value.

However, it seems that d in key will use a faster method if the hash is known, while d.get recalculates the hash every time. Additionally, CALL_METHOD and LOAD_METHOD have much higher overhead than COMPARE_OP, which performs one of the built-in boolean operations. Note that COMPARE_OP will simply jump to here.

like image 177
rassar Avatar answered Nov 11 '22 17:11

rassar


The time overhead is in calling a method explicitly, as opposed to letting language constructs take care of it. We can demonstrate this with timeit:

>>> timeit.timeit('"__name__" in x', 'x = globals()')
0.037103720999766665
>>> timeit.timeit('x.__contains__("__name__")', 'x = globals()')
0.07471312899997429
>>> timeit.timeit('x["__name__"]', 'x = globals()')
0.03828814600001351
>>> timeit.timeit('x.__getitem__("__name__")', 'x = globals()')
0.07529343100031838
>>> timeit.timeit('x.get("__name__")', 'x = globals()')
0.08261531900006958

I initially started trying to figure out the difference by looking at the source code for __contains__() and .get(), respectively, only to find that they're nearly identical except for .get() incrementing the object's reference count (which should be more or less negligible). Certainly there wasn't enough difference to explain the time difference you'd be seeing.

But, doing tests, we can see that actually using language constructs (in and []) as opposed to the explicit method calls that they would turn into (__contains__() and __getitem__(), respectively), is a full 50% faster.

A full investigation would take a while and more effort than I care to spend, but I hypothesize this is due to some built-in speedups and skipped steps that the interpreter applies - using a language construct instead of explicitly calling a method narrows down the level of complexity that can be expected, and the interpreter could jump straight into the C code without the overhead of calling the method first.


As @rassar's answer demonstrates, this is, in fact, basically what happens.

like image 34
Green Cloak Guy Avatar answered Nov 11 '22 17:11

Green Cloak Guy


The two pieces of code do not do the same thing. Notice how self.memo is set:

self.memo[s] = res

If res is False, the if statement for the get will fail while the if for in will succeed.

like image 26
Mark Ransom Avatar answered Nov 11 '22 19:11

Mark Ransom