Why is key in dict() faster than dict.get(key) in Python3?

Tags: python, dictionary, python-3.x

I want to check whether a key exists in a dictionary. As far as I know, the most appropriate way is if d_.get(s):. However, while attempting a question on LeetCode, I got a TLE (Time Limit Exceeded) error with this approach. When I switched to if s in d_, the TLE went away. I want to know why in is faster than get().

I went through some existing questions and found one that explains d_.get() vs. d_[s], but none of them address d_.get() vs. s in d_.

Just in case, some context:

The code that failed with if self.memo.get(s):

from typing import List


class Solution:
    def __init__(self):
        self.word_dict = {}
        self.memo = {}

    def word_break(self, s):
        if not s:
            return True
        if self.memo.get(s):
            return self.memo[s]
        res = False
        for word in self.word_dict.keys():
            if len(word) <= len(s) and s[:len(word)] == word:
                res = res or self.word_break(s[len(word):])
                self.memo[s] = res
        return res

    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
        for word in wordDict:
            self.word_dict[word] = 1
        return(self.word_break(s))

The code that got accepted with if s in self.memo:

from typing import List


class Solution:
    def __init__(self):
        self.word_dict = {}
        self.memo = {}

    def word_break(self, s):
        if not s:
            return True
        if s in self.memo:
            return self.memo[s]
        res = False
        for word in self.word_dict.keys():
            if len(word) <= len(s) and s[:len(word)] == word:
                res = res or self.word_break(s[len(word):])
                self.memo[s] = res
        return res

    def wordBreak(self, s: str, wordDict: List[str]) -> bool:
        for word in wordDict:
            self.word_dict[word] = 1
        return(self.word_break(s))

I always presumed that in would be slower than fetching an attribute (here, get()).

asked Oct 31 '19 19:10

Aviral Srivastava

3 Answers

Using the dis.dis method from the linked question:

>>> import dis
>>> dis.dis(compile('d.get(key)', '', 'eval'))
  1           0 LOAD_NAME                0 (d)
              2 LOAD_METHOD              1 (get)
              4 LOAD_NAME                2 (key)
              6 CALL_METHOD              1
              8 RETURN_VALUE
>>> dis.dis(compile('key in d', '', 'eval'))
  1           0 LOAD_NAME                0 (key)
              2 LOAD_NAME                1 (d)
              4 COMPARE_OP               6 (in)
              6 RETURN_VALUE

we can clearly see that d.get(key) has to run one more step: the LOAD_METHOD step. Additionally, d.get has more work to do. It has to:

check for the presence
if it was found, return the value
otherwise, return the specified default value (or None if no default was specified).
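
To make those three steps concrete, here is a minimal pure-Python sketch of what d.get does. The function name get_sketch is hypothetical and exists only for illustration; the real dict.get is implemented in C and performs a single lookup, whereas this sketch performs two.

```python
def get_sketch(d, key, default=None):
    """Illustrative stand-in for dict.get (hypothetical helper, not CPython's code)."""
    if key in d:        # step 1: check for the presence of the key
        return d[key]   # step 2: found, so return the stored value
    return default      # step 3: missing, so return the default (None if omitted)


d = {"a": 1, "b": False}
print(get_sketch(d, "a"))           # 1
print(get_sketch(d, "missing"))     # None
print(get_sketch(d, "missing", 0))  # 0
print(get_sketch(d, "b"))           # False (the key exists, but its value is falsy)
```

The last line shows why a truthiness test on the result of get is not the same as a membership test.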

Also, from looking at the C code for in and the C code for .get, we can see that they are very similar.

int                                                           static PyObject * 
PyDict_Contains(PyObject *op, PyObject *key)                  dict_get_impl(PyDictObject *self, PyObject *key, PyObject *default_value)
{                                                             {
    Py_hash_t hash;                                               PyObject *val = NULL;
    Py_ssize_t ix;                                                Py_hash_t hash;
    PyDictObject *mp = (PyDictObject *)op;                        Py_ssize_t ix;                       
    PyObject *value;                                           

    if (!PyUnicode_CheckExact(key) ||                             if (!PyUnicode_CheckExact(key) ||                  
        (hash = ((PyASCIIObject *) key)->hash) == -1) {               (hash = ((PyASCIIObject *) key)->hash) == -1) {                             
        hash = PyObject_Hash(key);                                    hash = PyObject_Hash(key);        
        if (hash == -1)                                               if (hash == -1)
            return -1;                                                    return NULL;
    }                                                             }
    ix = (mp->ma_keys->dk_lookup)(mp, key, hash, &value);         ix = (self->ma_keys->dk_lookup) (self, key, hash, &val);                                         
    if (ix == DKIX_ERROR)                                         if (ix == DKIX_ERROR) 
        return -1;                                                    return NULL;
    return (ix != DKIX_EMPTY && value != NULL);                   if (ix == DKIX_EMPTY || val == NULL) {                        
}                                                                     val = default_value;
                                                                  }
                                                                  Py_INCREF(val);
                                                                  return val;
                                                              }

In fact, they are almost the same, but .get has more overhead and must return a value.

Both functions obtain the hash the same way in the code above (an exact str key reuses its cached hash, and anything else goes through PyObject_Hash), so hashing does not explain the gap. The difference comes from the bytecode: CALL_METHOD and LOAD_METHOD have much higher overhead than COMPARE_OP, which performs one of the built-in boolean operations and, for in, reaches the C-level containment check without a Python-level method call.

answered Nov 11 '22 17:11

rassar

The time overhead is in calling a method explicitly, as opposed to letting language constructs take care of it. We can demonstrate this with timeit:

>>> import timeit
>>> timeit.timeit('"__name__" in x', 'x = globals()')
0.037103720999766665
>>> timeit.timeit('x.__contains__("__name__")', 'x = globals()')
0.07471312899997429
>>> timeit.timeit('x["__name__"]', 'x = globals()')
0.03828814600001351
>>> timeit.timeit('x.__getitem__("__name__")', 'x = globals()')
0.07529343100031838
>>> timeit.timeit('x.get("__name__")', 'x = globals()')
0.08261531900006958

I initially started trying to figure out the difference by looking at the source code for __contains__() and .get(), respectively, only to find that they're nearly identical except for .get() incrementing the object's reference count (which should be more or less negligible). Certainly there wasn't enough difference to explain the time difference you'd be seeing.

But the tests show that using the language constructs (in and []) takes roughly half the time of the explicit method calls they correspond to (__contains__() and __getitem__(), respectively).

A full investigation would take a while and more effort than I care to spend, but I hypothesize this is due to some built-in speedups and skipped steps that the interpreter applies - using a language construct instead of explicitly calling a method narrows down the level of complexity that can be expected, and the interpreter could jump straight into the C code without the overhead of calling the method first.

As @rassar's answer demonstrates, this is, in fact, basically what happens.
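
For anyone who wants to reproduce the comparison outside the REPL, here is a small standalone sketch. The dictionary contents, the variable names and the repeat/number settings are arbitrary choices of mine, and absolute timings will differ between machines and Python versions.

```python
import timeit

# Arbitrary one-entry dictionary; the key is always present.
setup = "d = {'key': 1}"

statements = [
    "'key' in d",             # language construct
    "d.__contains__('key')",  # explicit method call for the same check
    "d['key']",               # language construct
    "d.get('key')",           # explicit method call
]

timings = {}
for stmt in statements:
    # Take the best of several runs to reduce noise from other processes.
    timings[stmt] = min(timeit.repeat(stmt, setup=setup, repeat=5, number=200_000))

for stmt, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{stmt:<25} {seconds:.4f} s")
```

Taking the minimum of several repeats is the usual way to get a stable figure from timeit; the ordering of the results, not the absolute numbers, is what matters here.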

answered Nov 11 '22 17:11

Green Cloak Guy

The two pieces of code do not do the same thing. Notice how self.memo is set:

self.memo[s] = res

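If res is False, the if statement for the get will fail while the if for in will succeed: self.memo.get(s) returns the stored False, which is falsy, so the cached negative result is treated as a miss and recomputed on every call. The in check only tests whether the key is present. The get version therefore never benefits from memoized False results, which is the likely cause of the TLE rather than any per-call speed difference.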
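
A minimal sketch of the effect, under my own assumptions: count_calls is a hypothetical helper that mirrors the structure of the question's word_break (simplified, and storing the result once after the loop), and the input string and word list are made up to force only False results.

```python
memo = {"abc": False}  # a cached negative result

hit_via_get = bool(memo.get("abc"))  # stored False is falsy, so this looks like a miss
hit_via_in = "abc" in memo           # only asks whether the key exists
print(hit_via_get, hit_via_in)       # False True

# One way to keep .get and still tell "missing" apart from "stored False":
_MISSING = object()  # hypothetical sentinel name
cached = memo.get("abc", _MISSING)
if cached is not _MISSING:
    print("cache hit:", cached)      # cache hit: False


def count_calls(s, words, use_in):
    """Count recursive calls of a word-break search using either cache check."""
    cache = {}
    calls = 0

    def solve(rest):
        nonlocal calls
        calls += 1
        if not rest:
            return True
        hit = (rest in cache) if use_in else bool(cache.get(rest))
        if hit:
            return cache[rest]
        res = False
        for w in words:
            if rest.startswith(w):
                res = res or solve(rest[len(w):])
        cache[rest] = res
        return res

    solve(s)
    return calls


# Every suffix of this input is unbreakable, so every cached result is False.
text, words = "a" * 16 + "b", ["a", "aa"]
calls_with_get = count_calls(text, words, use_in=False)
calls_with_in = count_calls(text, words, use_in=True)
print(calls_with_get, calls_with_in)  # the get-based count is far larger
```

With the in check the number of calls grows linearly with the input length, while the truthiness check on get lets it grow exponentially for inputs like this one.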

answered Nov 11 '22 19:11

Mark Ransom
