I want to check whether a key exists in a dictionary. The most appropriate way, as per my knowledge is:
if d_.get(s):
. But, while attempting a question on Leetcode, there was a TLE error when I was using this method. However, when I tried if s in d_
, TLE was gone. I want to know why in
is faster than get()
.
I tried going through some questions, found this one where there is an explanation for d_.get()
v/s d_[s]
. None of the questions addressed d_.get()
v/s s in d_
.
Just in case, some context:
The code that failed with if self.memo.get(s):
:
from typing import List
class Solution:
def __init__(self):
self.word_dict = {}
self.memo = {}
def word_break(self, s):
if not s:
return True
if self.memo.get(s):
return self.memo[s]
res = False
for word in self.word_dict.keys():
if len(word) <= len(s) and s[:len(word)] == word:
res = res or self.word_break(s[len(word):])
self.memo[s] = res
return res
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
for word in wordDict:
self.word_dict[word] = 1
return(self.word_break(s))
The code than got accepted with if s in self.memo
:
from typing import List
class Solution:
def __init__(self):
self.word_dict = {}
self.memo = {}
def word_break(self, s):
if not s:
return True
if s in self.memo:
return self.memo[s]
res = False
for word in self.word_dict.keys():
if len(word) <= len(s) and s[:len(word)] == word:
res = res or self.word_break(s[len(word):])
self.memo[s] = res
return res
def wordBreak(self, s: str, wordDict: List[str]) -> bool:
for word in wordDict:
self.word_dict[word] = 1
return(self.word_break(s))
I always presumed that in
would be slower than fetching attributes(here, get()
).
Python Dictionary keys() The keys() method extracts the keys of the dictionary and returns the list of keys as a view object.
With CPython 2.7, using dict() to create dictionaries takes up to 6 times longer and involves more memory allocation operations than the literal syntax. Use {} to create dictionaries, especially if you are pre-populating them, unless the literal syntax does not work for your case.
get() with a default is actually slower than using if-else expressions.
Using dictionary. keys() is slower because it does more work: It adds an attribute lookup; dictionary.
The following is the syntax: It returns a dict_keys object containing the keys of the dictionary. This object is iterable, that is, you can use it to iterate through the keys in the dictionary. You can use the list () function to convert this object into a list. Let’s look at some examples.
Note: The second approach would not work because dict_keys in Python 3 does not support indexing. Writing code in comment? Please use ide.geeksforgeeks.org , generate link and share the link here.
Python Dictionary keys () method Difficulty Level : Basic Last Updated : 31 May, 2021 Dictionary in Python is a collection of data values which only maintains the order of insertion, used to store data values like a map, which, unlike other Data Types that hold only a single value as an element, Dictionary holds key: value pair.
Now, you can also convert this object to a list using the Python built-in list () function. We now have the keys in the dictionary employees as a list. Alternatively, you can iterate through the dictionary items and append the key in each iteration to a result list. We get the keys in the dictionary as a list.
Using the dis.dis
method from the linked question:
>>> import dis
>>> dis.dis(compile('d.get(key)', '', 'eval'))
1 0 LOAD_NAME 0 (d)
2 LOAD_METHOD 1 (get)
4 LOAD_NAME 2 (key)
6 CALL_METHOD 1
8 RETURN_VALUE
>>> dis.dis(compile('key in d', '', 'eval'))
1 0 LOAD_NAME 0 (key)
2 LOAD_NAME 1 (d)
4 COMPARE_OP 6 (in)
6 RETURN_VALUE
we can clearly see that d.get(key)
has to run one more step: the LOAD_METHOD
step. Additionally, d.get
must deal with more information: it has to:
None
if no default was specified).Also, from looking at the C code for in
and the C code for .get
, we can see that they are very similar.
int static PyObject *
PyDict_Contains(PyObject *op, PyObject *key) dict_get_impl(PyDictObject *self, PyObject *key, PyObject *default_value)
{ {
Py_hash_t hash; PyObject *val = NULL;
Py_ssize_t ix; Py_hash_t hash;
PyDictObject *mp = (PyDictObject *)op; Py_ssize_t ix;
PyObject *value;
if (!PyUnicode_CheckExact(key) || if (!PyUnicode_CheckExact(key) ||
(hash = ((PyASCIIObject *) key)->hash) == -1) { (hash = ((PyASCIIObject *) key)->hash) == -1) {
hash = PyObject_Hash(key); hash = PyObject_Hash(key);
if (hash == -1) if (hash == -1)
return -1; return NULL;
} }
ix = (mp->ma_keys->dk_lookup)(mp, key, hash, &value); ix = (self->ma_keys->dk_lookup) (self, key, hash, &val);
if (ix == DKIX_ERROR) if (ix == DKIX_ERROR)
return -1; return NULL;
return (ix != DKIX_EMPTY && value != NULL); if (ix == DKIX_EMPTY || val == NULL) {
} val = default_value;
}
Py_INCREF(val);
return val;
}
In fact, they are almost the same, but .get
has more overhead and must return a value.
However, it seems that d in key
will use a faster method if the hash is known, while d.get
recalculates the hash every time. Additionally, CALL_METHOD
and LOAD_METHOD
have much higher overhead than COMPARE_OP
, which performs one of the built-in boolean operations. Note that COMPARE_OP will simply jump to here.
The time overhead is in calling a method explicitly, as opposed to letting language constructs take care of it. We can demonstrate this with timeit
:
>>> timeit.timeit('"__name__" in x', 'x = globals()')
0.037103720999766665
>>> timeit.timeit('x.__contains__("__name__")', 'x = globals()')
0.07471312899997429
>>> timeit.timeit('x["__name__"]', 'x = globals()')
0.03828814600001351
>>> timeit.timeit('x.__getitem__("__name__")', 'x = globals()')
0.07529343100031838
>>> timeit.timeit('x.get("__name__")', 'x = globals()')
0.08261531900006958
I initially started trying to figure out the difference by looking at the source code for __contains__()
and .get()
, respectively, only to find that they're nearly identical except for .get()
incrementing the object's reference count (which should be more or less negligible). Certainly there wasn't enough difference to explain the time difference you'd be seeing.
But, doing tests, we can see that actually using language constructs (in
and []
) as opposed to the explicit method calls that they would turn into (__contains__()
and __getitem__()
, respectively), is a full 50% faster.
A full investigation would take a while and more effort than I care to spend, but I hypothesize this is due to some built-in speedups and skipped steps that the interpreter applies - using a language construct instead of explicitly calling a method narrows down the level of complexity that can be expected, and the interpreter could jump straight into the C code without the overhead of calling the method first.
As @rassar's answer demonstrates, this is, in fact, basically what happens.
The two pieces of code do not do the same thing. Notice how self.memo
is set:
self.memo[s] = res
If res
is False
, the if
statement for the get
will fail while the if
for in
will succeed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With