Python dictionary keys. "In" complexity

Tags:

Quick question to mainly satisfy my curiosity on the topic.

I am writing some large python programs with an SQlite database backend and will be dealing with a large number of records in the future, so I need to optimize as much as I can.

For a few functions, I am searching through keys in a dictionary. I have been using the "in" keyword for prototyping and was planning on going back and optimizing those searches later as I know the "in" keyword is generally O(n) (as this just translates to python iterating over an entire list and comparing each element). But, as a python dict is basically just a hash map, is the python interpreter smart enough to interpret:

if(key in dict.keys()):     ...code...

to:

if(dict[key] != None):     ...code...

It is basically the same operation but the top would be O(n) and the bottom would be O(1).

It's easy for me to use the bottom version in my code, but then I was just curious and thought I would ask.

590

asked Jul 09 '13 03:07

tknickman

1 Answers

First, key in d.keys() is guaranteed to give you the same value as key in d for any dict d.

And the in operation on a dict, or the dict_keys object you get back from calling keys() on it (in 3.x), is not O(N), it's O(1).

There's no real "optimization" going on; it's just that using the hash is the obvious way to implement __contains__ on a hash table, just as it's the obvious way to implement __getitem__.

You may ask where this is guaranteed.

Well, it's not. Mapping Types defines dict as, basically, a hash table implementation of collections.abc.Mapping. There's nothing stopping someone from creating a hash table implementation of a Mapping, but still providing O(N) searches. But it would be extra work to make such a bad implementation, so why would they?

If you really need to prove it to yourself, you can test every implementation you care about (with a profiler, or by using some type with a custom __hash__ and __eq__ that logs calls, or…), or read the source.

In 2.x, you do not want to call keys, because that generates a list of the keys, instead of a KeysView. You could use iterkeys, but that may generate an iterator or something else that's not O(1). So, just use the dict itself as a sequence.

Even in 3.x, you don't want to call keys, because there's no need to. Iterating a dict, checking its __contains__, and in general treating it like a sequence is always equivalent to doing the same thing to its keys, so why bother? (And of course building the trivial KeyView, and accessing through it, are going to add a few nanoseconds to your running time and a few keystrokes to your program.)

(It's not quite clear that using sequence operations is equivalent for d.keys()/d.iterkeys() and d in 2.x. Other than performance issues, they are equivalent in every CPython, Jython, IronPython, and PyPy implementation, but it doesn't seem to be stated anywhere the way it is in 3.x. And it doesn't matter; just use key in d.)

While we're at it, note that this:

if(dict[key] != None):

… is not going to work. If the key is not in the dict, this will raise KeyError, not return None.

Also, you should never check None with == or !=; always use is.

You can do this with a try—or, more simply, do if dict.get(key, None) is not None. But again, there's no reason to do so. Also, that won't handle cases where None is a perfectly valid item. If that's the case, you need to do something like sentinel = object(); if dict.get(key, sentinel) is not sentinel:.

So, the right thing to write is:

if key in d:

More generally, this is not true:

I know the "in" keyword is generally O(n) (as this just translates to python iterating over an entire list and comparing each element

The in operator, like most other operators, is just a call to a __contains__ method (or the equivalent for a C/Java/.NET/RPython builtin). list implements it by iterating the list and comparing each element; dict implements it by hashing the value and looking up the hash; blist.blist implements it by walking a B+Tree; etc. So, it could be O(n), O(1), O(log n), or something completely different.

178

answered Sep 19 '22 17:09

abarnert

Related questions
                            
                                TypeError: 'dict' object is not callable
                            
                                Weak References in python
                            
                                print variable and a string in python
                            
                                Save and load weights in keras
                            
                                OrderedDict comprehensions
                            
                                Dimension of shape in conv1D
                            
                                Perform a string operation for every element in a Python list
                            
                                How to set default value to all keys of a dict object in python?
                            
                                Concatenate strings in python in multiline
                            
                                R expand.grid() function in Python
                            
                                Randint doesn't always follow uniform distribution
                            
                                select pandas rows by excluding index number
                            
                                How do I send a custom header with urllib2 in a HTTP Request?
                            
                                "Asyncio Event Loop is Closed" when getting loop
                            
                                Django - Reverse for '' not found. '' is not a valid view function or pattern name
                            
                                Reading from a frequently updated file
                            
                                General decorator to wrap try except in python?
                            
                                Python: How to run unittest.main() for all source files in a subdirectory?
                            
                                Why is adding attributes to an already instantiated object allowed?
                            
                                Do CSRF attacks apply to API's?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Python dictionary keys. "In" complexity

Tags:

python

dictionary

complexity-theory

big-o

hashmap

tknickman

People also ask

1 Answers

abarnert

Recent Activity

Donate For Us