Why are dict lookups always better than list lookups?

I was using a dictionary as a lookup table, but I started to wonder whether a list would be better for my application, since the number of entries in my lookup table wasn't that big. I know lists use C arrays under the hood which made me conclude that lookup in a list with just a few items would be better than in a dictionary (accessing a few elements in an array is faster than computing a hash).

I decided to profile the alternatives but the results surprised me. List lookup was only better with a single element! See the following figure (log-log plot):

[Figure: list vs dict lookup time]

So here comes the question: Why do list lookups perform so poorly? What am I missing?

As a side question, something else that caught my attention was a little "discontinuity" in the dict lookup time after approximately 1000 entries. I plotted the dict lookup time alone to show it.

[Figure: dict lookup time]

p.s.1 I know about O(n) lookup for arrays vs O(1) amortized lookup for hash tables, but it is usually the case that for a small number of elements, iterating over an array is faster than using a hash table.

p.s.2 Here is the code I used to compare the dict and list lookup times:

    import timeit

    lengths = [2 ** i for i in xrange(15)]

    list_time = []
    dict_time = []
    for l in lengths:
        list_time.append(timeit.timeit('%i in d' % (l/2), 'd=range(%i)' % l))
        dict_time.append(timeit.timeit('%i in d' % (l/2),
                                       'd=dict.fromkeys(range(%i))' % l))
        print l, list_time[-1], dict_time[-1]

p.s.3 Using Python 2.7.13

asked Apr 28 '17 23:04

hugos

1 Answer

> I know lists use C arrays under the hood which made me conclude that lookup in a list with just a few items would be better than in a dictionary (accessing a few elements in an array is faster than computing a hash).

Accessing a few array elements is cheap, sure, but computing == is surprisingly heavyweight in Python. See that spike in your second graph? That's the cost of computing == for two ints right there.

Your list lookups need to compute == a lot more than your dict lookups do.
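To make that concrete, here is a minimal sketch that counts how many times == runs for a membership test on a list versus a dict. `CountingKey` and `count_eq_calls` are hypothetical names made up for this illustration, and the keys are custom objects rather than ints so the comparisons can be observed at all.

```python
class CountingKey(object):
    """Hypothetical key type that counts how often == is evaluated."""
    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        # Hash like the wrapped int, mirroring the int keys in the question.
        return hash(self.value)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.value == other.value


def count_eq_calls(container, needle):
    """Return how many == calls `needle in container` triggers."""
    CountingKey.eq_calls = 0
    found = needle in container
    assert found
    return CountingKey.eq_calls


n = 64
keys = [CountingKey(i) for i in range(n)]
as_list = list(keys)
as_dict = dict.fromkeys(keys)

# A fresh, equal-but-not-identical object, so no identity shortcut applies.
needle = CountingKey(n // 2)

list_eq = count_eq_calls(as_list, needle)
dict_eq = count_eq_calls(as_dict, needle)
print("list: %d == calls, dict: %d == calls" % (list_eq, dict_eq))
```

On CPython I'd expect the list count to grow with the position of the match, while the dict count stays at one for keys like these that don't collide.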

Meanwhile, computing hashes might be a pretty heavyweight operation for a lot of objects, but for all ints involved here, they just hash to themselves. (-1 would hash to -2, and large integers (technically longs) would hash to smaller integers, but that doesn't apply here.)
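You can check the hashing claims directly in an interpreter. This is just a quick sketch of CPython behavior; the specific sample values are arbitrary choices for illustration.

```python
# Small non-negative ints hash to themselves on CPython.
samples = (0, 1, 7, 256, 1000, 16383)
all_self_hashing = all(hash(i) == i for i in samples)
print(all_self_hashing)

# -1 is used as an error return value by CPython's C-level hash
# functions, so hash(-1) is remapped to -2.
minus_one_hash = hash(-1)
print(minus_one_hash)

# An int too large for a machine word cannot hash to itself,
# because the hash has to fit in a machine word.
big = 2 ** 100
big_hashes_to_itself = (hash(big) == big)
print(big_hashes_to_itself)
```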

Dict lookup isn't really that bad in Python, especially when your keys are just a consecutive range of ints. All ints here hash to themselves, and Python uses a custom open addressing scheme instead of chaining, so all your keys end up nearly as contiguous in memory as if you'd used a list (which is to say, the pointers to the keys end up in a contiguous range of PyDictEntrys). The lookup procedure is fast, and in your test cases, it always hits the right key on the first probe.
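Here is a toy model of why consecutive int keys never collide. It is only a sketch under simplifying assumptions: it uses plain linear probing, whereas CPython's real probe sequence also mixes in the higher bits of the hash, and the exact memory layout of dicts differs between Python versions. `build_table` and `probes_needed` are hypothetical helpers, not anything from CPython.

```python
def build_table(keys, size):
    """Insert distinct keys into a toy open-addressing table.

    `size` must be a power of two and larger than len(keys).
    """
    assert size & (size - 1) == 0, "size must be a power of two"
    assert len(keys) < size
    slots = [None] * size
    mask = size - 1
    for key in keys:
        i = hash(key) & mask              # initial slot from the low hash bits
        while slots[i] is not None:       # collision: try the next slot
            i = (i + 1) & mask
        slots[i] = key
    return slots


def probes_needed(slots, key):
    """Count how many slots are inspected before `key` is found."""
    mask = len(slots) - 1
    i = hash(key) & mask
    probes = 1
    while slots[i] is not None:
        # Identity first, then ==, like CPython's containers.
        if slots[i] is key or slots[i] == key:
            return probes
        i = (i + 1) & mask
        probes += 1
    raise KeyError(key)


keys = list(range(16))
slots = build_table(keys, 32)

keys_are_contiguous = (slots[:16] == keys)
worst_case_probes = max(probes_needed(slots, k) for k in keys)
print(keys_are_contiguous, worst_case_probes)
```

Because each key i hashes to i and lands in slot i, the keys sit in order at the front of the table and every lookup succeeds on the first probe.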

Okay, back to the spike in graph 2. The spike in the lookup times at 1024 entries in the second graph is because for all smaller sizes, the integers you were looking for were all <= 256, so they all fell within the range of CPython's small integer cache. The reference implementation of Python keeps canonical integer objects for all integers from -5 to 256, inclusive. For these integers, Python was able to use a quick pointer comparison to avoid going through the (surprisingly heavyweight) process of computing ==. For larger integers, the argument to in was no longer the same object as the matching integer in the dict, and Python had to go through the whole == process.
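Both halves of that explanation can be observed from Python. The sketch below uses a hypothetical `LoudEq` class to show that a dict lookup skips == when the key is the very same object, then checks object identity for ints on either side of the cache boundary. The identity results for ints are CPython implementation details, not language guarantees.

```python
class LoudEq(object):
    """Hypothetical type that records every == call."""
    calls = 0

    def __hash__(self):
        return 42

    def __eq__(self, other):
        LoudEq.calls += 1
        return True


x = LoudEq()
same_object_hit = x in {x: None}       # same object: pointer comparison suffices
calls_same = LoudEq.calls

y = LoudEq()
equal_object_hit = y in {x: None}      # different object, same hash: == must run
calls_equal = LoudEq.calls - calls_same

print(calls_same, calls_equal)

# CPython-specific: ints in the cached range are shared objects, even
# when built at run time; ints outside the range generally are not.
small_a, small_b = int("256"), int("256")
large_a, large_b = int("257"), int("257")
print(small_a is small_b)
print(large_a is large_b)
```

On CPython the first lookup should need no == calls and the second exactly one, which is the same difference your benchmark hits once the looked-up integer exceeds 256.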

answered Oct 02 '22 18:10

user2357112 supports Monica

Tags: performance, python, big-o, optimization, python-internals
