
Optimizing Python Dictionary Lookup Speeds by Shortening Key Size?

I'm not clear on what goes on behind the scenes of a dictionary lookup. Does key size factor into the speed of lookup for that key?

The current dictionary keys are alphanumeric, between 10 and 20 characters long.

I need to do hundreds of lookups a minute.

If I replace those with smaller key IDs of between 1 and 4 digits, will I get faster lookup times? This would mean adding another value to each item the dictionary holds, so overall the dictionary would be larger.

Also, I'd need to change the program to look up the ID and then get the URL associated with it.

Am I likely just adding complexity to the program with little benefit?

asked Oct 21 '14 by Excelsior



2 Answers

Dictionaries are hash tables, so looking up a key consists of:

  • Hash the key.
  • Reduce the hash to the table size.
  • Index the table with the result.
  • Compare the looked-up key with the input key.

Normally, this is amortized constant time, and you don't care about anything more than that. There are two potential issues, but they don't come up often.
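
To make that concrete, here's a toy sketch of those four steps in plain Python. It's not CPython's actual implementation (the real dict is written in C and uses a smarter probe sequence), just the shape of the algorithm:

```python
# Toy open-addressing lookup, illustrating the steps above.
# `table` is a list of (key, value) pairs or None for empty slots;
# CPython's real dict is considerably more sophisticated.
def lookup(table, key):
    h = hash(key)                    # 1. hash the key
    i = h % len(table)               # 2. reduce the hash to the table size
    while table[i] is not None:      # 3. index the table with the result
        stored_key, value = table[i]
        if stored_key == key:        # 4. compare the stored key with the input key
            return value
        i = (i + 1) % len(table)     # collision: linearly probe the next slot
    raise KeyError(key)
```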


Hashing the key takes linear time in the length of the key. For, e.g., huge strings, this could be a problem. However, if you look at the source code for most of the important types, including [str/unicode](https://hg.python.org/cpython/file/default/Objects/unicodeobject.c), you'll see that they cache the hash the first time. So, unless you're inputting (or randomly creating, or whatever) a bunch of strings to look up once and then throw away, this is unlikely to be an issue in most real-life programs.

On top of that, 20 characters is really pretty short; you can probably do millions of such hashes per second, not hundreds.

From a quick test on my computer, hashing 20 random letters takes 973ns, hashing a 4-digit number takes 94ns, and hashing a value I've already hashed takes 77ns. Yes, that's nanoseconds.
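
The exact numbers will differ on your machine, but if you want to reproduce that kind of measurement yourself, a rough timeit sketch might look like this (the per-key figures include a little loop overhead):

```python
import random
import string
import timeit

# Fresh 20-character strings, so the first hash() call really has to compute the hash.
keys = [''.join(random.choice(string.ascii_letters) for _ in range(20))
        for _ in range(100_000)]

t_first = timeit.timeit(lambda: [hash(k) for k in keys], number=1)   # first hash of each key
t_cached = timeit.timeit(lambda: [hash(k) for k in keys], number=1)  # str caches its hash, so this is cheaper

# Short 4-digit keys, hashed for the first time.
short_keys = [str(random.randint(1000, 9999)) for _ in range(100_000)]
t_short = timeit.timeit(lambda: [hash(k) for k in short_keys], number=1)

print(t_first / len(keys) * 1e9, "ns per 20-char first hash")
print(t_cached / len(keys) * 1e9, "ns per cached hash")
print(t_short / len(short_keys) * 1e9, "ns per 4-digit first hash")
```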


Meanwhile, "Index the table with the result" is a bit of a cheat. What happens if two different keys hash to the same index? Then "compare the looked-up key" will fail, and… what happens next? CPython's implementation uses probing for this. The exact algorithm is explained pretty nicely in the source. But you'll notice that given really pathological data, you could end up doing a linear search for every single element. This is never going to come up—unless someone can attack your program by explicitly crafting pathological data, in which case it will definitely come up.

Switching from 20-character strings to 4-digit numbers wouldn't help here either. If I'm crafting keys to DoS your system via dictionary collisions, I don't care what your actual keys look like, just what they hash to.
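
You can see the collision effect with a deliberately broken user-defined hash (a stand-in for crafted colliding keys, not a real attack):

```python
import timeit

class BadKey:
    """Every instance hashes to the same bucket, so lookups degrade to a linear probe."""
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 42                    # all keys collide
    def __eq__(self, other):
        return isinstance(other, BadKey) and self.value == other.value

good = {i: None for i in range(2_000)}
bad = {BadKey(i): None for i in range(2_000)}   # even building this is quadratic

t_good = timeit.timeit(lambda: 1_999 in good, number=1_000)
t_bad = timeit.timeit(lambda: BadKey(1_999) in bad, number=1_000)
print(t_good, t_bad)   # the colliding lookups are orders of magnitude slower
```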


More generally, premature optimization is the root of all evil. This is sometimes misquoted to overstate the point; Knuth was arguing that the most important thing to do is find the 3% of the cases where optimization is important, not that optimization is always a waste of time. But either way, the point is: if you don't know in advance where your program is too slow (and if you think you know in advance, you're usually wrong…), profile it, and then find the part where you get the most bang for your buck. Optimizing one arbitrary piece of your code is likely to have no measurable effect at all.
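
If you're not sure where your program actually spends its time, the standard library's profiler is the usual first stop. A minimal sketch, where process_urls() is a hypothetical stand-in for the code you suspect is slow:

```python
import cProfile
import pstats

# process_urls is a placeholder for whatever your program really does.
cProfile.run('process_urls()', 'profile.out')
pstats.Stats('profile.out').sort_stats('cumulative').print_stats(10)   # top 10 by cumulative time
```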

answered Sep 29 '22 by abarnert


Python dictionaries are implemented as hash maps behind the scenes. The key length might have some impact on performance if, for example, the hash function's complexity depends on the key length, but in general the performance impact will be negligible.

So I'd say the added complexity would bring little to no benefit.
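
If you want to check this against something like your own data, a quick (and rough) comparison of long string keys versus short ID keys might look like the following; the key lengths and URLs are just placeholders:

```python
import random
import string
import timeit

long_keys = [''.join(random.choice(string.ascii_lowercase + string.digits) for _ in range(15))
             for _ in range(10_000)]
short_keys = [str(i) for i in range(10_000)]

d_long = {k: 'http://example.com/' + k for k in long_keys}
d_short = {k: 'http://example.com/' + k for k in short_keys}

# Both timings include the cost of random.choice(), which dominates;
# the point is that the two come out nearly identical.
t_long = timeit.timeit(lambda: d_long[random.choice(long_keys)], number=100_000)
t_short = timeit.timeit(lambda: d_short[random.choice(short_keys)], number=100_000)
print(t_long, t_short)
```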

answered Sep 29 '22 by torpedro