Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it always faster to use string as key in a dict?

Tags:

On this page, I see something interesting:

Note that there is a fast-path for dicts that (in practice) only deal with str keys; this doesn't affect the algorithmic complexity, but it can significantly affect the constant factors: how quickly a typical program finishes.

So what does it exactly mean?

Does it mean using string as the key is always faster?

If yes, why?

Update:

Thanks for the suggestions about optimization! But I'm actually more interested in the plain truth, than whether or when we should do optimization.

Update 2:

Thanks for the great answers, I'll cite the content from the link provided by @DaveWebb here:

" ...

ma_lookup is initially set to the lookdict_string function (renamed to lookdict_unicode in 3.0), which assumes that both the keys in the dictionary and the key being searched for are standard PyStringObject's. It is then able to make a couple of optimiziations, such as mitigating various error checks, since string-to-string comparison never raise exceptions. There is also no need for rich object comparisons either, which means we avoid calling PyObject_RichCompareBool, and always use _PyString_Eq directly.

... "

Also, for the experiment numbers, I think the size of the difference will be even bigger if there is no int-to-string conversion

like image 556
xvatar Avatar asked Jun 22 '12 18:06

xvatar


People also ask

Can strings be used as dictionary keys?

Second, a dictionary key must be of a type that is immutable. For example, you can use an integer, float, string, or Boolean as a dictionary key.

Are dict keys slow?

Using dictionary. keys() is slower because it does more work: It adds an attribute lookup; dictionary.

Which is faster dict or list?

A dictionary is 6.6 times faster than a list when we lookup in 100 items.

Should I use dict () or {}?

With CPython 2.7, using dict() to create dictionaries takes up to 6 times longer and involves more memory allocation operations than the literal syntax. Use {} to create dictionaries, especially if you are pre-populating them, unless the literal syntax does not work for your case.


2 Answers

The C code that underlies the Python dict is optimisted for String keys. You can read about this here (and in the book the blog refers to).

If the Python runtime knows your dict only contains string keys it can do things such as not cater for errors that won't happen with a string to string comparison and ignore the rich comparison operators. This will make the common case of the string key only dict a little faster. (Update: timing shows it to be more than a little.)

However, it is unlikely that this would make a significant change to the run time of most Python programs. Only worry about this optimisation if you have measured and found dict lookups to be a bottleneck in your code. As the famous quote says, "Premature optimization is the root of all evil."

The only way to see how much faster things really are, is to time them:

>>> timeit.timeit('a["500"]','a ={}\nfor i in range(1000): a[str(i)] = i') 0.06659698486328125 >>> timeit.timeit('a[500]','a ={}\nfor i in range(1000): a[i] = i') 0.09005999565124512 

So using string keys is about 30% faster even compared to int keys, and I have to admit I was surprised at the size of the difference.

like image 59
Dave Webb Avatar answered Sep 28 '22 23:09

Dave Webb


As this only affects the constant time, it's likely not to matter at all. The only time you really need to optimise is when you are working with very large data sets - which this does nothing to affect.

What this does mean is that in the cases where you have small dictionaries with strings as keys, Python will be quick - this is a common usage, so it's been optimised for.

As Ignacio Vazquez-Abrams points out, it's likely that converting your key to a string will cost (far) more than the slight boost you might gain from it being a string for the dict.

In short use what is relevant to your situation - optimisation should only be done where there is a need for it, not before.

Some tests:

python -m timeit -s "a={key: 1 for key in range(1000)}" "a[500]" 10000000 loops, best of 3: 0.0773 usec per loop  python -m timeit -s "a={str(key): 1 for key in range(1000)}" "a[\"500\"]" 10000000 loops, best of 3: 0.0452 usec per loop  python -m timeit -s "a={str(key): 1 for key in range(1000)}" "a[str(500)]" 1000000 loops, best of 3: 0.244 usec per loop 

As you can see, while the string-based dict is faster, converting the key is very expensive by comparison, totally mitigating the gain (and then some).

So yes, if the data you are using is only being used as keys to the dictionary, and what format your store them in doesn't matter, then strings are preferable, in a small dictionary. In practice, that is a very rare case (and you'd probably be using strings already).

like image 31
Gareth Latty Avatar answered Sep 28 '22 23:09

Gareth Latty