On this page, I see something interesting: Note that there is a fast-path for dicts that (in practice) only deal with str keys; this doesn't affect the algorithmic complexity, but it can significantly affect the constant factors: how quickly a typical program finishes. So what does it exactly mean? Does it mean using string as the key is always faster? If yes, why? Update: Thanks for the suggestions about optimization! But I'm actually more interested in the plain truth, than whether or when we should do optimization. Update 2: Thanks for the great answers, I'll cite the content from the link provided by @DaveWebb here: " ... ma_lookup is initially set to the lookdict_string function (renamed to lookdict_unicode in 3.0), which assumes that both the keys in the dictionary and the key being searched for are standard PyStringObject's. It is then able to make a couple of optimiziations, such as mitigating various error checks, since string-to-string comparison never raise exceptions. There is also no need for rich object comparisons either, which means we avoid calling PyObject_RichCompareBool, and always use _PyString_Eq directly. ... " Also, for the experiment numbers, I think the size of the difference will be even bigger if there is no int-to-string conversion

As this only affects the constant time, it's likely not to matter at all. The only time you really need to optimise is when you are working with very large data sets - which this does nothing to affect. What this does mean is that in the cases where you have small dictionaries with strings as keys, Python will be quick - this is a common usage, so it's been optimised for. As Ignacio Vazquez-Abrams points out, it's likely that converting your key to a string will cost (far) more than the slight boost you might gain from it being a string for the dict. In short use what is relevant to your situation - optimisation should only be done where there is a need for it, not before. Some tests: <pre class="prettyprint"><code>python -m timeit -s "a={key: 1 for key in range(1000)}" "a[500]" 10000000 loops, best of 3: 0.0773 usec per loop python -m timeit -s "a={str(key): 1 for key in range(1000)}" "a[\"500\"]" 10000000 loops, best of 3: 0.0452 usec per loop python -m timeit -s "a={str(key): 1 for key in range(1000)}" "a[str(500)]" 1000000 loops, best of 3: 0.244 usec per loop </code></pre> As you can see, while the string-based dict is faster, converting the key is very expensive by comparison, totally mitigating the gain (and then some). So yes, if the data you are using is only being used as keys to the dictionary, and what format your store them in doesn't matter, then strings are preferable, in a small dictionary. In practice, that is a very rare case (and you'd probably be using strings already).

Is it always faster to use string as key in a dict?

Tags:

On this page, I see something interesting:

Note that there is a fast-path for dicts that (in practice) only deal with str keys; this doesn't affect the algorithmic complexity, but it can significantly affect the constant factors: how quickly a typical program finishes.

So what does it exactly mean?

Does it mean using string as the key is always faster?

If yes, why?

Update:

Thanks for the suggestions about optimization! But I'm actually more interested in the plain truth, than whether or when we should do optimization.

Update 2:

Thanks for the great answers, I'll cite the content from the link provided by @DaveWebb here:

" ...

ma_lookup is initially set to the lookdict_string function (renamed to lookdict_unicode in 3.0), which assumes that both the keys in the dictionary and the key being searched for are standard PyStringObject's. It is then able to make a couple of optimiziations, such as mitigating various error checks, since string-to-string comparison never raise exceptions. There is also no need for rich object comparisons either, which means we avoid calling PyObject_RichCompareBool, and always use _PyString_Eq directly.

... "

Also, for the experiment numbers, I think the size of the difference will be even bigger if there is no int-to-string conversion

556

asked Jun 22 '12 18:06

xvatar

2 Answers

The C code that underlies the Python dict is optimisted for String keys. You can read about this here (and in the book the blog refers to).

If the Python runtime knows your dict only contains string keys it can do things such as not cater for errors that won't happen with a string to string comparison and ignore the rich comparison operators. This will make the common case of the string key only dict a little faster. (Update: timing shows it to be more than a little.)

However, it is unlikely that this would make a significant change to the run time of most Python programs. Only worry about this optimisation if you have measured and found dict lookups to be a bottleneck in your code. As the famous quote says, "Premature optimization is the root of all evil."

The only way to see how much faster things really are, is to time them:

>>> timeit.timeit('a["500"]','a ={}\nfor i in range(1000): a[str(i)] = i') 0.06659698486328125 >>> timeit.timeit('a[500]','a ={}\nfor i in range(1000): a[i] = i') 0.09005999565124512

So using string keys is about 30% faster even compared to int keys, and I have to admit I was surprised at the size of the difference.

answered Sep 28 '22 23:09

Dave Webb

As this only affects the constant time, it's likely not to matter at all. The only time you really need to optimise is when you are working with very large data sets - which this does nothing to affect.

What this does mean is that in the cases where you have small dictionaries with strings as keys, Python will be quick - this is a common usage, so it's been optimised for.

As Ignacio Vazquez-Abrams points out, it's likely that converting your key to a string will cost (far) more than the slight boost you might gain from it being a string for the dict.

In short use what is relevant to your situation - optimisation should only be done where there is a need for it, not before.

Some tests:

python -m timeit -s "a={key: 1 for key in range(1000)}" "a[500]" 10000000 loops, best of 3: 0.0773 usec per loop  python -m timeit -s "a={str(key): 1 for key in range(1000)}" "a[\"500\"]" 10000000 loops, best of 3: 0.0452 usec per loop  python -m timeit -s "a={str(key): 1 for key in range(1000)}" "a[str(500)]" 1000000 loops, best of 3: 0.244 usec per loop

As you can see, while the string-based dict is faster, converting the key is very expensive by comparison, totally mitigating the gain (and then some).

So yes, if the data you are using is only being used as keys to the dictionary, and what format your store them in doesn't matter, then strings are preferable, in a small dictionary. In practice, that is a very rare case (and you'd probably be using strings already).

answered Sep 28 '22 23:09

Gareth Latty

Related questions
                            
                                API to get android google play reviews(Getting device name and app version)
                            
                                How to setup django-hstore with an existing app managed by south?
                            
                                Adding a ProjectReference to a project that is not in the same solution
                            
                                Which operations must not throw an exception in C++?
                            
                                Settings up xdebug on MAMP PRO
                            
                                Autotools check for C++11
                            
                                Bash tab completion adds extra space after the first completion
                            
                                reCaptcha show "input error: invalid referer"
                            
                                Jquery Parallax Scrolling effect - Multi directional
                            
                                The Report Viewer Web Control HTTP Handler has not been registered in the application's web.config file
                            
                                How to get absolute route url as String In Play framework 2?
                            
                                Shutdown ExecutorService gracefully in webapp?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With