I'm parsing some XML (with some Python 3.4 code) and want to retrieve both the text from a node and its id attribute. Example:
<li id="12345"> Some text here </li>
My current code is structured around the text only (I'm now adding the id, but didn't need this before). I'm looping through a list of text/sentences, and then proceed to do some stuff. So I thought of making a dictionary with the text/sentence as key, and this id attribute as value.
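For illustration, here is a minimal sketch of the two candidate mappings. It assumes the `<li>` elements sit inside a hypothetical `<ul>` wrapper and that the standard-library `xml.etree.ElementTree` parser is used; the question does not say which parser is actually in play, and the second `<li>` line is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the first <li> line comes from the question.
SAMPLE = """<ul>
  <li id="12345"> Some text here </li>
  <li id="ulp_887362487687678"> Another sentence </li>
</ul>"""


def build_mappings(xml_source):
    """Return (text -> id, id -> text) dictionaries for every <li> element."""
    root = ET.fromstring(xml_source)
    text_to_id = {}
    id_to_text = {}
    for li in root.iter("li"):
        text = (li.text or "").strip()   # drop the padding around the sentence
        node_id = li.get("id")           # attribute value, always a str
        text_to_id[text] = node_id
        id_to_text[node_id] = text
    return text_to_id, id_to_text


text_to_id, id_to_text = build_mappings(SAMPLE)
print(text_to_id["Some text here"])
print(id_to_text["12345"])
```

One thing to keep in mind with text as the key: two nodes with identical text collapse into a single entry, which cannot happen when the (unique) ids are the keys.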
However, this doesn't feel very efficient. The text can be a whole paragraph, making the key very long. Whereas the id is always of a fairly limited length (but still of type str though, e.g. some alpha characters followed by some digits). But making the ids the key and the text the value requires some rewriting of the code. All not very problematic, but this just got me wondering: How inefficient would it be to have the text (potentially a whole paragraph) as key, compared to an id like "ulp_887362487687678" as key?
I can just make two reverse dictionaries (one with id as key, the other with text as key) and compare construction, lookup and so on. And I've also found some topics on key length limit (Do Dictionaries have a key length limit?). But I'm merely wondering what your thoughts are on this. Is having such long str keys in your dict something that you definitely want to avoid, or is it not a very big deal? If you could share some pros/cons, that would be great!
There is no such limit in place regarding dictionary keys. Since Python also has arbitrary precision on integers, the only limit you will encounter, string or otherwise, is that of available memory.
The fastest way to repeatedly look up data among millions of entries in Python is a dictionary. Because dictionaries are Python's built-in mapping type, they are highly optimized.
The (amortized) time complexity of a lookup is constant (O(1)) in the size of the dictionary.
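A rough way to see this for yourself is to time the same lookup against dictionaries of different sizes. The sketch below is only an illustration: the helper name `lookup_time` and the `id_<n>` key scheme are made up for the example, and the absolute numbers will vary per machine.

```python
from timeit import timeit


def lookup_time(size, repeats=100000):
    """Time `repeats` lookups of one existing key in a dict with `size` entries."""
    table = {"id_{}".format(i): i for i in range(size)}
    key = "id_{}".format(size // 2)      # a key that is known to be present
    return timeit(lambda: table[key], number=repeats)


for size in (10, 1000, 100000):
    print("entries: {:>7}, seconds: {:.4f}".format(size, lookup_time(size)))
```

If lookups are constant time, the reported seconds should stay in the same ballpark as the number of entries grows.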
By combining the built-in max() function with the dict.get() method we can easily get the key with the maximum value in a dictionary: pass the dictionary as the iterable and dict.get as the key function, and max() returns the key paired with the maximum value.
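As a small sketch of that idiom (the `scores` dictionary is hypothetical data made up for the example):

```python
# Hypothetical data: name -> score.
scores = {"alice": 82, "bob": 95, "carol": 78}

# Iterating a dict yields its keys; key=scores.get ranks each key by its value.
best = max(scores, key=scores.get)
print(best)  # bob
```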
No, Python string length hardly has an impact on dictionary performance. The only influence the string length could have is on the hash() function used to map the key to a hash table slot.
String length has very little impact on the performance of hash():
>>> import random
>>> from timeit import timeit
>>> from string import ascii_letters
>>> generate_text = lambda len: ''.join([random.choice(ascii_letters) for _ in xrange(len)])
>>> for i in range(8):
...     length = 10 + 10 ** i
...     testword = generate_text(length)
...     timing = timeit('hash(t)', 'from __main__ import testword as t')
...     print 'Length: {}, timing: {}'.format(length, timing)
...
Length: 11, timing: 0.061537027359
Length: 20, timing: 0.0796310901642
Length: 110, timing: 0.0631730556488
Length: 1010, timing: 0.0606122016907
Length: 10010, timing: 0.0613977909088
Length: 100010, timing: 0.0607581138611
Length: 1000010, timing: 0.0672461986542
Length: 10000010, timing: 0.080118894577
I stopped at generating a string of 10 million characters, because I couldn't be bothered waiting for my laptop to generate a 100 million character string.
The timings are pretty much constant because the hash value is cached on the string object once computed, so only the first of timeit's repeated hash() calls actually has to process the characters.
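To make the caching visible, the first hash() call on a string can be timed separately from a second call on the same object. This is a minimal sketch for Python 3, assuming CPython's behaviour of storing the hash on the str object; the helper names `make_text` and `time_hash` are made up for the example, and no particular timings are promised.

```python
import time


def make_text(length):
    """Build a fresh string at run time, so its hash has not been computed yet."""
    return "x" * length


def time_hash(value):
    """Return the seconds taken by a single hash() call on `value`."""
    start = time.perf_counter()
    hash(value)
    return time.perf_counter() - start


text = make_text(5000000)
first = time_hash(text)       # has to process every character
cached = time_hash(text)      # reads the value stored on the string object

copy = text[:1] + text[1:]    # equal content, but a new object without a cached hash
fresh = time_hash(copy)       # processes every character again

print("first:  {:.6f}s".format(first))
print("cached: {:.6f}s".format(cached))
print("fresh:  {:.6f}s".format(fresh))
```

The practical consequence: a long key is hashed once per string object, so a lookup with a freshly built paragraph-sized key still pays a one-time cost proportional to its length, while repeated lookups with the same object do not.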