
Python dictionary iterator performance

When working with dictionaries in Python, this page says that the time complexity of iterating through the elements of a dictionary is O(n), where n is the largest size the dictionary has ever been.

However, I don't think there is an obvious way to iterate through the elements of a hash table. Can I assume good performance from dict.iteritems() when iterating through the elements, without too much overhead?

Since dictionaries are used so heavily in Python, I assume this is implemented in some smart way. Still, I want to make sure.
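(For context: dict.iteritems() is the Python 2 spelling; in Python 3 the equivalent is dict.items(), which returns a lazy view. A minimal sketch of the usage in question:)

```python
d = {"a": 1, "b": 2, "c": 3}

# dict.items() in Python 3 (dict.iteritems() in Python 2) yields entries
# lazily while scanning the hash table; no intermediate list is built.
total = 0
for key, value in d.items():
    total += value
# total == 6
```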

asked Jul 03 '15 by Koen

1 Answer

If you look at the notes on Python's dictionary source code, I think the relevant points are the following:

Those methods (iteration and key listing) loop over every potential entry
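The quoted behavior can be sketched with a toy open-addressed table (this is an illustration, not CPython's actual code; CPython uses a perturbed probe sequence rather than plain linear probing):

```python
# Sketch: iterating an open-addressed hash table means scanning every
# slot ("potential entry"), whether it holds a key or not.
EMPTY = object()  # sentinel marking an unused slot

class TinyTable:
    def __init__(self, nslots=8):
        self.slots = [EMPTY] * nslots  # the "potential entries"

    def insert(self, key, value):
        # Toy linear probing; assumes the table never fills completely.
        i = hash(key) % len(self.slots)
        while self.slots[i] is not EMPTY and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        self.slots[i] = (key, value)

    def items(self):
        # Loops over every potential entry, as the dict notes describe:
        # cost is proportional to the table size, not the key count.
        for slot in self.slots:
            if slot is not EMPTY:
                yield slot
```

With 3 keys in an 8-slot table, items() still visits all 8 slots; that is why iteration cost tracks the table's size rather than the number of live keys.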

How many potential entries will there be, as a function of the largest size (the largest number of keys ever stored in that dictionary)? Look at the following two sections in the same document:

Maximum dictionary load in PyDict_SetItem. Currently set to 2/3

Growth rate upon hitting maximum load. Currently set to *2.

This would suggest that the fill factor of a dictionary stays somewhere between 1/3 and 2/3 (unless the growth rate is set to *4, in which case it is between 1/6 and 2/3): the table doubles once it is 2/3 full, leaving it at least 1/3 full afterwards. So basically you're going to be checking up to 3 (or 6, with *4 growth) potential entries for every key actually stored.
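A toy model of that resize policy makes the bound concrete (assumptions labeled in the comments; this is a simplification, not CPython's exact resize code):

```python
# Toy model of the policy described above: grow *2 whenever the load
# exceeds 2/3. Returns the table capacity after n_keys insertions.
def capacity_after(n_keys, start=8):
    cap = start
    used = 0
    for _ in range(n_keys):
        used += 1
        if used * 3 > cap * 2:  # load above 2/3 -> double the table
            cap *= 2
    return cap
```

For example, an 8-slot table doubles to 16 on the 6th insertion (6/8 > 2/3) and to 32 on the 11th (11/16 > 2/3), so the scan-per-key ratio never exceeds 3.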

Of course, whether it's 3 entries per key or 1000, it's still O(n), but 3 is a pretty acceptable constant factor.

Incidentally, here is the rest of the source and documentation, including that of dictobject:

http://svn.python.org/projects/python/trunk/Objects/

answered Oct 14 '22 by zehnpaard