
python dict set min_size

I'm parsing hundreds of millions of JSON records and storing the relevant components from each in a dict. Because of the number of records I'm processing, Python is forced to grow the dict's underlying hash table several times, and each resize means a LOT of data has to be rehashed. The sheer amount of rehashing seems to cost a lot of time. Therefore, I wonder if there's a way to set a minimum size on the dict's underlying hash table so that the number of resizing operations is minimized.

I have read this on optimizing Python's dict, from an answer on this question, but cannot find how to change the initial size of a dict's hash table. If anyone can help me out with this, I'd be very grateful.

Thank you

inspectorG4dget asked Nov 04 '22 20:11


1 Answer

If you do this:

a = dict.fromkeys(range(n))

it will force the dictionary to grow large enough to accommodate n items. It is quite quick after that, but it takes 3s to do so.
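To check the cost on your own machine, here is a minimal sketch that times the call and compares the dict's reported size with an empty one. The value of n and the helper name presized_dict are assumptions for illustration; the original does not say which n the 3s figure refers to.

```python
import sys
import time


def presized_dict(n):
    """Build a dict with placeholder keys 0..n-1, each mapped to None."""
    return dict.fromkeys(range(n))


# Hypothetical size, chosen only for this illustration.
n = 1_000_000

start = time.perf_counter()
a = presized_dict(n)
elapsed = time.perf_counter() - start

# sys.getsizeof reports the dict object itself (including its table),
# not the keys and values it refers to.
print(f"built {len(a)} placeholder keys in {elapsed:.3f}s")
print(f"empty dict: {sys.getsizeof({})} bytes")
print(f"filled dict: {sys.getsizeof(a)} bytes")
```

Note that the placeholder keys 0..n-1 stay in the dict with the value None until you overwrite or delete them, so this fits best when your real keys are those integers. Whether the table keeps its size after the placeholders are deleted is an implementation detail, so measure it before relying on it.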

Dominic Bou-Samra answered Nov 15 '22 06:11
