I'm parsing hundreds of millions of JSON records and storing the relevant components from each in a dict. The problem is that, because of the number of records I'm processing, Python is forced to grow the dict's underlying hash table several times, and each resize rehashes a large amount of data. The sheer amount of rehashing seems to cost a lot of time. Is there a way to set a minimum size on the dict's underlying hash table so that the number of resizing operations is minimized?
I have read this on optimizing Python's dict, from an answer on this question, but cannot find how to change the initial size of a dict's hash table. If anyone can help me out with this, I'd be very grateful.
Thank you
If you do this:
a = dict.fromkeys(range(n))
it will force the dictionary to allocate a hash table large enough to accommodate n items. Lookups and insertions are quite quick after that, but building the dict itself takes about 3 seconds.
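For illustration, here is a minimal sketch (not part of the original answer) that compares this fromkeys pre-allocation against letting an empty dict grow key by key; n is just an example count, and the exact timings will vary with your machine and Python version:

import timeit

n = 10_000_000  # example size; pick something close to your expected record count

def presized():
    # dict.fromkeys gives CPython a size hint, so the hash table is
    # allocated large enough for n keys up front instead of growing
    # (and rehashing) repeatedly during insertion.
    return dict.fromkeys(range(n))

def growing():
    # Baseline: start empty and let the dict resize as keys are added.
    d = {}
    for i in range(n):
        d[i] = None
    return d

print("fromkeys :", timeit.timeit(presized, number=1))
print("growing  :", timeit.timeit(growing, number=1))

Note that this pre-fills the dict with the keys 0 through n-1 mapped to None, so it mainly helps if your real keys can replace those placeholders; calling clear() on the dict resets its table back to the minimal size.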