As an optimization for handling a dict that will hold tens or hundreds of millions of keys, I'd really, really like to pre-size its capacity... but there seems to be no Pythonic way to do so.
Is it practical to use Cython or C callouts to directly call CPython's internal functions, such as dictresize() or _PyDict_NewPresized(), to achieve this?
It depends on what you mean by practical. It's certainly straightforward enough; you can just call _PyDict_NewPresized(howevermany). Heck, you can even do it from Python:
>>> import ctypes
>>> import sys
>>> ctypes.pythonapi._PyDict_NewPresized.restype = ctypes.py_object
>>> d = ctypes.pythonapi._PyDict_NewPresized(100)
>>> sys.getsizeof(d)
1676
>>> sys.getsizeof({})
140
>>> len(d)
0
As you can see, the dict is presized but holds no elements. Whether it's practical to depend on a CPython implementation detail like this is up to you.
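If you do go this route, it's a little more robust to declare the function's signature up front instead of leaning on ctypes' default conversions. A minimal sketch (the presized_dict wrapper name is mine; this assumes a CPython build that still exposes the private _PyDict_NewPresized symbol):
import ctypes

# Declare the private API's signature so ctypes marshals the size
# argument and the returned dict correctly.
ctypes.pythonapi._PyDict_NewPresized.argtypes = [ctypes.c_ssize_t]
ctypes.pythonapi._PyDict_NewPresized.restype = ctypes.py_object

def presized_dict(n):
    # Returns an empty dict whose internal table has room for roughly n entries.
    return ctypes.pythonapi._PyDict_NewPresized(n)
Keep in mind that _PyDict_NewPresized is a private API, so its behavior or availability can change between CPython versions.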
After a night of hacking, I came up with the following solution, which does not rely on any imported module. It lets you initialize a dict with room for any number of elements up to 2**31-1 (= 2,147,483,647).
def bigdict(size):
    # EXTENDED_ARG (0x91) carries the high 16 bits of size; BUILD_MAP ('i') takes
    # the full 32-bit size hint; RETURN_VALUE ('S') returns the new dict.
    bytecode = '\x91%c%ci%c%cS' % ((size>>16)&0xff, (size>>24)&0xff, size&0xff, (size>>8)&0xff)
    # Wrap the raw bytecode in a code object (CPython 2 layout) and evaluate it.
    return eval(bigdict.func_code.__class__(0, 0, 1, 64, bytecode, (), (), (), "317070", '<module>', 1, '', (), ()))
As an illustration:
In [95]: print sys.getsizeof({})
280
In [96]: print sys.getsizeof(bigdict(0))
280
In [97]: print sys.getsizeof(bigdict(1))
280
In [98]: print sys.getsizeof(bigdict(100))
3352
In [99]: print sys.getsizeof(bigdict(2**29-1))
12884902168
In [100]: print bigdict(2**29-1)
{}
That is the slowest empty dict I've ever seen. That last command took ages to complete.
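To see what those three opcodes actually are, you can disassemble a code object built the same way. A quick sketch (CPython 2 only; uses types.CodeType instead of reaching through func_code.__class__):
import dis
import types

# Build the same bytecode by hand for size=100 and disassemble it to show
# it is just EXTENDED_ARG + BUILD_MAP + RETURN_VALUE.
size = 100
bytecode = '\x91%c%ci%c%cS' % ((size>>16)&0xff, (size>>24)&0xff, size&0xff, (size>>8)&0xff)
code = types.CodeType(0, 0, 1, 64, bytecode, (), (), (), '<sketch>', '<module>', 1, '', (), ())
dis.dis(code)
# Prints (roughly):
#   1       0 EXTENDED_ARG        0
#           3 BUILD_MAP         100
#           6 RETURN_VALUE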