
Can a callout to C presize a Python dict's capacity?

As an optimization for handling a dict which will hold tens or hundreds of millions of keys, I'd really, really like to pre-size its capacity... but there seems to be no Pythonic way to do so.

Is it practical to use Cython or C callouts to directly call CPython's internal functions, such as dictresize() or _PyDict_NewPresized(), to achieve this?

asked May 29 '15 at 01:05 by gojomo



2 Answers

It depends on what you mean by practical. It's certainly straightforward enough; you can just call _PyDict_NewPresized(howevermany). Heck, you can even do it from Python:

>>> import ctypes
>>> import sys
>>> ctypes.pythonapi._PyDict_NewPresized.restype = ctypes.py_object
>>> d = ctypes.pythonapi._PyDict_NewPresized(100)
>>> sys.getsizeof(d)
1676
>>> sys.getsizeof({})
140
>>> len(d)
0

As you can see, the dict is presized, but it has no elements. Whether depending on CPython implementation details like this is practical is up to you.
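For use in a script rather than at the prompt, the same call can be wrapped in a small helper. This is only a sketch: `presized_dict` is a hypothetical name, and it assumes a CPython build that still exports the private `_PyDict_NewPresized` symbol, falling back to an ordinary dict when it does not. Declaring `argtypes` makes ctypes pass the size as a `Py_ssize_t` instead of a default C `int`.

```python
import ctypes
import sys


def presized_dict(min_used):
    """Return an empty dict with room for roughly `min_used` keys.

    Hypothetical helper. It relies on the private CPython function
    _PyDict_NewPresized, which is an implementation detail and may be
    missing in other versions or implementations.
    """
    try:
        new_presized = ctypes.pythonapi._PyDict_NewPresized
    except AttributeError:
        # Assumption: if the symbol is not available, a plain dict is an
        # acceptable fallback (correct, just not presized).
        return {}
    # C signature: PyObject *_PyDict_NewPresized(Py_ssize_t minused)
    new_presized.argtypes = [ctypes.c_ssize_t]
    new_presized.restype = ctypes.py_object
    return new_presized(min_used)


d = presized_dict(1000)
# Prints the length (0) and the sizes of the presized and the default dict;
# the exact byte counts depend on the interpreter version and platform.
print(len(d), sys.getsizeof(d), sys.getsizeof({}))
```

The result behaves like any other dict; only its initial allocation differs, and that only where the private function is available.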

answered Oct 30 '22 at 01:10 by user2357112 supports Monica



After a night of hacking, I came up with the following solution which does not rely on any module. It allows you to initialize a dict with room for any number of elements up to 2**31-1 (=2,147,483,647).

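def bigdict(size):
    # Python 2 only. Hand-assembled bytecode: EXTENDED_ARG (0x91) carries the high
    # 16 bits of size, BUILD_MAP ('i') the low 16 bits, then RETURN_VALUE ('S').
    bytecode = '\x91%c%ci%c%cS'%((size>>16)&0xff,(size>>24)&0xff,size&0xff,(size>>8)&0xff)
    return eval(bigdict.func_code.__class__( 0, 0, 1, 64, bytecode, (), (), (), "317070", '<module>', 1, '', (), ()))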

As an illustration:

In [95]: print sys.getsizeof({})
280

In [96]: print sys.getsizeof(bigdict(0))
280

In [97]: print sys.getsizeof(bigdict(1))
280

In [98]: print sys.getsizeof(bigdict(100))
3352

In [99]: print sys.getsizeof(bigdict(2**29-1))
12884902168

In [100]: print bigdict(2**29-1)
{}

That is the slowest empty dict I've ever seen. That last command took ages to complete.

answered Oct 30 '22 at 02:10 by 317070
