As an optimization for handling a dict that will hold tens or hundreds of millions of keys, I'd really, really like to pre-size its capacity... but there seems to be no Pythonic way to do so.
Is it practical to use Cython or C callouts to directly call CPython's internal functions, such as dictresize() or _PyDict_NewPresized(), to achieve this?
It depends on what you mean by practical. It's certainly straightforward enough; you can just call _PyDict_NewPresized(howevermany). Heck, you can even do it from Python:
>>> import ctypes
>>> import sys
>>> ctypes.pythonapi._PyDict_NewPresized.restype = ctypes.py_object
>>> d = ctypes.pythonapi._PyDict_NewPresized(100)
>>> sys.getsizeof(d)
1676
>>> sys.getsizeof({})
140
>>> len(d)
0
As you can see, the dict is presized but holds no elements. Whether it's practical to depend on a CPython implementation detail like this is up to you.
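If you do go this route, it's a little more robust to declare the function's signature up front instead of leaning on ctypes' default conversions. A minimal sketch (the presized_dict wrapper name is mine; this assumes a CPython build that still exposes the private _PyDict_NewPresized symbol):
import ctypes

# Declare the private API's signature so ctypes marshals the size
# argument and the returned dict correctly.
ctypes.pythonapi._PyDict_NewPresized.argtypes = [ctypes.c_ssize_t]
ctypes.pythonapi._PyDict_NewPresized.restype = ctypes.py_object

def presized_dict(n):
    # Returns an empty dict whose internal table has room for roughly n entries.
    return ctypes.pythonapi._PyDict_NewPresized(n)
Keep in mind that _PyDict_NewPresized is a private API, so its behavior or availability can change between CPython versions.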
After a night of hacking, I came up with the following solution, which does not rely on any imported module. It lets you initialize a dict with room for any number of elements up to 2**31-1 (= 2,147,483,647).
def bigdict(size):
    # EXTENDED_ARG (0x91) carries the high 16 bits of size; BUILD_MAP ('i') takes
    # the full 32-bit size hint; RETURN_VALUE ('S') returns the new dict.
    bytecode = '\x91%c%ci%c%cS' % ((size>>16)&0xff, (size>>24)&0xff, size&0xff, (size>>8)&0xff)
    # Wrap the raw bytecode in a code object (CPython 2 layout) and evaluate it.
    return eval(bigdict.func_code.__class__(0, 0, 1, 64, bytecode, (), (), (), "317070", '<module>', 1, '', (), ()))
As an illustration:
In [95]: print sys.getsizeof({})
280
In [96]: print sys.getsizeof(bigdict(0))
280
In [97]: print sys.getsizeof(bigdict(1))
280
In [98]: print sys.getsizeof(bigdict(100))
3352
In [99]: print sys.getsizeof(bigdict(2**29-1))
12884902168
In [100]: print bigdict(2**29-1)
{}
That is the slowest empty dict I've ever seen. That last command took ages to complete.
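To see what those three opcodes actually are, you can disassemble a code object built the same way. A quick sketch (CPython 2 only; uses types.CodeType instead of reaching through func_code.__class__):
import dis
import types

# Build the same bytecode by hand for size=100 and disassemble it to show
# it is just EXTENDED_ARG + BUILD_MAP + RETURN_VALUE.
size = 100
bytecode = '\x91%c%ci%c%cS' % ((size>>16)&0xff, (size>>24)&0xff, size&0xff, (size>>8)&0xff)
code = types.CodeType(0, 0, 1, 64, bytecode, (), (), (), '<sketch>', '<module>', 1, '', (), ())
dis.dis(code)
# Prints (roughly):
#   1       0 EXTENDED_ARG        0
#           3 BUILD_MAP         100
#           6 RETURN_VALUE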