Create PyString from c character array without copying

Tags: python, c

I have a large buffer of strings (roughly 12 GB) from a C app.

I would like to create PyString objects in C for an embedded Python interpreter without copying the strings. Is this possible?

aterrel asked Jul 31 '14 19:07


2 Answers

I don't think that is possible, for the basic reason that the bytes of a Python string are stored inline in its object structure. In other words, a Python string object is the PyObject_HEAD followed directly by the bytes of the string. To wrap an existing buffer, you would need room in memory to put the PyObject_HEAD information around the existing bytes.
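As a rough illustration of that inline layout, the sketch below assumes CPython 3, where bytes plays the role that PyString played in Python 2. It relies on sys.getsizeof, which on CPython reports the object header and the inline payload together, so the size grows by exactly one byte per character. Other interpreters may report different numbers.

```python
import sys

# CPython-specific assumption: getsizeof(bytes) = fixed header + inline payload.
header = sys.getsizeof(b"")  # per-object overhead of an empty bytes object

# Measure a few payload lengths and subtract the fixed header.
sizes = {n: sys.getsizeof(b"x" * n) for n in (1, 100, 10_000)}
payload_growth = {n: size - header for n, size in sizes.items()}

# Each extra character costs exactly one byte inside the object itself,
# which is why the object cannot simply point at an external buffer.
print(payload_growth)
```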

Travis Oliphant answered Oct 05 '22 13:10


One can't create a PyString without a copy, but one can use ctypes. It turns out that ctypes.c_char_p works basically like a string. For example, take the following C code:

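/* Seven NUL-terminated strings that stay owned by the C side. */
static char* names[7] = {"a", "b", "c", "d", "e", "f", "g"};
PyObject *pFunc, *pArgs, *pValue;
pFunc = td_py_get_callable("my_func");
pArgs = PyTuple_New(2);
/* Pass the address of the pointer array as a plain integer. */
pValue = PyLong_FromSize_t((size_t) names);
PyTuple_SetItem(pArgs, 0, pValue);  /* steals the reference to pValue */
/* Pass the number of strings. */
pValue = PyLong_FromLong(7);
PyTuple_SetItem(pArgs, 1, pValue);
pValue = PyObject_CallObject(pFunc, pArgs);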

This passes the address and the number of strings to the following Python function my_func:

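import ctypes

def my_func(names_addr, num_strs):
    type_char_p = ctypes.POINTER(ctypes.c_char_p)
    # names_addr is the address of the char* array itself, so cast it
    # rather than using from_address, which would read a pointer stored there.
    names = ctypes.cast(names_addr, type_char_p)
    for idx in range(num_strs):
        print(names[idx])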
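The receiving side can be tried without a C host. The sketch below is only an approximation of the setup above: a ctypes array of c_char_p stands in for the C-side names array, and the helper name read_names is hypothetical. It interprets the integer address with ctypes.cast. Note that indexing the pointer materializes a Python bytes object for that one string, so the buffer as a whole is never copied but each accessed value is.

```python
import ctypes

# Stand-in for the C-side `static char* names[7]` (an assumption for this demo):
# a ctypes array of char* whose address is handed over as a plain integer.
raw = [b"a", b"b", b"c", b"d", b"e", b"f", b"g"]
c_names = (ctypes.c_char_p * len(raw))(*raw)
names_addr = ctypes.addressof(c_names)

def read_names(names_addr, num_strs):
    """Hypothetical helper: view an integer address as an array of char*."""
    names = ctypes.cast(names_addr, ctypes.POINTER(ctypes.c_char_p))
    # Each index dereferences one char* and returns it as bytes.
    return [names[idx] for idx in range(num_strs)]

result = read_names(names_addr, len(raw))
print(result)
```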

Of course, who really wants to pass around an address and a length in Python? We can put the addresses in a numpy array, pass that around, and cast back when we need to use them:

import ctypes
import numpy

def my_func(names_addr, num_strs):
    type_char_p = ctypes.POINTER(ctypes.c_char_p)
    names = ctypes.cast(names_addr, type_char_p)
    # Cast to size_t pointers to be held by numpy
    p = ctypes.cast(names, ctypes.POINTER(ctypes.c_size_t))
    name_addrs = numpy.ctypeslib.as_array(p, shape=(num_strs,))
    # Pass to some numpy functions
    my_numpy_func(name_addrs)

The challenge is that indexing the numpy array only gives you an address, but the array shares its memory with the original C pointer array. We can cast back to a ctypes.POINTER(ctypes.c_char_p) to access the values:

def my_numpy_func(name_addrs):
    names = name_addrs.ctypes.data_as(ctypes.POINTER(ctypes.c_char_p))
    for i in range(len(name_addrs)):
        print(names[i])

It's not perfect, as I can't use things like numpy.searchsorted to do a binary search at the numpy level, but it does pass around char* without a copy well enough.
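One possible workaround for the missing numpy-level search is a binary search at the Python level that dereferences only the pointers it visits. This is a sketch under assumptions, not part of the approach above: the function name search_sorted_names is hypothetical, the strings are assumed to be sorted in byte order, and a ctypes array again stands in for the C-side buffer.

```python
import ctypes

def search_sorted_names(names, num_strs, target):
    """Return the leftmost index where target could be inserted to keep order."""
    lo, hi = 0, num_strs
    while lo < hi:
        mid = (lo + hi) // 2
        # Dereference a single char* and compare it as bytes.
        if names[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

# Stand-in for the C-side array (assumed sorted in byte order).
raw = [b"a", b"b", b"c", b"d", b"e", b"f", b"g"]
c_names = (ctypes.c_char_p * len(raw))(*raw)
names = ctypes.cast(ctypes.addressof(c_names), ctypes.POINTER(ctypes.c_char_p))

idx_e = search_sorted_names(names, len(raw), b"e")
idx_missing = search_sorted_names(names, len(raw), b"cc")
```

Only about log2(num_strs) strings are turned into Python objects per lookup, so the bulk of the buffer stays untouched.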

aterrel answered Oct 05 '22 15:10