Cython bytes to C char*

I am trying to write a Cython extension to CPython that wraps the mcrypt library, so that I can use it with Python 3. However, I get a segfault when I call one of the mcrypt APIs.

The code that is failing is:

def _real_encrypt(self, source):
    src_len = len(source)
    cdef char* ciphertext = source
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext, src_len)
    retval = source[:src_len]
    return retval

Now, the way I understand the Cython documentation, the assignment on line 3 should copy the contents of the buffer (a bytes object in Python 3) to the C string pointer. I assumed this would also allocate the memory, but when I made this modification:

def _real_encrypt(self, source):
    src_len = len(source)
    cdef char* ciphertext = <char *>malloc(src_len)
    ciphertext = source
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext, src_len)
    retval = source[:src_len]
    return retval

it still crashed with a segfault. The crash happens inside mcrypt_generic, but the equivalent plain C code works fine, so there must be something I don't understand about how Cython handles C data here.

Thanks for any help!

ETA: The problem was a bug on my part. I was working on this after being awake for far too many hours (isn't that something we've all done at some point?) and missed something stupid. The code that I now have, which works, is:

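from libc.stdlib cimport malloc, free
from libc.string cimport memcpy

def _real_encrypt(self, source):
    src_len = len(source)
    cdef char *src = source
    cdef char *ciphertext = <char *>malloc(src_len)
    if ciphertext == NULL:
        raise MemoryError()
    try:
        # memcpy rather than strncpy: binary plaintext may contain NUL bytes
        memcpy(ciphertext, src, src_len)
        cmc.mcrypt_generic_init(self._mcStream, <void *>self._key,
                                len(self._key), NULL)
        # encrypts the src_len bytes at ciphertext in place
        cmc.mcrypt_generic(self._mcStream, <void *>ciphertext,
                           src_len)

        retval = ciphertext[:src_len]  # copies the result into a new bytes object
        cmc.mcrypt_generic_deinit(self._mcStream)
    finally:
        free(ciphertext)  # release the malloc'd buffer
    return retval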

It's probably not the most efficient code in the world: it makes one copy to do the encryption and a second copy for the return value. I'm not sure that can be avoided, though, since I don't know whether a newly allocated buffer can be handed back to Python in place as a bytestring. But now that I have a working function, I'm going to implement a block-by-block method as well, so that one can provide an iterable of blocks for encryption or decryption without having the entire source and destination in memory at once. That way, it would be possible to encrypt or decrypt huge files without worrying about holding up to three copies of the data in memory at any one point.
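The block-by-block idea could look roughly like the Python sketch below: a generator that takes any iterable of byte blocks and yields transformed blocks one at a time, so only one block is held in memory at once. `read_blocks`, `transform_blocks` and `toy_xor` are hypothetical names for illustration; `toy_xor` merely stands in for a per-block mcrypt call and is not encryption. A real implementation would presumably initialize the mcrypt stream once and keep it across calls, and for a block cipher the chunk size would need to be a multiple of the cipher's block size.

```python
import io
from functools import partial
from typing import BinaryIO, Callable, Iterable, Iterator


def read_blocks(f: BinaryIO, block_size: int) -> Iterator[bytes]:
    """Yield successive chunks of at most block_size bytes until EOF."""
    return iter(partial(f.read, block_size), b"")


def transform_blocks(
    blocks: Iterable[bytes],
    transform_in_place: Callable[[bytearray], None],
) -> Iterator[bytes]:
    """Apply transform_in_place to one block at a time, yielding each result."""
    for block in blocks:
        buf = bytearray(block)  # writable copy of this block only
        transform_in_place(buf)
        yield bytes(buf)


def toy_xor(buf: bytearray) -> None:
    # Hypothetical stand-in for a per-block cipher call; NOT real encryption.
    for i in range(len(buf)):
        buf[i] ^= 0x20


src = io.BytesIO(b"hello world")
print(b"".join(transform_blocks(read_blocks(src, 4), toy_xor)))
# b'HELLO\x00WORLD'
```

The demo joins the results only to show them; in real use each yielded block would be written to the output file as it arrives, which is what keeps memory use bounded.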

Thanks for the help, everyone!

Michael Trausch, asked Dec 14 '10 07:12


2 Answers

The first version points the char* at the Python bytes object's internal buffer; nothing is copied. The second allocates memory, but then re-points the pointer at the Python object's buffer, so the newly allocated memory is ignored (and leaked). You should presumably be calling a C library copy function such as memcpy from Cython (strcpy would stop at the first NUL byte); but I don't know the details.
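In Python terms, the two versions behave roughly like the sketch below: rebinding a name (like re-pointing a C pointer) never copies data, and a `bytes` object exposes only a read-only buffer, so anything that transforms data in place needs its own writable copy. This is an analogy using the standard `ctypes` module, where `ctypes.create_string_buffer` plays the role of `malloc` plus a copy; nothing here calls mcrypt.

```python
import ctypes

source = b"attack at dawn"

# Second version in the question: allocate a buffer, then rebind the name.
ciphertext = ctypes.create_string_buffer(len(source))  # roughly malloc(src_len)
ciphertext = source  # rebinding; the buffer is dropped (Python frees it, C would leak it)
assert ciphertext is source  # no data was copied

# A bytes object only exports a read-only buffer.
try:
    memoryview(source)[0] = 0
    read_only = False
except TypeError:
    read_only = True

# What was intended: a separate, writable copy of the data.
buf = ctypes.create_string_buffer(source, len(source))  # malloc + copy
buf[0] = b"A"   # modify the copy...
print(source)   # ...the original is untouched: b'attack at dawn'
print(buf.raw)  # b'Attack at dawn'
```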

Karl Knechtel, answered Sep 28 '22 03:09


A few comments to help improve your code, IMHO. The Python/C API provides functions that do exactly what you need and make sure everything conforms to the Python way of doing things. They also handle embedded NUL bytes without a problem.
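Embedded NUL bytes matter for the choice of copy function in the question's code: C's `strncpy` stops at the first NUL byte and zero-fills the rest of the destination, so binary plaintext containing `\x00` would be altered before encryption, whereas `memcpy` copies exactly `n` bytes. The two functions below are pure-Python models of those C semantics for illustration only; `c_strncpy` and `c_memcpy` are hypothetical names, not library calls.

```python
def c_strncpy(src: bytes, n: int) -> bytes:
    """Model of C strncpy(dst, src, n): copy up to n bytes, stop at the first
    NUL in src, and leave the rest of the n-byte destination zero-filled."""
    dst = bytearray(n)  # destination starts as n zero bytes
    for i in range(min(n, len(src))):
        if src[i] == 0:
            break  # copying stops at the terminator; the rest stays zero
        dst[i] = src[i]
    return bytes(dst)


def c_memcpy(src: bytes, n: int) -> bytes:
    """Model of C memcpy(dst, src, n): copy exactly n bytes, NULs included."""
    return bytes(src[:n])


plaintext = b"ab\x00cd"
print(c_strncpy(plaintext, len(plaintext)))  # b'ab\x00\x00\x00': "cd" is lost
print(c_memcpy(plaintext, len(plaintext)))   # b'ab\x00cd'
```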

Rather than calling malloc directly, change this:

cdef char *ciphertext = <char *>malloc(src_len)

to

cdef bytes retval = PyBytes_FromStringAndSize(PyBytes_AsString(source), <Py_ssize_t>src_len)
cdef char *ciphertext = PyBytes_AsString(retval)

The first line creates a brand-new Python bytes object initialized with the contents of source. The second line points ciphertext at retval's internal char * buffer without copying, so whatever modifies ciphertext also modifies retval. Since retval is a brand-new bytes object that nothing else references yet, it can be modified by C code before being returned from _real_encrypt. (On Python 2 the equivalent functions are PyString_FromStringAndSize and PyString_AsString.)

See the Python/C API documentation for PyBytes_FromStringAndSize and PyBytes_AsString for more details.

The net effect saves you a copy. The whole code would be something like:

cdef extern from "Python.h":
    object PyBytes_FromStringAndSize(char *, Py_ssize_t)
    char *PyBytes_AsString(object) except NULL

def _real_encrypt(self, source):
    src_len = len(source)
    # one copy: a new bytes object initialized with the plaintext
    cdef bytes retval = PyBytes_FromStringAndSize(PyBytes_AsString(source), <Py_ssize_t>src_len)
    # no copy: ciphertext points into retval's internal buffer
    cdef char *ciphertext = PyBytes_AsString(retval)
    cmc.mcrypt_generic_init(self._mcStream, <void *>self._key,
                            len(self._key), NULL)
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext,
                       src_len)
    # mcrypt_generic encrypted ciphertext in place, so retval now holds the result
    cmc.mcrypt_generic_deinit(self._mcStream)
    return retval
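The point of this approach, allocating the result object once and letting the C code write straight into its buffer, can be illustrated in plain Python with the standard `ctypes` module. This is an analogy, not the Python/C API: a `bytearray` stands in for the freshly created result, a `ctypes` array created with `from_buffer` stands in for the `char *` pointing into `retval`, and `xor_in_place` is a hypothetical stand-in for `mcrypt_generic` (it is not encryption).

```python
import ctypes


def xor_in_place(buf, length: int, key: int) -> None:
    """Hypothetical stand-in for mcrypt_generic: rewrites `length` bytes of
    the unsigned-char buffer `buf` in place. Single-byte XOR is NOT encryption."""
    for i in range(length):
        buf[i] ^= key


def real_encrypt(source: bytes, key: int) -> bytearray:
    retval = bytearray(source)  # the only copy of the data
    # An unsigned char array that shares retval's memory (no second copy).
    ciphertext = (ctypes.c_ubyte * len(retval)).from_buffer(retval)
    xor_in_place(ciphertext, len(retval), key)  # writes land directly in retval
    return retval  # returned as-is; bytes(retval) would cost another copy


print(real_encrypt(b"secret", 0x20))  # bytearray(b'SECRET')
```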
lothario, answered Sep 28 '22 05:09