I am trying to write a Cython extension to CPython to wrap the mcrypt library, so that I can use it with Python 3. However, I am running into a problem where I segfault while trying to use one of the mcrypt APIs.
The code that is failing is:
def _real_encrypt(self, source):
    src_len = len(source)
    cdef char* ciphertext = source
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext, src_len)
    retval = source[:src_len]
    return retval
Now, the way I understand the Cython documentation, the assignment on line 3 should copy the contents of the buffer (a bytes object in Python 3) to the C string pointer. I figured this would also allocate the memory, but when I made this modification:
def _real_encrypt(self, source):
    src_len = len(source)
    cdef char* ciphertext = <char *>malloc(src_len)
    ciphertext = source
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext, src_len)
    retval = source[:src_len]
    return retval
it still crashed with a segfault. It's crashing inside of mcrypt_generic, but when I use plain C code I am able to make it work just fine, so there has to be something that I am not quite understanding about how Cython is working with C data here.
Thanks for any help!
ETA: The problem was a bug on my part. I was working on this after being awake for far too many hours (isn't that something we've all done at some point?) and missed something stupid. The code that I now have, which works, is:
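def _real_encrypt(self, source):
    src_len = len(source)
    cdef char *ciphertext = <char *>malloc(src_len)
    # Note: strncpy stops copying at the first NUL byte and zero-fills the rest,
    # so this only round-trips input that has no embedded NULs.
    cmc.strncpy(ciphertext, source, src_len)
    cmc.mcrypt_generic_init(self._mcStream, <void *>self._key,
                            len(self._key), NULL)
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext,
                       src_len)
    retval = ciphertext[:src_len]
    # Release the temporary buffer (assumes free is cimported alongside malloc).
    free(ciphertext)
    cmc.mcrypt_generic_deinit(self._mcStream)
    return retval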
It's probably not the most efficient code in the world, since it makes one copy to do the encryption and then a second copy for the return value. I'm not sure that can be avoided, though, since I don't know whether a newly allocated buffer can be returned to Python in place as a bytestring. Now that I have a working function, I'm going to implement a block-by-block method as well, so that one can provide an iterable of blocks for encryption or decryption without having the entire source and destination in memory at once. That way, it would be possible to encrypt or decrypt huge files without holding up to three copies in memory at any one point.
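A rough sketch of that block-by-block idea, in plain Python rather than Cython. The names read_blocks, transform_blocks and xor_block are made up for illustration, and xor_block is only a stand-in for the real per-block mcrypt call. It also ignores padding, which a real block cipher would need for a short final block.

```python
import io
from typing import BinaryIO, Callable, Iterable, Iterator


def read_blocks(fileobj: BinaryIO, block_size: int) -> Iterator[bytes]:
    """Yield successive chunks of at most block_size bytes from fileobj."""
    while True:
        chunk = fileobj.read(block_size)
        if not chunk:
            return
        yield chunk


def transform_blocks(blocks: Iterable[bytes],
                     transform: Callable[[bytes], bytes]) -> Iterator[bytes]:
    """Lazily apply transform to each block, so only one block is held at a time."""
    for block in blocks:
        yield transform(block)


def xor_block(block: bytes, key: int = 0x5A) -> bytes:
    """Hypothetical stand-in for the real per-block encrypt/decrypt call."""
    return bytes(b ^ key for b in block)


plaintext = b"sixteen byte blk" * 3 + b"tail"
encrypted = b"".join(
    transform_blocks(read_blocks(io.BytesIO(plaintext), 16), xor_block))
decrypted = b"".join(
    transform_blocks(read_blocks(io.BytesIO(encrypted), 16), xor_block))
```

Because both helpers are generators, the output could just as well be written to a file block by block instead of being joined in memory.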
Thanks for the help, everyone!
The first one is pointing the char* at the Python string. The second allocates memory, but then re-points the pointer to the Python string and ignores the newly allocated memory. You should be invoking the C library function strcpy from Cython, presumably; but I don't know the details.
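The same rebind-versus-copy distinction can be shown from plain Python with ctypes, as a loose analogy to the Cython code rather than a drop-in fix. The variable names are illustrative only. ctypes.memmove copies by length, like C's memcpy, so it is not affected by the NUL-termination that strcpy relies on.

```python
import ctypes

source = b"secret\x00data"      # binary input with an embedded NUL byte
n = len(source)

# The mistake: allocate a buffer, then rebind the name to the original object.
buf = ctypes.create_string_buffer(n)
buf = source                     # the fresh buffer is dropped, nothing is copied
rebound_is_source = buf is source

# The fix: keep the buffer and copy the bytes into it by length.
buf = ctypes.create_string_buffer(n)
ctypes.memmove(buf, source, n)

copied = buf.raw                 # all n bytes, including the embedded NUL
as_c_string = buf.value          # what NUL-terminated string functions would see
```

For encrypted or otherwise binary data, the difference between copied and as_c_string is why a length-based copy is the safer choice.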
A few comments on your code to help improve it, IMHO. The Python C API provides functions that do exactly what you need and make sure everything conforms to the Python way of doing things. They handle embedded NULs without a problem.
Rather than calling malloc directly, change this:
cdef char *ciphertext = <char *>malloc(src_len)
to
cdef str retval = PyString_FromStringAndSize(PyString_AsString(source), <Py_ssize_t>src_len)
cdef char *ciphertext = PyString_AsString(retval)
The first line creates a brand new Python str object initialized to the contents of source. The second line points ciphertext to retval's internal char * buffer without copying. Whatever modifies ciphertext will modify retval. Since retval is a brand new Python str, it can be modified by C code before being returned from _real_encrypt. (These are the Python 2 names; on Python 3 the same functions are called PyBytes_FromStringAndSize and PyBytes_AsString, and the result is a bytes object.)
See the Python C API documentation for these two functions for more details.
The net effect saves you a copy. The whole code would be something like:
cdef extern from "Python.h":
    object PyString_FromStringAndSize(char *, Py_ssize_t)
    char *PyString_AsString(object)

def _real_encrypt(self, source):
    src_len = len(source)
    cdef str retval = PyString_FromStringAndSize(PyString_AsString(source), <Py_ssize_t>src_len)
    cdef char *ciphertext = PyString_AsString(retval)
    cmc.mcrypt_generic_init(self._mcStream, <void *>self._key,
                            len(self._key), NULL)
    cmc.mcrypt_generic(self._mcStream, <void *>ciphertext,
                       src_len)
    # since the above initialized ciphertext, the retval str is also correctly initialized, too.
    cmc.mcrypt_generic_deinit(self._mcStream)
    return retval
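For comparison, here is a pure-Python sketch of the same "one copy, then modify in place" idea, using a bytearray as the mutable result and a ctypes view onto its memory. The function name encrypt_in_place and the XOR step are assumptions for illustration; the XOR loop only stands in for the call that would hand the buffer to mcrypt_generic.

```python
import ctypes


def encrypt_in_place(source: bytes, key: int = 0x5A) -> bytearray:
    """Copy source once, then transform that copy through a shared-memory view."""
    out = bytearray(source)          # the single copy of the input
    n = len(out)
    if n == 0:
        return out
    # from_buffer shares memory with out; no second copy is made.
    view = (ctypes.c_ubyte * n).from_buffer(out)
    for i in range(n):
        view[i] ^= key               # stand-in for the C library writing into the buffer
    return out


ciphertext = encrypt_in_place(b"ab\x00c")
roundtrip = encrypt_in_place(bytes(ciphertext))
```

Returning the bytearray avoids a second copy; converting it with bytes() would add one back.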