Using Cython, I am trying to convert a Python list to a Cython array, and vice versa. The Python list contains numbers in the range 0-255, so I declare the array as an `unsigned char` array. Here is my code for the conversions:
```cython
from libc.stdlib cimport malloc

cdef to_array(list pylist):
    cdef unsigned char *array
    array = <unsigned char *>malloc(len(pylist) * sizeof(unsigned char))
    cdef long count = 0
    for item in pylist:
        array[count] = item
        count += 1
    return array

cdef to_list(array):
    pylist = [item for item in array]
    return pylist

def donothing(pylist):
    return to_list(to_array(pylist))
```
The problem is that pieces of garbage data end up in the Cython array, and when it is converted back to a Python list, the garbage carries over. For example, `donothing` should do absolutely nothing and return the Python list to me unchanged. The function exists only to test the conversion, but when I run it I get something like:
```
In[56]: donothing([2,3,4,5])
Out[56]: [2, 3, 4, 5, 128, 28, 184, 6, 161, 148, 185, 69, 106, 101]
```
Where is this data coming from in the code, and how can this garbage be cleaned up so no memory is wasted?
P.S. There may be a better way to take numbers from a Python list and put them into an `unsigned char` array. If so, please point me to a better method entirely.
Your `to_array` has an untyped return value, and its result is passed to `to_list`, whose `array` parameter is also untyped. As such, Cython is forced to convert the `unsigned char *` to a Python object.

Cython converts it to `bytes`, because a `char *` is approximately `bytes`. Unfortunately, without an explicitly given length, Cython assumes that the `char *` is null-terminated. This is what causes the problem:
```python
convert_lists.donothing([1, 2, 3, 0, 4, 5, 6])
#>>> [1, 2, 3]
```
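When the data contains no zero, Cython simply keeps reading until it finds one, going past the end of the memory that was actually allocated. Those extra bytes are where the garbage values in the output of `donothing` come from.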
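The same C-string rule can be reproduced from plain Python with `ctypes`. This is only an illustration: the `ctypes` array stands in for the `malloc`'d buffer, and `ctypes.string_at` is not what Cython calls internally, but it follows the same convention. With no size argument it reads up to the first zero byte; with a size it copies exactly that many bytes.

```python
import ctypes

data = [1, 2, 3, 0, 4, 5, 6]

# A C array of unsigned chars, standing in for the malloc'd buffer.
buf = (ctypes.c_ubyte * len(data))(*data)
addr = ctypes.addressof(buf)

# No length given: the buffer is treated as a NUL-terminated C string,
# so reading stops at the first zero byte.
truncated = list(ctypes.string_at(addr))

# Explicit length: exactly len(data) bytes are copied, zeros included.
complete = list(ctypes.string_at(addr, len(data)))

print(truncated)  # [1, 2, 3]
print(complete)   # [1, 2, 3, 0, 4, 5, 6]
```

The example deliberately contains a zero byte; reading a buffer with no zero in it and no length is exactly the out-of-bounds read described above, so it is not demonstrated. Note also that in Python 3, iterating over a `bytes` object yields integers, which is why the stray bytes appear as numbers such as `128` and `28`.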
You can't actually write `for x in my_pointer_array` for arbitrary Cython pointer types. Here the `for` loop is really operating on the incorrectly converted `bytes` object.
You can fix this by typing every variable that holds the `unsigned char *` array, passing the length around explicitly and looping over ranges (which is also faster when the loop variable is typed), or by using a wrapper of some sort. For ideas on which wrapper arrays to use, this question and answer pair has you covered.
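As one possible wrapper (an assumption on my part, not necessarily the one the linked question recommends), the standard-library `array.array` with type code `'B'` stores unsigned chars contiguously, records its own length, and frees its memory when it is garbage collected. The sketch below is plain Python, which Cython also compiles; `to_array`, `to_list` and `donothing` reuse the question's names but are hypothetical reimplementations. In a `.pyx` file the array could additionally be accessed through an `unsigned char[:]` typed memoryview for C-speed indexing, which is not shown here.

```python
from array import array


def to_array(pylist):
    # Type code 'B' is unsigned char. The array owns its memory and knows
    # its own length, so nothing has to be malloc'd or freed by hand.
    # Values outside 0-255 raise OverflowError instead of wrapping silently.
    return array('B', pylist)


def to_list(arr):
    # tolist() reads exactly len(arr) items; no terminator is involved.
    return arr.tolist()


def donothing(pylist):
    return to_list(to_array(pylist))


print(donothing([2, 3, 4, 5]))           # [2, 3, 4, 5]
print(donothing([1, 2, 3, 0, 4, 5, 6]))  # [1, 2, 3, 0, 4, 5, 6]
```

Because the length travels with the object, zero bytes survive the round trip and nothing beyond the stored items is ever read.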
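Please also be very careful about error handling when you allocate memory manually. `malloc`'d data is not garbage collected, so if a code path raises an error before the memory is released, you leak it. The same applies on the success path: the code in the question never calls `free` (also available from `libc.stdlib`), so every call leaks the buffer. You should check how to handle each specific case.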