Extra elements in Python list

Tags: python, cython

Using Cython, I am trying to convert a Python list to a Cython array, and vice versa. The Python list contains numbers in the range 0-255, so I declare the array as an unsigned char array. Here is my code for the conversions:

from libc.stdlib cimport malloc

cdef to_array(list pylist):
    cdef unsigned char *array 
    array = <unsigned char *>malloc(len(pylist) * sizeof(unsigned char))
    cdef long count = 0

    for item in pylist:
        array[count] = item
        count += 1
    return array

cdef to_list(array):
    pylist = [item for item in array]
    return pylist

def donothing(pylist):
    return to_list(to_array(pylist))

The problem lies in the fact that pieces of garbage data are generated in the Cython array, and when converted to Python lists, the garbage data carries over. For example, donothing should do absolutely nothing, and return the python list back to me, unchanged. This function is simply for testing the conversion, but when I run it I get something like:

In[56]:  donothing([2,3,4,5])
Out[56]: [2, 3, 4, 5, 128, 28, 184, 6, 161, 148, 185, 69, 106, 101]

Where is this data coming from in the code, and how can this garbage be cleaned up so no memory is wasted?

P.S. There may be a better version of taking numbers from a Python list and injecting them into an unsigned char array. If so, please direct me to a better method entirely.

Nick Pandolfi asked Jun 08 '14 01:06



1 Answer

Your to_array has an untyped return value. Further, you assign the result to an untyped value. As such, Cython is forced to convert the unsigned char * to a Python object.

Cython converts it to bytes, because a char * is the closest C counterpart of bytes. Unfortunately, without an explicitly given length, Cython assumes that the char * is null-terminated. This is what causes the problem:

convert_lists.donothing([1, 2, 3, 0, 4, 5, 6])
#>>> [1, 2, 3]

When there are no zeroes, Cython will just keep reading until it finds one, going past the memory you actually allocated. That is where the extra elements in your output come from.
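The null-termination behaviour can be reproduced in plain Python. The following is only a sketch: it uses ctypes as a stand-in for the Cython pointer, and the variable names are made up for illustration. It is not what Cython generates internally.

```python
import ctypes

# Seven payload bytes. create_string_buffer appends one terminating NUL,
# so every read below stays inside the allocated buffer.
data = bytes([1, 2, 3, 0, 4, 5, 6])
buf = ctypes.create_string_buffer(data)

# Treated as a C string (char *): reading stops at the first zero byte.
as_c_string = list(ctypes.cast(buf, ctypes.c_char_p).value)

# Read with an explicit length: all seven bytes survive.
with_length = list(buf.raw[:len(data)])

print(as_c_string)   # [1, 2, 3]
print(with_length)   # [1, 2, 3, 0, 4, 5, 6]
```

The first result matches the truncated list above. Only the read that knows the length returns the data intact.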

You can't actually do for x in my_pointer_array for arbitrary Cython types. Here the for loop is operating on the incorrectly converted bytes, not on your array.

You can fix this by typing all values that will hold the char array, passing around the length explicitly and looping over ranges (which will also be faster when the loop variable is typed), or by using a wrapper of some sort. For ideas on what wrapper arrays to use, this question and answer pair has you covered.
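Both suggestions can be sketched in plain Python. This is an illustration under assumptions, not the Cython fix itself: a ctypes c_ubyte array stands in for the typed unsigned char * buffer, the standard-library array module is picked as one possible wrapper, and donothing_with_wrapper is a hypothetical name.

```python
import ctypes
from array import array

def to_array(pylist):
    """Copy the list into an unsigned char buffer; return (buffer, length)."""
    length = len(pylist)
    buf = (ctypes.c_ubyte * length)(*pylist)
    return buf, length

def to_list(buf, length):
    """Read back exactly `length` items by index; zero bytes are ordinary data."""
    return [buf[i] for i in range(length)]

def donothing(pylist):
    buf, length = to_array(pylist)
    return to_list(buf, length)

def donothing_with_wrapper(pylist):
    """Wrapper variant: array('B') stores unsigned chars and knows its own length."""
    return array('B', pylist).tolist()

print(donothing([1, 2, 3, 0, 4, 5, 6]))   # [1, 2, 3, 0, 4, 5, 6]
print(donothing_with_wrapper([2, 3, 4, 5]))  # [2, 3, 4, 5]
```

In real Cython code the same shape would apply: give the pointer and the return value explicit types, carry the length alongside the pointer, and loop over range(length) with a typed index. The array('B') variant also rejects values outside 0-255 with an OverflowError, and its memory is managed by Python.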


Please also note that you should be very careful about errors when using manual allocation. malloc'd data is not garbage collected, so if you error out of a code path before freeing it, you leak memory. Your code never calls free at all, so the buffer leaks on every call. You should check how to handle each specific case.

Veedrac answered Oct 31 '22 18:10
