
Memory leak when calling a Cython function with large NumPy array parameters?

I'm trying to write Python code that calls the following Cython function test1:

cimport numpy as np

def test1(np.ndarray[np.int32_t, ndim=2] ndk,
          np.ndarray[np.int32_t, ndim=2] nkw,
          np.ndarray[np.float64_t, ndim=2] phi):

    # call the cdef function many times with the same arrays
    for _ in range(int(1e5)):
        test2(ndk, nkw, phi)


cdef int test2(np.ndarray[np.int32_t, ndim=2] ndk,
               np.ndarray[np.int32_t, ndim=2] nkw,
               np.ndarray[np.float64_t, ndim=2] phi):
    return 1

My pure Python code calls test1 and passes three large NumPy arrays as parameters (about 10^4 x 10^3 each). test1 in turn calls test2, which is defined with the cdef keyword, and passes those arrays along. Since test1 needs to call test2 many times (about 10^5) before it returns, and test2 never needs to be called from outside the Cython code, I use cdef instead of def.

The problem is that memory usage increases steadily as test1 keeps calling test2. I've tried calling gc.collect() outside the Cython code, but it doesn't help. Eventually the system kills the program because it has used up all the memory. I noticed that the problem only occurs with cdef and cpdef functions; if I change test2 to def, it works fine.

I think test1 is supposed to pass references to these arrays to test2 instead of copies of the objects. But it seems as if it creates new array objects and passes those to test2, and these objects are never reclaimed by the Python gc afterwards.

Did I miss something?

Yang Yuan asked Mar 25 '15 04:03



2 Answers

I'm still confused about the cause of this problem, but I found a way to work around it: explicitly tell Cython to pass pointers, like this:

cimport numpy as np

def test1(np.ndarray[np.int32_t, ndim=2] ndk,
          np.ndarray[np.int32_t, ndim=2] nkw,
          np.ndarray[np.float64_t, ndim=2] phi):

    # pass the address of the first element of each array
    for _ in range(int(1e5)):
        test2(&ndk[0,0], &nkw[0,0], &phi[0,0])


cdef int test2(np.int32_t* ndk,
               np.int32_t* nkw,
               np.float64_t* phi):
    return 1

However, you then need to index the array manually, like this: ndk[i*row_len + j], where row_len is the number of columns. This assumes the array is C-contiguous. Details: https://github.com/cython/cython/wiki/tutorials-NumpyPointerToC
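The flat indexing above can be illustrated in plain Python. This is only a sketch of the row-major arithmetic, not Cython code; the helper name flat_index and the small example sizes are made up for illustration.

```python
def flat_index(i, j, row_len):
    """Offset of element (i, j) in a C-contiguous (row-major) 2-D array.

    row_len is the number of elements per row, i.e. the number of columns.
    """
    return i * row_len + j


# Hypothetical small example: a 3 x 4 "array" stored as nested lists,
# plus the same values laid out row after row in one flat list.
n_rows, n_cols = 3, 4
nested = [[10 * i + j for j in range(n_cols)] for i in range(n_rows)]
flat = [value for row in nested for value in row]

# Every (i, j) lookup agrees with the flat lookup.
all_match = all(
    flat[flat_index(i, j, n_cols)] == nested[i][j]
    for i in range(n_rows)
    for j in range(n_cols)
)
print(all_match)
```

The same arithmetic only applies to a raw pointer when the underlying array really is stored row by row without gaps, which is why the C-contiguous assumption matters.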

Yang Yuan answered Nov 04 '22 06:11



I had a similar issue and solved it using typed memoryviews. Besides fixing the leak, this approach is also much simpler to use than pointers:

Typed memoryviews allow efficient access to memory buffers, such as those underlying NumPy arrays, without incurring any Python overhead. Memoryviews are similar to the current NumPy array buffer support (np.ndarray[np.float64_t, ndim=2]), but they have more features and cleaner syntax.

Unfortunately, I could not figure out why the former method causes a memory leak. I can only guess that a pointer to the data stays alive somewhere and prevents the data from being garbage collected. Maybe someone can comment on this with better insight.

At any rate, your code should work fine with this interface (shown for test2, but it works for test1 as well):

cdef int test2(int[:, :] ndk,
               int[:, :] nkw,
               double[:, :] phi):  # double, not float: phi holds float64 values

    # The data can be accessed through the memoryview as if it were a regular
    # NumPy array, including properties such as .shape. For example:
    #   cdef int some_int = ndk[0, 5]   <--- the primitive value stored at [0, 5]
    #   ndk.shape                       <--- the shape of the array

    # NOTE: the original array (i.e. the ndk that is passed into the function)
    # must be an "exportable" object, and is presumably created by the caller
    # (a Python/Cython/NumPy array is such an exportable object).

    return 1
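To show what an "exportable" object means in practice, here is a small sketch using Python's built-in memoryview instead of a Cython typed memoryview. The two are different things, but both obtain their data through the buffer protocol, so the no-copy behaviour is comparable. The array contents and the 2 x 3 shape are arbitrary choices for the example.

```python
from array import array

# array.array exports the buffer protocol, much like a NumPy array does.
flat = array('i', range(6))

# View the same memory as a 2 x 3 C-contiguous block without copying.
# memoryview.cast() only accepts a shape when going to or from a byte
# format, so the view is cast to bytes first and then back to 'i'.
view = memoryview(flat).cast('B').cast('i', shape=[2, 3])

# Writing through the original object is visible through the view,
# because both refer to the same underlying buffer.
flat[5] = 42
print(view.shape, view[1, 2])
```

Passing such a view around hands over a reference to the existing buffer rather than a new copy of the data.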
shoomoo answered Nov 04 '22 07:11
