
How is memory handled for np.ndarray in cython?

For example if I do this:

cdef np.ndarray[np.int64_t, ndim=1] my_array

Where is my_array stored? Since I didn't tell Cython to store it on the heap, I would expect it to be stored on the stack, but the following experiment suggests that it is stored on the heap, or is otherwise efficiently memory-managed. How is memory managed for my_array? Maybe I am missing something obvious, but I couldn't find any documentation on it.

import numpy as np
cimport cython
cimport numpy as np

from libc.stdlib cimport malloc, free

def big_sum():
    # freezes up:
    # "a" is created on the stack
    # space on the stack is limited, so it runs out

    cdef int a[10000000]

    for i in range(10000000):
        a[i] = i

    # 64-bit accumulator: the sum (about 5e13) would overflow a 32-bit int
    cdef long long my_sum
    my_sum = 0
    for i in range(10000000):
        my_sum += a[i]
    return my_sum

def big_sum_malloc():
    # runs fine:
    # "a" is stored on the heap, no problem

    cdef int *a
    a = <int *>malloc(10000000*cython.sizeof(int))

    for i in range(10000000):
        a[i] = i

    # 64-bit accumulator: the sum (about 5e13) would overflow a 32-bit int
    cdef long long my_sum
    my_sum = 0
    for i in range(10000000):
        my_sum += a[i]

    with nogil:
        free(a) 
    return my_sum    

def big_numpy_array_sum():
    # runs fine:
    # I don't know what is going on here
    # but given that the following code runs fine,
    # it seems that entire array is NOT stored on the stack

    cdef np.ndarray[np.int64_t, ndim=1] my_array
    my_array = np.zeros(10000000, dtype=np.int64)

    for i in range(10000000):
        my_array[i] = i

    # 64-bit accumulator: the sum (about 5e13) would overflow a 32-bit int
    cdef np.int64_t my_sum
    my_sum = 0
    for i in range(10000000):
        my_sum += my_array[i]
    return my_sum

asked Nov 15 '13 15:11 by Akavall


1 Answer

Cython is not doing anything magical here. NumPy has a full C API, and that is what Cython interacts with. Cython does not perform the memory management itself; the memory of the NumPy array is handled exactly as it is when you use a NumPy array from Python. @Bakuriu is right: this is definitely on the heap.

Consider this Cython code:

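import numpy as np    # runtime module, needed for np.zeros
cimport numpy as np   # compile-time declarations, needed for np.ndarray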
def main():
    zeros = np.zeros
    cdef np.ndarray[dtype=np.double_t, ndim=1] array
    array = zeros(10000)

This gets translated to the following C in the equivalent main function. I've removed the declarations and error-handling code to make it cleaner to read.

PyArrayObject *__pyx_v_array = 0;
PyObject *__pyx_v_zeros = NULL;
PyObject *__pyx_t_1 = NULL;
PyObject *__pyx_t_2 = NULL;

// zeros = np.zeros             # <<<<<<<<<<<<<<
// get the numpy module object
__pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s__np);
// get the "zeros" function
__pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s__zeros);
__pyx_v_zeros = __pyx_t_2;

// array = zeros(10000)             # <<<<<<<<<<<<<<
// (__pyx_k_tuple_1 is a static global variable containing the literal python tuple
// (10000, ) that was initialized during the __Pyx_InitCachedConstants function)
__pyx_t_2 = PyObject_Call(__pyx_v_zeros, ((PyObject *)__pyx_k_tuple_1), NULL);
__pyx_v_array = ((PyArrayObject *)__pyx_t_2);

If you look up the NumPy C API documentation, you'll see that PyArrayObject is the C-API struct for NumPy's ndarray. The key point is that Cython isn't explicitly handling memory allocation at all. The same object-oriented design principles apply with the Python and NumPy C APIs, and memory management here is the responsibility of PyArrayObject. The situation is no different from using a NumPy array in Python.
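
You can check this from plain Python as well. The following is only a minimal sketch, assuming NumPy is installed; the variable names are illustrative and are not part of the original answer. It shows that the array object is a small handle and that the 80 MB of element data sits in a separately allocated buffer, which a view can share:

```python
import sys
import numpy as np

# Same size as the array in the question: 10 million 64-bit integers.
n = 10_000_000
a = np.zeros(n, dtype=np.int64)

# Size of the element buffer alone: 8 bytes * 10 million = 80 MB,
# far more than a default stack limit of a few megabytes.
data_bytes = a.nbytes

# The array object owns a separately allocated buffer; a.ctypes.data
# is the address of that buffer's first element.
owns_buffer = a.flags.owndata
data_address = a.ctypes.data

# A view is a second, small array object pointing at the same buffer.
v = a[:]
view_size = sys.getsizeof(v)  # small: v does not own the element data

print(data_bytes)                      # 80000000
print(owns_buffer, v.flags.owndata)    # True False
print(v.ctypes.data == data_address)   # True
print(view_size)                       # small compared with data_bytes
```

The variable declared with cdef np.ndarray[...] in Cython plays the same role as a and v here: it holds a pointer to the array object, not the elements themselves.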

answered Oct 23 '22 19:10 by Robert T. McGibbon