
Iterating over a list in parallel with Cython

How does one iterate in parallel over a (Python) list in Cython?

Consider the following simple function:

from cython.parallel import prange

def sumList():
    cdef int n = 1000
    cdef int sum = 0

    ls = [i for i in range(n)]

    cdef Py_ssize_t i
    for i in prange(n, nogil=True):
        sum += ls[i]

    return sum

This gives a lot of compiler errors, because a parallel section without the GIL apparently cannot work with any Python object:

Error compiling Cython file:
------------------------------------------------------------
...

    ls = [i for i in range(n)]

    cdef Py_ssize_t i
    for i in prange(n, nogil=True):
        sum += ls[i]
     ^
------------------------------------------------------------

src/parallel.pyx:42:6: Coercion from Python not allowed without the GIL

Error compiling Cython file:
------------------------------------------------------------
...

    ls = [i for i in range(n)]

    cdef Py_ssize_t i
    for i in prange(n, nogil=True):
        sum += ls[i]
     ^
------------------------------------------------------------

src/parallel.pyx:42:6: Operation not allowed without gil

Error compiling Cython file:
------------------------------------------------------------
...

    ls = [i for i in range(n)]

    cdef Py_ssize_t i
    for i in prange(n, nogil=True):
        sum += ls[i]
     ^
------------------------------------------------------------

src/parallel.pyx:42:6: Converting to Python object not allowed without gil

Error compiling Cython file:
------------------------------------------------------------
...

    ls = [i for i in range(n)]

    cdef Py_ssize_t i
    for i in prange(n, nogil=True):
        sum += ls[i]
          ^
------------------------------------------------------------

src/parallel.pyx:42:11: Indexing Python object not allowed without gil

clstaudt asked Jul 23 '13 13:07


2 Answers

I am not aware of any way to do this. A list is a Python object, so using its __getitem__ method requires the GIL. If you can use a NumPy array instead, it will work, because a typed buffer can be indexed as plain C memory without touching any Python object. For example, to iterate over an array A of double-precision floating-point values, you could do something like this:

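cimport cython
from numpy cimport ndarray as ar
from cython.parallel import prange

# Disable bounds checking and negative-index handling for faster indexing.
@cython.boundscheck(False)
@cython.wraparound(False)
cpdef cysumpar(ar[double] A):
    cdef double tot = 0.
    cdef int i, n = A.size
    # The in-place "+=" makes tot a reduction variable across threads.
    for i in prange(n, nogil=True):
        tot += A[i]
    return tot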

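On my machine, for this particular case, prange doesn't make it any faster than a normal loop, but it could work better in other cases. Note that prange only runs in parallel when the extension is compiled and linked with OpenMP support (for example, -fopenmp with GCC); without it, the loop runs sequentially. For more on how to use prange, see the documentation at http://docs.cython.org/src/userguide/parallelism.html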

Whether you can use an array depends on how often its size changes. If you need to grow or shrink the container frequently, an array will not work well. You could also try interfacing with the vector class in C++. I've never done that myself, but there is a brief description of how to do it here: http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html#nested-class-declarations

IanH answered Oct 10 '22 14:10


Convert your list into an array (array.array) if it holds numeric values, or into a bytearray if the values are limited to the range 0 to 255. If you store anything other than numeric values, try NumPy or use dtypes directly. For example, with C ints:

import array
cdef int[::1] gen = array.array('i', [1, 2, 3, 4])

And if you want to use C types:

ctypedef unsigned char uint8_t
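
The conversion step itself can be sketched in plain Python using only the standard library. This is a minimal illustration, not part of the original answer: the helper name to_c_int_array is made up, and the list is assumed to hold integers that fit into a C int. The printed values are the buffer properties that a Cython int[::1] parameter checks for:

```python
import array

def to_c_int_array(values):
    """Copy Python ints into a contiguous buffer of C ints (hypothetical helper)."""
    # array.array raises TypeError for non-integers and OverflowError
    # for values that do not fit into a C int.
    return array.array('i', values)

ls = [i for i in range(1000)]
buf = to_c_int_array(ls)

# A memoryview exposes the buffer metadata without copying the data.
view = memoryview(buf)
print(view.format, view.c_contiguous, len(view))

# For values in the range 0..255, a bytearray is the compact alternative.
small = bytearray([0, 128, 255])
print(memoryview(small).format, memoryview(small).itemsize)
```

The copy needs the GIL and costs one pass over the list, so it belongs before the prange loop, not inside it.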

gaborous answered Oct 10 '22 13:10