In Python I have a list of lists as input:
input = [[0,1,2],[0,3,4,5],[0,6]]
In reality, the number of sub-lists is tens of thousands. The length of each sub-list could vary greatly, from zero or one value to many hundreds.
I want to pass the input data as some 2D structure to a Cython module that will process it. I wish to process the data on multiple cores, thus I use prange
with nogil=True
:
from cython.parallel import prange
cpdef int my_func(long[:,:] arr):
cdef int i,j
for i in prange(arr.shape[0], nogil=True):
for j in range(arr.shape[1]):
# Do something
pass
return 42
I see the following solutions:
my_func
to accept a list of lists. The problem is that part of the code is executed without the GIL and therefore not able to access python objects.Does anyone have suggestions, preferably with code, on how to solve this problem?
I would probably go for a flattened array, where the starts of the single lists are stored in an auxiliary array, not unsimilar to csr-matrices.
Here is an example how this data structure could be constructed from list of lists (using numpy, but you could also use array.array
; it is also not really optimized for speed), just to give you the idea:
import numpy as np
def flatten_list_of_lists(lst_of_lsts):
N = sum(map(len, lst_of_lsts)) # number of elements in the flattened array
starts = np.empty(len(lst_of_lsts)+1, dtype=np.uint64) # needs place for one sentinel
values = np.empty(N, dtype=np.int64)
starts[0], cnt = 0, 0
for i,lst in enumerate(lst_of_lsts):
for el in lst:
values[cnt] = el
cnt += 1 # update index in the flattened array for the next element
starts[i+1] = cnt # remember the start of the next list
return starts, values
So for your example produce the following result:
# starts values
(array([0, 3, 7, 9], dtype=uint64), array([0, 1, 2, 0, 3, 4, 5, 0, 6]))
You can see: there are 3 sublists starting at 0
, 3
and 7
and being 3
(difference starts[1]-starts[0]
), 4
and 2
elements long.
And this is how this data could be consumed:
%%cython
from cython.parallel import prange
cpdef int my_func(unsigned long long[::1] starts, long long[::1] values):
cdef int i,j
for i in prange(len(starts)-1, nogil=True):
for j in range(starts[i], starts[i+1]):
# Do something with values[j]
pass
return 42
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With