Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

arrays of arrays in cython

How can one declare an array of arrays in cython?

More precisely, I want to construct (declare and then initialize) an m by n matrix, call it A, in which each entry [i,j] is a 1-dimensional array of doubles (of length min(i,j), filled with zeros) of the form

cdef np.ndarray[np.double_t, ndim=1] A[i,j]
A[i,j] = np.zeros((min(i,j)), dtype=np.double)

For (m,n)=(4,3), print A should return something like this:

[[[], [], []],
[[], [0.], [0.]],
[[], [0.], [0.,0.]],
[[], [0.], [0.,0.]]]

How do I declare and initialize A?

like image 602
danb Avatar asked Sep 27 '13 15:09

danb


People also ask

Should I use NumPy in Cython?

You can use NumPy from Cython exactly the same as in regular Python, but by doing so you are losing potentially high speedups because Cython has support for fast access to NumPy arrays. Let's see how this works with a simple example.


1 Answers

The object method:

import numpy

def thing(int m, int n):
    cdef int i, j
    cdef object[:, :] A = numpy.empty((m, n), dtype=object)

    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            A[i, j] = numpy.zeros(min(i, j))

    return A

Note that that object[:, :] syntax is the newer version, the numpy.ndarray[object, ndim=2] version is deprecated. The newer version is GIL-free (well, probably not when using object types), normally faster (never slower), type-agnostic (works on anything supporting memoryview) and cleaner.

If you want to iterate over the sub-arrays, you'd be doing:

for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        subarray = A[i, j]
        for k in range(subarray.size):
            ...

and you can either type subarray to object (best for small subarrays) or float[:] (best for large subarrays).


The C-level solution is proving to be very tricky. I have a feeling you'll basically end up writing it with pure-C types.

So I'm abandoning that, and here's what I'd do:

import numpy

def thing(int m, int n):
    cdef int i, j

    cdef float[:, :, :] A = numpy.zeros((m, n, min(m, n)), dtype=float)
    cdef int[:, :] A_lengths = numpy.empty((m, n), dtype=int)

    for i in range(A_lengths.shape[0]):
        for j in range(A_lengths.shape[1]):
            A_lengths[i, j] = min(i, j)

    return A, A_lengths

Basically, make a 3D array and a 2D array of corresponding lengths. If there is only a linear variation in lengths (so the max length is a reasonable factor [I'd say up to about 10] of the mean length) then this should have acceptable overhead. It'll allow pure-C calculations while having a tasty memoryview interface.


That's all I've got. Take it or leave it.

like image 72
Veedrac Avatar answered Oct 05 '22 11:10

Veedrac