How can one declare an array of arrays in cython?
More precisely, I want to construct (declare and then initialize) an m by n matrix, call it A, in which each entry [i,j] is a 1-dimensional array of doubles (of length min(i,j)
, filled with zeros) of the form
cdef np.ndarray[np.double_t, ndim=1] A[i,j]
A[i,j] = np.zeros((min(i,j)), dtype=np.double)
For (m,n)=(4,3), print A
should return something like this:
[[[], [], []],
[[], [0.], [0.]],
[[], [0.], [0.,0.]],
[[], [0.], [0.,0.]]]
How do I declare and initialize A?
You can use NumPy from Cython exactly the same as in regular Python, but by doing so you are losing potentially high speedups because Cython has support for fast access to NumPy arrays. Let's see how this works with a simple example.
The object method:
import numpy
def thing(int m, int n):
cdef int i, j
cdef object[:, :] A = numpy.empty((m, n), dtype=object)
for i in range(A.shape[0]):
for j in range(A.shape[1]):
A[i, j] = numpy.zeros(min(i, j))
return A
Note that that object[:, :]
syntax is the newer version, the numpy.ndarray[object, ndim=2]
version is deprecated. The newer version is GIL-free (well, probably not when using object
types), normally faster (never slower), type-agnostic (works on anything supporting memoryview
) and cleaner.
If you want to iterate over the sub-arrays, you'd be doing:
for i in range(A.shape[0]):
for j in range(A.shape[1]):
subarray = A[i, j]
for k in range(subarray.size):
...
and you can either type subarray
to object
(best for small subarray
s) or float[:]
(best for large subarray
s).
The C-level solution is proving to be very tricky. I have a feeling you'll basically end up writing it with pure-C types.
So I'm abandoning that, and here's what I'd do:
import numpy
def thing(int m, int n):
cdef int i, j
cdef float[:, :, :] A = numpy.zeros((m, n, min(m, n)), dtype=float)
cdef int[:, :] A_lengths = numpy.empty((m, n), dtype=int)
for i in range(A_lengths.shape[0]):
for j in range(A_lengths.shape[1]):
A_lengths[i, j] = min(i, j)
return A, A_lengths
Basically, make a 3D array and a 2D array of corresponding lengths. If there is only a linear variation in lengths (so the max length is a reasonable factor [I'd say up to about 10] of the mean length) then this should have acceptable overhead. It'll allow pure-C calculations while having a tasty memoryview interface.
That's all I've got. Take it or leave it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With