arrays of arrays in cython

Tags: python, arrays, numpy, cython

How can one declare an array of arrays in cython?

More precisely, I want to construct (declare and then initialize) an m by n matrix, call it A, in which each entry [i,j] is a 1-dimensional array of doubles (of length min(i,j), filled with zeros) of the form

cdef np.ndarray[np.double_t, ndim=1] A[i,j]
A[i,j] = np.zeros((min(i,j)), dtype=np.double)

For (m,n)=(4,3), print A should return something like this:

[[[], [], []],
[[], [0.], [0.]],
[[], [0.], [0.,0.]],
[[], [0.], [0.,0.]]]

How do I declare and initialize A?

asked Sep 27 '13 15:09

danb

1 Answer

The object method:

import numpy

def thing(int m, int n):
    cdef int i, j
    cdef object[:, :] A = numpy.empty((m, n), dtype=object)

    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            A[i, j] = numpy.zeros(min(i, j))

    return A

Note that the object[:, :] syntax is the newer typed-memoryview form; the older numpy.ndarray[object, ndim=2] buffer syntax is deprecated. Typed memoryviews can be used without the GIL (well, probably not when the element type is object), are normally faster (never slower), are type-agnostic (they work on anything supporting memoryview) and are cleaner.

If you want to iterate over the sub-arrays, you'd be doing:

for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        subarray = A[i, j]
        for k in range(subarray.size):
            ...

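and you can type subarray either as object (best for small subarrays) or as double[:] (best for large subarrays). Use double rather than float here: numpy.zeros produces float64 data, which is a C double, whereas float in Cython means the 32-bit C float and would fail with a buffer dtype mismatch.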
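To sanity-check the result without compiling anything, the same construction can be sketched in plain NumPy. This is only an illustration: `build_object_grid` is a made-up name, not part of the answer's code. It also confirms that the sub-arrays hold float64 data, which corresponds to C double, so that is the element type a typed memoryview over a sub-array would need.

```python
import numpy as np

def build_object_grid(m, n):
    # Hypothetical helper: plain-Python counterpart of thing(m, n) above.
    grid = np.empty((m, n), dtype=object)
    for i in range(m):
        for j in range(n):
            # numpy.zeros defaults to float64 (C double)
            grid[i, j] = np.zeros(min(i, j))
    return grid

grid = build_object_grid(4, 3)

# Length of every sub-array, row by row
lengths = [[grid[i, j].size for j in range(3)] for i in range(4)]
# Set of dtypes found across all sub-arrays
dtypes = {grid[i, j].dtype for i in range(4) for j in range(3)}

print(lengths)  # [[0, 0, 0], [0, 1, 1], [0, 1, 2], [0, 1, 2]]
print(dtypes)
```

The printed lengths match the shape asked for in the question with (m,n)=(4,3).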

The C-level solution is proving to be very tricky. I have a feeling you'll basically end up writing it with pure-C types.

So I'm abandoning that, and here's what I'd do:

import numpy

def thing(int m, int n):
    cdef int i, j

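    # numpy's float64 is a C double; a float[:, :, :] view would raise a dtype mismatch
    cdef double[:, :, :] A = numpy.zeros((m, n, min(m, n)), dtype=numpy.double)
    # numpy.intc matches C int on every platform; dtype=int is often 64-bit
    cdef int[:, :] A_lengths = numpy.empty((m, n), dtype=numpy.intc)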

    for i in range(A_lengths.shape[0]):
        for j in range(A_lengths.shape[1]):
            A_lengths[i, j] = min(i, j)

    return A, A_lengths

Basically, make a 3D array and a 2D array of corresponding lengths. If there is only a linear variation in lengths (so the max length is a reasonable factor [I'd say up to about 10] of the mean length) then this should have acceptable overhead. It'll allow pure-C calculations while having a tasty memoryview interface.
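As a rough illustration of the padded layout, here is a plain-NumPy sketch that builds both arrays and reads back only the valid part of one entry. `padded_layout` is a hypothetical name, and filling the lengths by broadcasting instead of a double loop is my own shortcut, not something from the answer. Comparing the used and allocated element counts gives a quick feel for the padding overhead.

```python
import numpy as np

def padded_layout(m, n):
    # Hypothetical helper mirroring the 3D-array-plus-lengths idea above.
    depth = min(m, n)
    A = np.zeros((m, n, depth), dtype=np.double)
    # lengths[i, j] = min(i, j), computed by broadcasting a column against a row
    rows = np.arange(m)[:, None]
    cols = np.arange(n)[None, :]
    lengths = np.minimum(rows, cols).astype(np.intc)
    return A, lengths

A, lengths = padded_layout(4, 3)

# Only the first lengths[i, j] items of A[i, j] are meaningful
valid = A[2, 2, :lengths[2, 2]]

used = int(lengths.sum())   # elements that carry data
allocated = A.size          # elements actually allocated, padding included
print(valid.shape, used, allocated)
```

For (m,n)=(4,3) most of the allocation is padding, which is fine at this size; whether that stays acceptable for larger inputs depends on how unevenly the lengths are distributed.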

That's all I've got. Take it or leave it.

answered Oct 05 '22 11:10

Veedrac
