I would like to have a cython array of a cdef class:
cdef class Child:
cdef int i
def do(self):
self.i += 1
cdef class Mother:
cdef Child[:] array_of_child
def __init__(self):
for i in range(100):
self.array_of_child[i] = Child()
You can use NumPy from Cython exactly the same as in regular Python, but by doing so you are losing potentially high speedups because Cython has support for fast access to NumPy arrays.
By explicitly declaring the "ndarray" data type, your array processing can be 1250x faster. This tutorial will show you how to speed up the processing of NumPy arrays using Cython. By explicitly specifying the data types of variables in Python, Cython can give drastic speed increases at runtime.
The answer is no - it is not really possible in a useful way: newsgroup post of essentially the same question
It wouldn't be possible to have a direct array (allocated in a single chunk) of Child
s. Partly because, if somewhere else ever gets a reference to a Child
in the array, that Child
has to be kept alive (but not the whole array) which wouldn't be possible to ensure if they were all allocated in the same chunk of memory. Additionally, were the array to be resized (if this is a requirement) then it would invalidate any other references to the objects within the array.
Therefore you're left with having an array of pointers to Child
. Such a structure would be fine, but internally would look almost exactly like a Python list (so there's really no benefit to doing something more complicated in Cython...).
There are a few sensible workarounds:
The workaround suggested in the newsgroup post is just to use a python list. You could also use a numpy array with dtype=object
. If you need to to access a cdef function in the class you can do a cast first:
cdef Child c = <Child?>a[0] # omit the ? if you don't want
# the overhead of checking the type.
c.some_cdef_function()
Internally both these options are stored as an C array of PyObject
pointers to your Child
objects and so are not as inefficient as you probably assume.
A further possibility might be to store your data as a C struct (cdef struct ChildStruct: ....
) which can be readily stored as an array. When you need a Python interface to that struct you can either define Child
so it contains a copy of ChildStruct
(but modifications won't propagate back to your original array), or a pointer to ChildStruct
(but you need to be careful with ensuring that the memory is not freed which the Child
pointing to it is alive).
You could use a Numpy structured array - this is pretty similar to using an array of C structs except Numpy handles the memory, and provides a Python interface.
The memoryview syntax in your question is valid: cdef Child[:] array_of_child
. This can be initialized from a numpy array of dtype object
:
array_of_child = np.array([(Child() for i in range(100)])
In terms of data-structure, this is an array of pointers (i.e. the same as a Python list, but can be multi-dimensional). It avoids the need for <Child>
casting. The important thing it doesn't do is any kind of type-checking - if you feed an object that isn't Child
into the array then it won't notice (because the underlying dtype
is object
), but will give nonsense answers or segmentation faults.
In my view this approach gives you a false sense of security about two things: first that you have made a more efficient data structure (you haven't, it's basically the same as a list); second that you have any kind of type safety. However, it does exist. (If you want to use memoryviews, e.g. for multi-dimensional arrays, it would probably be better to use a memoryview of type object
- this is honest about the underlying dtype)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With