Correct usage of numpy recarrays as c structarrays in cython

I would like to use something like a structarray in cython, and I would like this structarray to be as easily accessible in python as in cython. On a whim I used a recarray with a dtype that looks like the struct that I would like to use. Curiously, it just works and allows me to use a c structarray that, over the hood ;), is a numpy recarray for the python user.

Here is my example

# This is a "structarray in cython with numpy recarrays" testfile
import numpy as np
cimport numpy as np

# My structarray has nodes with fields x and y
# This also works without packed, but I have seen packed used in other places where people asked similar questions
# I assume that for two doubles that is equivalent, but that it is necessary for int8s in between
cdef packed struct node:
    double x
    double y
# I suppose that would be the equivalent numpy dtype?
# Note: During compilation it warns me about double to float downcasts, but I do not see where
nodetype = [('x', np.float64), ('y', np.float64)]

def fun():
    # Make a 10-element recarray whose dtype mirrors the cdef struct
    # (Just looked it up. A point where 1-based indexing would save a look in the docs)
    mynode1 = np.recarray(10,dtype=nodetype)

    # Fill it with non-garbage somewhere
    mynode1[2].x=1.0
    mynode1[2].y=2.0

    # Brave: give recarray element to a c function assuming it's equivalent to the struct
    ny = cfuny(mynode1[2])
    assert ny==2.0 # works!

    # Test memoryview, assuming type node
    cdef node [:] nview = mynode1
    ny = cfunyv(nview,2)
    assert ny==2.0 # works!

    # This sets the numpy recarray value with a c function that gets a memoryview
    cfunyv_set(nview,5,9.0)
    assert mynode1[5].y==9.0 # also works!

    return 0

# return node element y from c struct node
cdef double cfuny(node n):
    return n.y

# give element i from memoryview of recarray to c function expecting a c struct
cdef double cfunyv(node [:] n, int i):
    return cfuny(n[i])

# write into recarray with a function expecting a memoryview with type node
cdef int cfunyv_set(node [:] n,int i,double val):
    n[i].y = val
    return 0

Of course I am not the first to try this.

Here for example the same thing is done, and it even states that this usage would be part of the manual here, but I cannot find this on the page. I suspect it was there at some point. There are also several discussions involving the use of strings in such a custom type (e.g. here), and from the answers I gather that casting a recarray to a C struct is intended behaviour, as the discussion talks about adding a regression test for the given example and about the string error having been fixed at some point.

My question

I could not find any documentation that states that this should work besides forum answers. Can someone show me where that is documented?

And, for some additional curiosity

  • Will this likely break at any point during the development of numpy or cython?
  • From the other forum entries on the subject it seems that packed is necessary for this to work once more interesting datatypes are part of the struct. I am not a compiler expert and have never used structure packing myself, but I suspect that whether a structure gets packed or not depends on the compiler settings. Does that mean that someone who compiles numpy without packing structures needs to compile this cython code without the packed?
Maximilian asked Oct 18 '22 00:10


2 Answers

This doesn't seem to be directly documented. Best reference I can give you is the typed memoryview docs here.

Rather than specific cython support for numpy structured dtypes this instead seems a consequence of support for the PEP 3118 buffer protocol. numpy exposes a Py_buffer struct for its arrays, and cython knows how to cast those into structs.
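The buffer-protocol view of a structured array can be inspected from plain Python. This is a minimal sketch, assuming the two-double layout from the question; the variable names are illustrative only. The `format` string it prints is the description of one record that a buffer consumer such as a Cython typed memoryview would have to match against its struct.

```python
import numpy as np

# Same layout as the question's `node` struct: two 8-byte doubles.
nodetype = np.dtype([('x', np.float64), ('y', np.float64)])
arr = np.zeros(3, dtype=nodetype)

# memoryview() requests the array's buffer through the buffer protocol.
view = memoryview(arr)

print(view.format)    # struct-style format string describing one record
print(view.itemsize)  # bytes per record
print(view.shape)     # a single dimension holding the records
```

The exact format string is produced by numpy, so it is printed here rather than assumed.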

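The packing is necessary. My understanding is that a C compiler by default aligns each struct member on a boundary that depends on its size (e.g. a double on an 8-byte boundary on x86-64), whereas a numpy structured dtype is by default packed into the minimum space possible. Probably clearest by example: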

%%cython
import numpy as np

cdef struct Thing:
    char a
    # 7 bytes padding, double must be 8 byte aligned
    double b

thing_dtype = np.dtype([('a', np.byte), ('b', np.double)])
print('dtype size: ', thing_dtype.itemsize)
print('unpacked struct size', sizeof(Thing))

Output:

dtype size:  9
unpacked struct size 16
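The numpy side of this comparison can also be checked without Cython. The sketch below, which is not part of the original answer, builds the same two fields twice: once with numpy's default packed layout and once with `align=True`, which asks numpy to pad fields the way a C compiler typically would. On a typical x86-64 platform the aligned variant would be expected to come out at 16 bytes, the size of the unpacked struct above, but the exact padding is platform dependent.

```python
import numpy as np

fields = [('a', np.byte), ('b', np.double)]

packed = np.dtype(fields)               # default: no padding between fields
aligned = np.dtype(fields, align=True)  # pad fields to their C alignment

# dtype.fields maps each name to a (dtype, byte offset) pair.
print('packed itemsize: ', packed.itemsize, 'offset of b:', packed.fields['b'][1])
print('aligned itemsize:', aligned.itemsize, 'offset of b:', aligned.fields['b'][1])
```

So the struct and the dtype only have to agree with each other: a `cdef packed struct` matches the default dtype, while an unpacked struct would have to be paired with an aligned dtype.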
chrisb answered Oct 19 '22 23:10


Just answering the final sub-question:

From the other forum entries on the subject it seems that packed is necessary for this to work once more interesting datatypes are part of the struct. I am not a compiler expert and have never used structure packing myself, but I suspect that whether a structure gets packed or not depends on the compiler settings. Does that mean that someone who compiles numpy without packing structures needs to compile this cython code without the packed?

Numpy's behaviour is decided at runtime rather than compile-time. It will calculate the minimum amount of space a structure can need and allocate blocks of that. It won't be changed by any compiler settings so should be reliable.
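To see that layout being computed at runtime, one can walk the fields of a default structured dtype and check that each offset is simply the running sum of the preceding item sizes. This is a small sketch with a hypothetical record (an int8 between two doubles, the case the question worries about); the field names are made up for illustration.

```python
import numpy as np

# Hypothetical record with an int8 wedged between two doubles.
dt = np.dtype([('x', np.float64), ('flag', np.int8), ('y', np.float64)])

offset = 0
for name in dt.names:
    field_dtype, field_offset = dt.fields[name]
    # In the default (packed) layout no padding precedes any field.
    assert field_offset == offset
    print(name, field_dtype, field_offset)
    offset += field_dtype.itemsize

print('itemsize:', dt.itemsize)
```

The total item size is therefore just the sum of the field sizes, independent of how numpy itself was compiled.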

cdef packed struct is therefore always needed to match numpy's default layout. However, it does not generate standards-compliant C code. Instead, it uses extensions provided by GCC, MSVC (and others). It therefore works fine on the major C compilers that currently exist, but in principle might fail on a future compiler. It looks like it might be possible to use the C11 standard alignas to achieve something similar in a standards-compliant way, although alignas is not allowed to reduce a member's alignment below the natural alignment of its type, so it may not be sufficient on its own.

DavidW answered Oct 19 '22 22:10