Accessing NumPy record array columns in Cython

I'm a relatively experienced Python programmer, but haven't written any C in a very long time and am attempting to understand Cython. I'm trying to write a Cython function that will operate on a column of a NumPy recarray.

The code I have so far is below.

recarray_func.pyx:

import numpy as np
cimport numpy as np

cdef packed struct rec_cell0:
  np.float32_t f0
  np.int64_t i0, i1, i2

def sum(np.ndarray[rec_cell0, ndim=1] recarray):
    cdef Py_ssize_t i
    cdef rec_cell0 *cell
    cdef np.float32_t running_sum = 0

    for i in range(recarray.shape[0]):
        cell = &recarray[i]
        running_sum += cell.f0
    return running_sum

At the interpreter prompt:

import numpy as np
import recarray_func

array = np.recarray((100, ), names=['f0', 'i0', 'i1', 'i2'],
                    formats=['f4', 'i8', 'i8', 'i8'])
recarray_func.sum(array)

This simply sums the f0 column of the recarray. It compiles and runs without a problem.

My question is: how would I modify this so that it can operate on any column? In the example above, the column to sum is hard-coded and accessed through dot notation. Is it possible to change the function so that the column to sum is passed in as a parameter?

joshayers asked Feb 23 '12 23:02


1 Answer

I believe this should be possible using Cython's memoryviews. Something along these lines should work (code not tested):

import numpy as np
cimport numpy as np

cdef packed struct rec_cell0:
  np.float32_t f0
  np.int64_t i0, i1, i2

def sum(rec_cell0[:] recview):
    cdef Py_ssize_t i
    cdef np.float32_t running_sum = 0

    for i in range(recview.shape[0]):
        running_sum += recview[i].f0
    return running_sum
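The function above still reads f0 by name. One possible way to choose the column at call time, offered here as an assumption rather than something the code above does, is to index the record array by field name on the Python side. That yields a strided 1-D view of a single column, which a typed memoryview argument such as np.float32_t[:] should be able to accept. The sketch below shows only the NumPy side of that idea; the helper name sum_column and the sample values are hypothetical.

```python
import numpy as np

# Same field layout as the rec_cell0 packed struct: one float32, three int64.
dtype = [('f0', 'f4'), ('i0', 'i8'), ('i1', 'i8'), ('i2', 'i8')]
array = np.zeros(100, dtype=dtype).view(np.recarray)
array['f0'] = 0.5              # sample values, made up for the example
array['i1'] = np.arange(100)

def sum_column(records, name):
    """Hypothetical helper: sum one field, selected by name at call time."""
    column = records[name]     # 1-D view into the records, no copy is made
    return column.sum()

# The view steps over whole records, so its stride equals the record size.
f0_view = array['f0']
print(f0_view.strides, array.dtype.itemsize)
```

Because each column has its own dtype, a Cython function taking such a view would need a matching element type (float32 for f0, int64 for the integer columns), or a fused type covering both.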

Speed can probably be increased by ensuring that the record array you pass to Cython is contiguous. On the Python (calling) side, you can use np.require to get a C-contiguous array, and the function signature should change to rec_cell0[::1] recview to indicate that the array can be assumed to be contiguous. As always, once the code has been tested, turning off the boundscheck, wraparound, and nonecheck compiler directives in Cython will likely improve speed further.
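As a rough illustration of the contiguity point, the following NumPy-only sketch shows np.require turning a strided slice of a record array into a C-contiguous one, while leaving an already contiguous array uncopied. The array contents and variable names are made up for the example.

```python
import numpy as np

dtype = [('f0', 'f4'), ('i0', 'i8'), ('i1', 'i8'), ('i2', 'i8')]
records = np.zeros(100, dtype=dtype)
records['f0'] = np.arange(100)

# Taking every other record gives a view that is not C-contiguous.
strided = records[::2]

# np.require copies only when the requested layout is not already satisfied.
contiguous = np.require(strided, requirements=['C'])
already_ok = np.require(records, requirements=['C'])

print(strided.flags['C_CONTIGUOUS'], contiguous.flags['C_CONTIGUOUS'])
```

An array prepared this way is the kind of input the rec_cell0[::1] signature expects. A non-contiguous array passed to that signature should be rejected when the memoryview is acquired, not silently copied.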

josteinb answered Oct 24 '22 06:10