Efficient math ops on small arrays in Python with Cython

I use numexpr for fast math on large arrays, but if the array is smaller than the CPU cache, writing my code in Cython using simple array math is way faster, especially if the function is called multiple times.

The issue is how to work with arrays in Cython, or more explicitly: is there a direct interface to Python's array.array type in Cython? What I would like to do is something like this (simple example):

cpdef array[double] running_sum(array[double] arr):
    cdef int i 
    cdef int n = len(arr)
    cdef array[double] out = new_array_zeros(1.0, n)
    ... # some error checks
    out[0] = arr[0]
    for i in xrange(1, n):  # run up to n so the last element is included
        out[i] = out[i-1] + arr[i]

    return(out)
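
For reference, the behaviour the snippet above is aiming for can be written in plain Python with the standard-library array module. This is only a sketch of the intended semantics, not the typed Cython version being asked about. It assumes the running sum should cover every element, including the last one, and that empty input simply yields an empty array (an assumption, since the error checks are elided in the question).

```python
from array import array


def running_sum(arr):
    """Return a new array('d') whose i-th item is the sum of arr[0..i]."""
    n = len(arr)
    # Zero-filled output of the same length; this stands in for the
    # hypothetical new_array_zeros helper used in the question.
    out = array("d", [0.0]) * n
    if n == 0:
        return out  # assumed behaviour for empty input
    out[0] = arr[0]
    for i in range(1, n):
        out[i] = out[i - 1] + arr[i]
    return out


print(running_sum(array("d", [1.0, 2.0, 3.0, 4.0])))
# prints: array('d', [1.0, 3.0, 6.0, 10.0])
```

Because Cython accepts plain Python source, a version like this can serve as a correctness baseline before any static typing is added.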

I first tried the Cython NumPy wrapper and worked with ndarrays, but creating them seems very costly for small 1D arrays compared with creating a C array with malloc (although memory handling then becomes a pain).
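
One way to check the creation-cost claim on a given machine is a small timing harness like the one below. It is a rough sketch, not a rigorous benchmark: the function name time_creation and its parameters n and repeats are made up for illustration, NumPy is timed only if it happens to be installed, and the numbers will vary by platform and version.

```python
import timeit
from array import array


def time_creation(n=16, repeats=100_000):
    """Return average seconds per call for creating zero-filled containers of length n."""
    results = {}
    # Standard-library array of doubles, zero-filled by repetition.
    results["array.array"] = (
        timeit.timeit(lambda: array("d", [0.0]) * n, number=repeats) / repeats
    )
    try:
        import numpy as np  # optional: skipped when NumPy is not available
    except ImportError:
        np = None
    if np is not None:
        results["numpy.zeros"] = (
            timeit.timeit(lambda: np.zeros(n), number=repeats) / repeats
        )
    return results


if __name__ == "__main__":
    for name, seconds in time_creation().items():
        print(f"{name}: {seconds * 1e9:.0f} ns per creation")
```

This measures creation from Python only; allocation inside compiled Cython code, such as a direct malloc call, avoids the interpreter overhead and has to be timed separately.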

Thanks!

chronos, asked Mar 19 '11 01:03



1 Answer

You can roll your own simple array type with basic functions and checks. Here is a mockup to start with:

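from libc.stdlib cimport malloc, free

cdef class SimpleArray:  # extension types are declared with cdef, not cpdef
    cdef double *handle
    cdef public int length

    def __cinit__(self, int n):
        # Allocate in __cinit__ so the buffer exists before any other method runs.
        # Note: malloc leaves the contents uninitialised.
        if n < 0:
            raise ValueError("Invalid length")
        self.handle = <double*>malloc(n * sizeof(double))
        if n > 0 and self.handle == NULL:
            raise MemoryError()
        self.length = n

    def __getitem__(self, int idx):
        # Reject negative indices as well as indices past the end.
        if 0 <= idx < self.length:
            return self.handle[idx]
        raise IndexError("Invalid Idx")

    def __dealloc__(self):
        free(self.handle)  # free(NULL) is a no-op

cpdef SimpleArray running_sum(SimpleArray arr):
    cdef int i
    cdef SimpleArray out = SimpleArray(arr.length)

    if arr.length == 0:
        return out
    out.handle[0] = arr.handle[0]
    # Accumulate over every remaining element, including the last one.
    for i in range(1, arr.length):
        out.handle[i] = out.handle[i-1] + arr.handle[i]
    return out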

It can be used like this:

>>> import test
>>> simple = test.SimpleArray(100)
>>> del simple
>>> test.running_sum(test.SimpleArray(100))
<test.SimpleArray object at 0x1002a90b0>

fabrizioM, answered Sep 18 '22 12:09