
Creating small arrays in Cython takes a humongous amount of time

Tags: performance, python, arrays, numpy, cython

I was writing a new random number generator for NumPy that produces random numbers according to an arbitrary distribution when I came across this really weird behavior.

This is test.pyx:

#cython: boundscheck=False
#cython: wraparound=False
import numpy as np
cimport numpy as np
cimport cython

def BareBones(np.ndarray[double, ndim=1] a,np.ndarray[double, ndim=1] u,r):
    return u

def UntypedWithLoop(a,u,r):
    cdef int i,j=0
    for i in range(u.shape[0]):
        j+=i
    return u,j

def BSReplacement(np.ndarray[double, ndim=1] a, np.ndarray[double, ndim=1] u):
    cdef np.ndarray[np.int_t, ndim=1] r=np.empty(u.shape[0],dtype=int)
    cdef int i,j=0
    for i in range(u.shape[0]):
        j=i
    return r

setup.py

from distutils.core import setup
from Cython.Build import cythonize
setup(name = "simple cython func",ext_modules = cythonize('test.pyx'),)

profiling code

#!/usr/bin/python
from __future__ import division

import subprocess
import timeit

#Compile the cython modules before importing them
subprocess.call(['python', 'setup.py', 'build_ext', '--inplace'])

sstr="""
import test
import numpy
u=numpy.random.random(10)
a=numpy.random.random(10)
a=numpy.cumsum(a)
a/=a[-1]
r=numpy.empty(10,int)
"""

print "binary search: creates an array[N] and performs N binary searches to fill it:\n",timeit.timeit('numpy.searchsorted(a,u)',sstr)
print "Simple replacement for binary search:takes the same args as np.searchsorted and similarly returns a new array. this performs only one trivial operation per element:\n",timeit.timeit('test.BSReplacement(a,u)',sstr)

print "barebones function doing nothing:",timeit.timeit('test.BareBones(a,u,r)',sstr)
print "Untyped inputs and doing N iterations:",timeit.timeit('test.UntypedWithLoop(a,u,r)',sstr)
print "time for just np.empty()",timeit.timeit('numpy.empty(10,int)',sstr)

The binary search implementation takes on the order of len(u)*log(len(a)) time to execute. The trivial Cython function takes on the order of len(u) time to run. Both return a 1D int array of length len(u).
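To make the cost comparison concrete, here is a minimal pure-Python sketch of the lookup being discussed: one binary search per element of u against the normalized cumulative array a. The helper names (cumulative_weights, lookup_indices) and the sample weights are made up for illustration. The standard library's bisect_left stands in for NumPy's C routine, whose default side='left' behavior it matches.

```python
from bisect import bisect_left
from itertools import accumulate


def cumulative_weights(weights):
    """Return the running sum of weights, scaled so the last entry is 1.0."""
    total = float(sum(weights))
    return [c / total for c in accumulate(weights)]


def lookup_indices(a, u):
    """One binary search per element of u: O(len(u) * log(len(a))) overall."""
    return [bisect_left(a, x) for x in u]


# Hypothetical weights for a four-outcome distribution.
weights = [1.0, 3.0, 4.0, 2.0]
a = cumulative_weights(weights)

# Uniform draws in [0, 1) would normally come from a random generator.
u = [0.05, 0.1, 0.35, 0.79, 0.95]
indices = lookup_indices(a, u)
print(indices)
```

Each returned index is the outcome whose cumulative weight first reaches the draw, which is the inverse-CDF lookup the question is trying to speed up.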

However, even this trivial implementation, which does no computation at all, takes longer than the full binary search in the NumPy library, which is written in C (see PyArray_SearchSorted in https://github.com/numpy/numpy/blob/202e78d607515e0390cffb1898e11807f117b36a/numpy/core/src/multiarray/item_selection.c).

The results are:

binary search: creates an array[N] and performs N binary searches to fill it:
1.15157485008
Simple replacement for binary search:takes the same args as np.searchsorted and similarly returns a new array. this performs only one trivial operation per element:
3.69442796707
barebones function doing nothing: 0.87496304512
Untyped inputs and doing N iterations: 0.244267940521
time for just np.empty() 1.0983929634

Why is the np.empty() step taking so much time, and what can I do to get an empty array that I can return?

The C function does this AND runs a whole bunch of sanity checks AND uses a longer algorithm in the inner loop. (I removed all the logic except the loop itself from my example.)

Update

It turns out there are two distinct problems:

1. The np.empty(10) call alone has a ginormous overhead and takes as much time as searchsorted needs to make a new array AND perform 10 binary searches on it.
2. Just declaring the buffer syntax np.ndarray[...] also has a massive overhead that takes up MORE time than receiving the untyped variables AND iterating 50 times.

Results for 50 iterations:

binary search: 2.45336699486
Simple replacement:3.71126317978
barebones function doing nothing: 0.924916028976
Untyped inputs and doing N iterations: 0.316384077072
time for just np.empty() 1.04949498177
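The allocation-versus-search comparison can be reproduced without compiling anything. The sketch below is a Python 3 variant of the two pure-NumPy measurements; the time_call helper and the repetition count are assumptions for illustration, not part of the original script. Since timeit.timeit defaults to one million repetitions, the totals in seconds listed above can be read directly as microseconds per call, which is the unit this sketch reports.

```python
import timeit

import numpy as np


def time_call(stmt, namespace, number=100_000):
    """Return the average time per call of stmt, in microseconds."""
    total = timeit.timeit(stmt, globals=namespace, number=number)
    return total / number * 1e6


rng = np.random.default_rng(0)
n = 10

# Same setup as the question: a is a normalized cumulative sum, u holds draws.
a = np.cumsum(rng.random(n))
a /= a[-1]
u = rng.random(n)

namespace = {"np": np, "a": a, "u": u, "n": n}
statements = ["np.empty(n, dtype=int)", "np.searchsorted(a, u)"]
timings = {stmt: time_call(stmt, namespace) for stmt in statements}

for stmt, usec in timings.items():
    print(f"{stmt:>24}: {usec:.3f} us per call")
```

The absolute numbers depend on the machine and the NumPy version, so only the ratio between the two lines is worth comparing with the figures above.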


asked Aug 23 '13 19:08

staticd

2 Answers

There is a discussion of this on the Cython list that might have some useful suggestions: https://groups.google.com/forum/#!topic/cython-users/CwtU_jYADgM

Generally, though, I try to allocate small arrays outside of Cython, pass them in, and re-use them in subsequent calls to the method. I understand that this is not always an option.
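The allocate-once, pass-in, re-use pattern might look like the following sketch. It is written in plain Python so it runs without a build step; in practice the loop body would live in the compiled Cython function, and fill_indices is a hypothetical name, not something from the question or from NumPy.

```python
import numpy as np


def fill_indices(a, u, out):
    """Write the left insertion index of each u[k] in sorted a into out[k].

    Nothing is allocated here: the caller owns out and can reuse it.
    """
    n = a.shape[0]
    for k in range(u.shape[0]):
        x = u[k]
        lo, hi = 0, n
        while lo < hi:
            mid = (lo + hi) // 2
            if a[mid] < x:
                lo = mid + 1
            else:
                hi = mid
        out[k] = lo


rng = np.random.default_rng(0)
a = np.cumsum(rng.random(10))
a /= a[-1]

# Allocated once, outside the repeated calls.
out = np.empty(10, dtype=np.intp)

all_match = True
for _ in range(3):
    u = rng.random(10)
    fill_indices(a, u, out)  # the same buffer is overwritten on every call
    all_match = all_match and np.array_equal(out, np.searchsorted(a, u))

print(all_match)
```

The trade-off is that the caller must consume or copy out before the next call, because its contents are overwritten each time.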


answered Sep 24 '22 10:09

JoshAdel

Creating the array with np.empty inside the Cython function has some overhead, as you already saw. The example below instead creates the empty array in Python and passes it to the Cython module, which fills it with the correct values. The timings are:

n=10:

numpy.searchsorted: 1.30574745517
cython O(1): 3.28732016088
cython no array declaration 1.54710909596

n=100:

numpy.searchsorted: 4.15200545373
cython O(1): 13.7273431067
cython no array declaration 11.4186086744

As you already pointed out, the NumPy version scales better, since it is O(len(u)*log(len(a))) while the algorithm here is O(len(u)*len(a)).
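The scaling difference can be seen by counting comparisons for a single lookup. This is a simplified stand-in for the two approaches, not the code timed above: linear_search and binary_search are illustrative names, and both return the first index i with a[i] >= x.

```python
def linear_search(a, x):
    """Scan left to right; return (index, comparisons made)."""
    steps = 0
    for i, value in enumerate(a):
        steps += 1
        if value >= x:
            return i, steps
    return len(a), steps


def binary_search(a, x):
    """Halve the search interval each step; return (index, comparisons made)."""
    steps = 0
    lo, hi = 0, len(a)
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo, steps


a = [float(i) for i in range(1024)]
x = 1023.0  # last element: the worst case for the left-to-right scan

lin_index, lin_steps = linear_search(a, x)
bin_index, bin_steps = binary_search(a, x)
print("linear:", lin_index, lin_steps)
print("binary:", bin_index, bin_steps)
```

The scan needs a number of comparisons proportional to len(a) in the worst case, while the binary search needs roughly log2(len(a)), which is why the gap in the timings above widens from n=10 to n=100.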

I also tried using a memoryview, basically replacing np.ndarray[double, ndim=1] with double[:], but the first option was faster in this case.

The new .pyx file is:

from __future__ import division
import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
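def JustLoop(np.ndarray[double, ndim=1] a, np.ndarray[double, ndim=1] u,
             np.ndarray[np.int_t, ndim=1] r):
    # r is preallocated by the caller with dtype=int; np.int_t is the buffer
    # type paired with dtype=int in WithArray below (a plain C int can raise
    # a buffer dtype mismatch where dtype=int is 64-bit)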
    cdef int i,j
    for j in range(u.shape[0]):
        if u[j] < a[0]:
            r[j] = 0
            continue

        if u[j] > a[a.shape[0]-1]:
            r[j] = a.shape[0]-1
            continue

        for i in range(1, a.shape[0]):
            if u[j] >= a[i-1] and u[j] < a[i]:
                r[j] = i
                break

@cython.boundscheck(False)
@cython.wraparound(False)
def WithArray(np.ndarray[double, ndim=1] a, np.ndarray[double, ndim=1] u):
    cdef np.ndarray[np.int_t, ndim=1] r=np.empty(u.shape[0],dtype=int)
    cdef int i,j
    for j in range(u.shape[0]):
        if u[j] < a[0]:
            r[j] = 0
            continue

        if u[j] > a[a.shape[0]-1]:
            r[j] = a.shape[0]-1
            continue

        for i in range(1, a.shape[0]):
            if u[j] >= a[i-1] and u[j] < a[i]:
                r[j] = i
                break
    return r

The new .py file:

import numpy
import subprocess
import timeit

#Compile the cython modules before importing them
subprocess.call(['python', 'setup.py', 'build_ext', '--inplace'])
from test import *

sstr="""
import test
import numpy
u=numpy.random.random(10)
a=numpy.random.random(10)
a=numpy.cumsum(a)
a/=a[-1]
a.sort()
r = numpy.empty(u.shape[0], dtype=int)
"""

print "numpy.searchsorted:",timeit.timeit('numpy.searchsorted(a,u)',sstr)
print "cython O(1):",timeit.timeit('test.WithArray(a,u)',sstr)
print "cython no array declaration",timeit.timeit('test.JustLoop(a,u,r)',sstr)

answered Sep 25 '22 10:09

Saullo G. P. Castro
