Fast Numpy Loops

Tags: python, vectorization, numpy, cython

How do you optimize this code without vectorizing it? (Vectorizing means exploiting the semantics of the calculation, which is quite often far from trivial.)

slow_lib.py:
import numpy as np

def foo():
    size = 200
    np.random.seed(1000031212)
    bar = np.random.rand(size, size)
    moo = np.zeros((size, size), dtype=np.float64)  # np.float was removed in NumPy 1.24
    for i in range(0, size):
        for j in range(0, size):
            val = bar[j]                 # row j; i never appears in the body
            moo += np.outer(val, val)    # one Python-level np.outer call per (i, j)

The point is that loops like this quite often correspond to operations where you have a double sum over some vector operation.
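Written out, the loop in foo is exactly such a double sum. Taking $b_j$ as row $j$ of bar (as a column vector), $B$ as bar itself and $n$ as size, each inner pass adds $b_j b_j^{\top}$, and because the index $i$ never enters the body, this particular example collapses to a single matrix product:

```latex
M = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} b_j b_j^{\top}
  = n \sum_{j=0}^{n-1} b_j b_j^{\top}
  = n \, B^{\top} B,
\qquad (B^{\top} B)_{kl} = \sum_{j=0}^{n-1} B_{jk} B_{jl}
```

This is the "semantic" shortcut the question wants to avoid in general, but it is handy as a correctness check for any faster rewrite of the loop.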

This is quite slow:

>>> import timeit
>>> t = timeit.timeit('foo()', 'from slow_lib import foo', number=10)
>>> print("took: " + str(t))
took: 41.165681839

OK, so let's Cythonize it and add type annotations like there's no tomorrow:

c_slow_lib.pyx:
import numpy as np
cimport numpy as np
import cython

@cython.boundscheck(False)   # these only speed up typed element access such as bar[j, k];
@cython.wraparound(False)    # they do nothing for the np.outer call below
def foo():
    cdef int size = 200
    cdef int i, j
    np.random.seed(1000031212)
    cdef np.ndarray[np.double_t, ndim=2] bar = np.random.rand(size, size)
    cdef np.ndarray[np.double_t, ndim=2] moo = np.zeros((size, size), dtype=np.float64)
    cdef np.ndarray[np.double_t, ndim=1] val
    for i in range(size):            # typed ints: Cython turns these into C loops
        for j in range(size):
            val = bar[j]
            moo += np.outer(val, val)


>>> t = timeit.timeit('foo()', 'from c_slow_lib import foo', number=10)
>>> print("took: " + str(t))
took: 42.3104710579

... ehr... what? Numba to the rescue!

numba_slow_lib.py:
import numpy as np
from numba import jit

size = 200
np.random.seed(1000031212)

bar = np.random.rand(size, size)   # global array; Numba freezes it as a constant when compiling foo

@jit
def foo():
    moo = np.zeros((size, size), dtype=np.float64)
    for i in range(0, size):
        for j in range(0, size):
            val = bar[j]
            moo += np.outer(val, val)

>>> t = timeit.timeit('foo()', 'from numba_slow_lib import foo', number=10)
>>> print("took: " + str(t))
took: 40.7327859402

So is there really no way to speed this up? The point is:

- if I convert the inner loop into a vectorized version (building a larger matrix representing the inner loop and then calling np.outer on the larger matrix), I get much faster code;
- if I implement something similar in MATLAB (R2016a), it performs quite well thanks to its JIT compiler.
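For comparison, here is a minimal sketch of the vectorized inner loop described in the first point, assuming "the larger matrix" means stacking all size outer products into one 3-D array and summing it. The function name foo_inner_vectorized and its parameters are hypothetical; the outer i loop is kept so that nothing about the semantics of the sum is exploited.

```python
import numpy as np

def foo_inner_vectorized(size=200, seed=1000031212):
    """Hypothetical variant of foo: the j loop becomes one broadcast operation."""
    np.random.seed(seed)
    bar = np.random.rand(size, size)
    moo = np.zeros((size, size), dtype=np.float64)
    for i in range(size):
        # bar[:, :, None] * bar[:, None, :] has shape (size, size, size);
        # slice j along axis 0 equals np.outer(bar[j], bar[j]).
        stacked = bar[:, :, None] * bar[:, None, :]
        # Summing over axis 0 adds all size outer products in compiled code.
        moo += stacked.sum(axis=0)
    return moo
```

The (size, size, size) temporary is large (64 MB of float64 for size=200); np.einsum('jk,jl->kl', bar, bar) computes the same inner sum without materializing it.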


asked Jun 13 '16 15:06

ndbd

1 Answer

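Here's the code for np.outer:

# from NumPy's source; asarray, multiply and newaxis are NumPy names
from numpy import asarray, multiply, newaxis

def outer(a, b, out=None):
    a = asarray(a)
    b = asarray(b)
    return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis, :], out)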
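Stripped of the wrapper calls, outer is just a broadcast multiplication. A small sketch showing that, for 1-D inputs, a[:, None] * b produces the same array as np.outer(a, b); the arrays a and b are arbitrary illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(5)
b = rng.random(5)

# np.outer goes through asarray, ravel and indexing before calling multiply.
via_outer = np.outer(a, b)

# For 1-D inputs, ravel() returns the same data, so broadcasting directly gives
# the same (5, 1) * (5,) -> (5, 5) product without the extra Python-level calls.
via_broadcast = a[:, None] * b

print(np.array_equal(via_outer, via_broadcast))  # True
```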

So each call to outer involves a number of Python calls. Those eventually call compiled code to perform the multiplication, but each one incurs an overhead that has nothing to do with the size of your arrays.

Your double loop makes 200**2 = 40,000 calls to outer, each paying that overhead, whereas one call to outer with all 200 rows pays it once, followed by one fast compiled operation.
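One way to see the fixed per-call cost is to time np.outer on vectors of different lengths. This is a rough sketch (the helper per_call_seconds is hypothetical, and absolute numbers depend on the machine and NumPy version); the expectation is that for short vectors the time per call barely changes with n, because the overhead dominates the actual multiplication.

```python
import timeit
import numpy as np

def per_call_seconds(n, number=20000):
    """Average wall-clock time of one np.outer call on two length-n vectors."""
    v = np.random.rand(n)
    return timeit.timeit(lambda: np.outer(v, v), number=number) / number

for n in (1, 10, 100):
    print(f"n={n:4d}: {per_call_seconds(n) * 1e6:.2f} us per call")
```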

Cython and Numba don't compile or otherwise bypass the Python code in outer. All they can do is streamline the iteration code that you wrote, and that isn't consuming much time.

Without getting into details, the MATLAB JIT must be able to replace the 'outer' with faster code; it rewrites the iteration. But my experience with MATLAB dates from a time before its JIT.

For real speed improvements with Cython and Numba, you need to use primitive NumPy/Python code all the way down, i.e. explicit loops over scalar elements rather than calls to NumPy functions like outer. Or, better yet, focus your effort on the slow inner pieces.
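As a sketch of what "primitive code all the way down" could look like with Numba: the outer product is written as two extra scalar loops, so the compiled function never calls back into NumPy's Python layer. The name accumulate_outer is hypothetical, and the fallback when Numba is missing is an assumption added so the code runs anywhere (it is then plain, very slow Python).

```python
import numpy as np

try:
    from numba import njit          # compile to machine code if Numba is installed
except ImportError:                 # fallback: leave the function as plain Python
    def njit(func):
        return func

@njit
def accumulate_outer(bar):
    """moo += outer(bar[j], bar[j]) for every (i, j), written as scalar loops."""
    size = bar.shape[0]
    ncol = bar.shape[1]
    moo = np.zeros((ncol, ncol))
    for i in range(size):           # kept on purpose: no use of the sum's semantics
        for j in range(size):
            for k in range(ncol):
                bjk = bar[j, k]
                for m in range(ncol):
                    moo[k, m] += bjk * bar[j, m]
    return moo
```

For the 200 x 200 case this is 200**4 scalar multiply-adds, which is only practical when the function is actually compiled by Numba.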

Replacing your outer with a streamlined version cuts run time about in half:

import numpy as np

def foo1(N):
    size = N
    np.random.seed(1000031212)
    bar = np.random.rand(size, size)
    moo = np.zeros((size, size), dtype=np.float64)
    for i in range(0, size):
        for j in range(0, size):
            val = bar[j]
            moo += val[:, None] * val    # broadcasting instead of np.outer
    return moo

With the full N=200 your function took 17s per loop. If I replace the inner two lines with pass (no calculation), time drops to 3ms per loop. In other words, the outer loop mechanism is not a big time consumer, at least not compared to many calls to outer().
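The "replace the body with pass" experiment can be reproduced with a small harness like the one below. The function names are hypothetical and N is reduced to 50 to keep it quick; timings will differ from the figures above, but the skeleton loop should come out far cheaper than the loop that does the work.

```python
import timeit
import numpy as np

def loop_with_work(N):
    bar = np.random.rand(N, N)
    moo = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            val = bar[j]
            moo += val[:, None] * val
    return moo

def loop_skeleton(N):
    # Same double loop with the two body lines replaced by pass.
    bar = np.random.rand(N, N)
    moo = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            pass
    return moo

N = 50
for f in (loop_with_work, loop_skeleton):
    t = timeit.timeit(lambda: f(N), number=3) / 3
    print(f"{f.__name__}: {t * 1e3:.1f} ms per loop")
```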


answered Sep 28 '22 00:09

hpaulj
