Speed up Python/Cython loops.

Tags:

I've trying to get a loop in python to run as fast as possible. So I've dived into NumPy and Cython. Here's the original Python code:

def calculate_bsf_u_loop(uvel,dy,dz):
   """
   Calculate barotropic stream function from zonal velocity

   uvel (t,z,y,x)
   dy   (y,x)
   dz   (t,z,y,x)

   bsf  (t,y,x)
   """

   nt = uvel.shape[0]
   nz = uvel.shape[1]
   ny = uvel.shape[2]
   nx = uvel.shape[3]

   bsf = np.zeros((nt,ny,nx))

   for jn in range(0,nt):
      for jk in range(0,nz):
         for jj in range(0,ny):
            for ji in range(0,nx):
               bsf[jn,jj,ji] = bsf[jn,jj,ji] + uvel[jn,jk,jj,ji] * dz[jn,jk,jj,ji] * dy[jj,ji] 

   return bsf

It's just a sum over k indices. Array sizes are nt=12, nz=75, ny=559, nx=1442, so ~725 million elements. That took 68 seconds. Now, I've done it in cython as

import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False) # turn off bounds-checking for entire function
@cython.wraparound(False)  # turn off negative index wrapping for entire function

## Use cpdef instead of def
## Define types for arrays
cpdef calculate_bsf_u_loop(np.ndarray[np.float64_t, ndim=4] uvel, np.ndarray[np.float64_t, ndim=2] dy, np.ndarray[np.float64_t, ndim=4] dz):
   """
   Calculate barotropic stream function from zonal velocity

   uvel (t,z,y,x)
   dy   (y,x)
   dz   (t,z,y,x)

   bsf  (t,y,x)
   """

   ## cdef the constants
   cdef int nt = uvel.shape[0]
   cdef int nz = uvel.shape[1]
   cdef int ny = uvel.shape[2]
   cdef int nx = uvel.shape[3]

   ## cdef loop indices
   cdef ji,jj,jk,jn

   ## cdef. Note that the cdef is followed by cython type
   ## but the np.zeros function as python (numpy) type
   cdef np.ndarray[np.float64_t, ndim=3] bsf = np.zeros([nt,ny,nx], dtype=np.float64)

   for jn in xrange(0,nt):
      for jk in xrange(0,nz):
         for jj in xrange(0,ny):
            for ji in xrange(0,nx):
               bsf[jn,jj,ji] += uvel[jn,jk,jj,ji] * dz[jn,jk,jj,ji] * dy[jj,ji] 

   return bsf

and that took 49 seconds. However, swapping the loop for

for jn in range(0,nt):
      for jk in range(0,nz):
         bsf[jn,:,:] = bsf[jn,:,:] + uvel[jn,jk,:,:] * dz[jn,jk,:,:] * dy[:,:]

only takes 0.29 seconds! Unfortunately, I can't do this in my full code.

Why is NumPy slicing so much faster than the Cython loop? I thought NumPy was fast because it is Cython under the hood. So shouldn't they be of similar speed?

As you can see, I've disabled boundary checks in cython, and I've also compiled using "fast math". However, this only gives a tiny speedup. Is there anyway to get a loop to be of similar speed as NumPy slicing, or is looping always slower than slicing?

Any help is greatly appreciated! /Joakim

419

asked Nov 17 '15 12:11

Joakim

1 Answers

That code is screaming for numpy.einsum's's intervention, given that you are doing elementwise-multiplication and then sum-reduction on the second axis of the 4D product array, which essenti ally numpy.einsum does in a highly efficient manner. To solve your case, you can use numpy.einsum in two ways -

bsf = np.einsum('ijkl,ijkl,kl->ikl',uvel,dz,dy)

bsf = np.einsum('ijkl,ijkl->ikl',uvel,dz)*dy

Runtime tests & Verify outputs -

In [100]: # Take a (1/5)th of original input shapes
     ...: original_shape = [12,75, 559,1442]
     ...: m,n,p,q = (np.array(original_shape)/5).astype(int)
     ...: 
     ...: # Generate random arrays with given shapes
     ...: uvel = np.random.rand(m,n,p,q)
     ...: dy = np.random.rand(p,q)
     ...: dz = np.random.rand(m,n,p,q)
     ...: 

In [101]: bsf = calculate_bsf_u_loop(uvel,dy,dz)

In [102]: print(np.allclose(bsf,np.einsum('ijkl,ijkl,kl->ikl',uvel,dz,dy)))
True

In [103]: print(np.allclose(bsf,np.einsum('ijkl,ijkl->ikl',uvel,dz)*dy))
True

In [104]: %timeit calculate_bsf_u_loop(uvel,dy,dz)
1 loops, best of 3: 2.16 s per loop

In [105]: %timeit np.einsum('ijkl,ijkl,kl->ikl',uvel,dz,dy)
100 loops, best of 3: 3.94 ms per loop

In [106]: %timeit np.einsum('ijkl,ijkl->ikl',uvel,dz)*dy
100 loops, best of 3: 3.96 ms per loo

answered Sep 27 '22 17:09

Divakar

Related questions
                            
                                find all digits between a character in python
                            
                                How to capture STDOUT of a Python process running under IIS, FastCGI, and WSGI?
                            
                                Python: Implicit conversion of object to str?
                            
                                Optimization with Python (scipy.optimize)
                            
                                Python :[Error 3] The system cannot find the path specified:
                            
                                APScheduler how to add job outside the scheduler?
                            
                                Is there any way to speed up an import?
                            
                                Open a sqlite3 database from an io.BytesIO stream?
                            
                                TCP vs. UDP socket latency benchmark
                            
                                Why can't you xor bytes objects in python?
                            
                                Python: dangers of temporarily changing the random seed using a context manager?
                            
                                Running Celery worker inside an app context still raises "working outside of app context" error in task
                            
                                How to parse freedict files (*.dict and *.index)
                            
                                Make a string output as a list in Pwm
                            
                                decoupled frontend and backend with Django, webpack, reactjs, react-router
                            
                                Linear system solution with fractions in numpy
                            
                                Get year from Django DateTimeField with values()
                            
                                create .deb package by python setup.py
                            
                                Garbage collection in python and reference counting
                            
                                Collectstatic only for certain apps

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Speed up Python/Cython loops.

Tags:

performance

python

arrays

numpy

cython

Joakim

People also ask

1 Answers

Divakar

Recent Activity

Donate For Us