Converting a nested loop calculation to Numpy for speedup

Question

Part of my Python program contains the follow piece of code, where a new grid is calculated based on data found in the old grid.

The grid i a two-dimensional list of floats. The code uses three for-loops:

for t in xrange(0, t, step):
    for h in xrange(1, height-1):
        for w in xrange(1, width-1):
            new_gr[h][w] = gr[h][w] + gr[h][w-1] + gr[h-1][w] + t * gr[h+1][w-1]-2 * (gr[h][w-1] + t * gr[h-1][w])
    gr = new_gr

return gr

The code is extremly slow for a large grid and a large time t.

I've tried to use Numpy to speed up this code, by substituting the inner loop with:

J = np.arange(1, width-1)
new_gr[h][J] = gr[h][J] + gr[h][J-1] ...

But the results produced (the floats in the array) are about 10% smaller than their list-calculation counterparts.

What loss of accuracy is to be expected when converting lists of floats to Numpy array of floats using np.array(pylist) and then doing a calculation?
How should I go about converting a triple for-loop to pretty and fast Numpy code? (or are there other suggestions for speeding up the code significantly?)

Divakar · Accepted Answer

If gr is a list of floats, the first step if you are looking to vectorize with NumPy would be to convert gr to a NumPy array with np.array().

Next up, I am assuming that you have new_gr initialized with zeros of shape (height,width). The calculations being performed in the two innermost loops basically represent 2D convolution. So, you can use signal.convolve2d with an appropriate kernel. To decide on the kernel, we need to look at the scaling factors and make a 3 x 3 kernel out of them and negate them to simulate the calculations we are doing with each iteration. Thus, you would have a vectorized solution with the two innermost loops being removed for better performance, like so -

import numpy as np
from scipy import signal

# Get the scaling factors and negate them to get kernel
kernel = -np.array([[0,1-2*t,0],[-1,1,0,],[t,0,0]])

# Initialize output array and run 2D convolution and set values into it
out = np.zeros((height,width))
out[1:-1,1:-1] = signal.convolve2d(gr, kernel, mode='same')[1:-1,:-2]

Verify output and runtime tests

Define functions :

def org_app(gr,t):
    new_gr = np.zeros((height,width))
    for h in xrange(1, height-1):
        for w in xrange(1, width-1):
            new_gr[h][w] = gr[h][w] + gr[h][w-1] + gr[h-1][w] + t * gr[h+1][w-1]-2 * (gr[h][w-1] + t * gr[h-1][w]) 
    return new_gr

def proposed_app(gr,t):
    kernel = -np.array([[0,1-2*t,0],[-1,1,0,],[t,0,0]])
    out = np.zeros((height,width))
    out[1:-1,1:-1] = signal.convolve2d(gr, kernel, mode='same')[1:-1,:-2]
    return out

Verify -

In [244]: # Inputs
     ...: gr = np.random.rand(40,50)
     ...: height,width = gr.shape
     ...: t = 1
     ...: 

In [245]: np.allclose(org_app(gr,t),proposed_app(gr,t))
Out[245]: True

Timings -

In [246]: # Inputs
     ...: gr = np.random.rand(400,500)
     ...: height,width = gr.shape
     ...: t = 1
     ...: 

In [247]: %timeit org_app(gr,t)
1 loops, best of 3: 2.13 s per loop

In [248]: %timeit proposed_app(gr,t)
10 loops, best of 3: 19.4 ms per loop

hpaulj · Answer

@Divakar, I tried a couple of variations on your org_app. The fully vectorized version is:

def org_app4(gr,t):
    new_gr = np.zeros((height,width))
    I = np.arange(1,height-1)[:,None]
    J = np.arange(1,width-1)
    new_gr[I,J] = gr[I,J] + gr[I,J-1] + gr[I-1,J] + t * gr[I+1,J-1]-2 * (gr[I,J-1] + t * gr[I-1,J])
    return new_gr

While half the speed of your proposed_app, it is closer in style to the original. And thus may help with understanding how nested loops can be vectorized.

An important step is the conversion of I into a column array, so that together I,J index a block of values.

Converting a nested loop calculation to Numpy for speedup

Tags:

performance

python

optimization

floating-point

numpy

darthur

2 Answers

Divakar

hpaulj

Recent Activity

Donate For Us

Converting a nested loop calculation to Numpy for speedup

Tags:

performance

python

optimization

floating-point

numpy

darthur

2 Answers

Divakar

hpaulj

Related questions

Recent Activity

Donate For Us