Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Vectorize a for loop in numpy to calculate duct-tape overlaping

I'm creating an application with python to calculate duct-tape overlapping (modeling a dispenser applies a product on a rotating drum).

I have a program that works correctly, but is really slow. I'm looking for a solution to optimize a for loop used to fill a numpy array. Could someone help me vectorize the code below?

import numpy as np
import matplotlib.pyplot as plt

# Some parameters
width = 264
bbddiam = 940
accuracy = 4 #2 points per pixel

drum = np.zeros(accuracy**2 * width * bbddiam).reshape((bbddiam * accuracy , width * accuracy))

# The "slow" function
def line_mask(drum, coef, intercept, upper=True, accuracy=accuracy):
    """Masks a half of the array"""
    to_return = np.zeros(drum.shape)
    for index, v in np.ndenumerate(to_return):
        if upper == True:
            if index[0] * coef + intercept > index[1]:
                to_return[index] = 1
        else:
            if index[0] * coef + intercept <= index[1]:
                to_return[index] = 1
    return to_return


def get_band(drum, coef, intercept, bandwidth):
    """Calculate a ribbon path on the drum"""
    to_return = np.zeros(drum.shape)
    t1 = line_mask(drum, coef, intercept + bandwidth / 2, upper=True)
    t2 = line_mask(drum, coef, intercept - bandwidth / 2, upper=False)
    to_return = t1 + t2
    return np.where(to_return == 2, 1, 0)

single_band = get_band(drum, 1 / 10, 130, bandwidth=15)

# Visualize the result !
plt.imshow(single_band)
plt.show()

Numba does wonders for my code, reducing runtime from 5.8s to 86ms (special thanks to @Maarten-vd-Sande):

from numba import jit
@jit(nopython=True, parallel=True)
def line_mask(drum, coef, intercept, upper=True, accuracy=accuracy):
    ...

A better solution with numpy is still welcome ;-)

like image 364
Laurent R Avatar asked Jan 03 '20 21:01

Laurent R


1 Answers

There is no need for any looping at all here. You have effectively two different line_mask functions. Neither needs to be looped explicitly, but you would probably get a significant speedup just from rewriting it with a pair of for loops in an if and else, rather than an if and else in a for loop, which gets evaluated many many times.

The really numpythonic thing to do is to properly vectorize your code to operate on entire arrays without any loops. Here is a vectorized version of line_mask:

def line_mask(drum, coef, intercept, upper=True, accuracy=accuracy):
    """Masks a half of the array"""
    r = np.arange(drum.shape[0]).reshape(-1, 1)
    c = np.arange(drum.shape[1]).reshape(1, -1)
    comp = c.__lt__ if upper else c.__ge__
    return comp(r * coef + intercept)

Setting up the shapes of r and c to be (m, 1) and (n, 1) so that the result is (m, n) is called broadcasting, and is the staple of vectorization in numpy.

The result of the updated line_mask is a boolean mask (as the name implies) rather than a float array. This makes it smaller, and hopefully bypasses float operations entirely. You can now rewrite get_band to use masking instead of addition:

def get_band(drum, coef, intercept, bandwidth):
    """Calculate a ribbon path on the drum"""
    t1 = line_mask(drum, coef, intercept + bandwidth / 2, upper=True)
    t2 = line_mask(drum, coef, intercept - bandwidth / 2, upper=False)
    return t1 & t2

The remainder of the program should stay the same, since these functions preserve all the interfaces.

If you want, you can rewrite most of your program in three (still somewhat legible) lines:

coeff = 1/10
intercept = 130
bandwidth = 15

r, c = np.ogrid[:drum.shape[0], :drum.shape[1]]
check = r * coeff + intercept
single_band = ((check + bandwidth / 2 > c) & (check - bandwidth /  2 <= c))
like image 144
Mad Physicist Avatar answered Oct 28 '22 08:10

Mad Physicist