I'm creating an application with python to calculate duct-tape overlapping (modeling a dispenser applies a product on a rotating drum).
I have a program that works correctly, but is really slow. I'm looking for a solution to optimize a for
loop used to fill a numpy array. Could someone help me vectorize the code below?
import numpy as np
import matplotlib.pyplot as plt
# Some parameters
width = 264
bbddiam = 940
accuracy = 4 #2 points per pixel
drum = np.zeros(accuracy**2 * width * bbddiam).reshape((bbddiam * accuracy , width * accuracy))
# The "slow" function
def line_mask(drum, coef, intercept, upper=True, accuracy=accuracy):
"""Masks a half of the array"""
to_return = np.zeros(drum.shape)
for index, v in np.ndenumerate(to_return):
if upper == True:
if index[0] * coef + intercept > index[1]:
to_return[index] = 1
else:
if index[0] * coef + intercept <= index[1]:
to_return[index] = 1
return to_return
def get_band(drum, coef, intercept, bandwidth):
"""Calculate a ribbon path on the drum"""
to_return = np.zeros(drum.shape)
t1 = line_mask(drum, coef, intercept + bandwidth / 2, upper=True)
t2 = line_mask(drum, coef, intercept - bandwidth / 2, upper=False)
to_return = t1 + t2
return np.where(to_return == 2, 1, 0)
single_band = get_band(drum, 1 / 10, 130, bandwidth=15)
# Visualize the result !
plt.imshow(single_band)
plt.show()
Numba does wonders for my code, reducing runtime from 5.8s to 86ms (special thanks to @Maarten-vd-Sande):
from numba import jit
@jit(nopython=True, parallel=True)
def line_mask(drum, coef, intercept, upper=True, accuracy=accuracy):
...
A better solution with numpy is still welcome ;-)
There is no need for any looping at all here. You have effectively two different line_mask
functions. Neither needs to be looped explicitly, but you would probably get a significant speedup just from rewriting it with a pair of for
loops in an if
and else
, rather than an if
and else
in a for
loop, which gets evaluated many many times.
The really numpythonic thing to do is to properly vectorize your code to operate on entire arrays without any loops. Here is a vectorized version of line_mask
:
def line_mask(drum, coef, intercept, upper=True, accuracy=accuracy):
"""Masks a half of the array"""
r = np.arange(drum.shape[0]).reshape(-1, 1)
c = np.arange(drum.shape[1]).reshape(1, -1)
comp = c.__lt__ if upper else c.__ge__
return comp(r * coef + intercept)
Setting up the shapes of r
and c
to be (m, 1)
and (n, 1)
so that the result is (m, n)
is called broadcasting, and is the staple of vectorization in numpy.
The result of the updated line_mask
is a boolean mask (as the name implies) rather than a float array. This makes it smaller, and hopefully bypasses float operations entirely. You can now rewrite get_band
to use masking instead of addition:
def get_band(drum, coef, intercept, bandwidth):
"""Calculate a ribbon path on the drum"""
t1 = line_mask(drum, coef, intercept + bandwidth / 2, upper=True)
t2 = line_mask(drum, coef, intercept - bandwidth / 2, upper=False)
return t1 & t2
The remainder of the program should stay the same, since these functions preserve all the interfaces.
If you want, you can rewrite most of your program in three (still somewhat legible) lines:
coeff = 1/10
intercept = 130
bandwidth = 15
r, c = np.ogrid[:drum.shape[0], :drum.shape[1]]
check = r * coeff + intercept
single_band = ((check + bandwidth / 2 > c) & (check - bandwidth / 2 <= c))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With