I'd appreciate some help in finding and understanding a pythonic way to optimize the following array manipulations in nested for loops:
def _func(a, b, radius): "Return 0 if a>b, otherwise return 1" if distance.euclidean(a, b) < radius: return 1 else: return 0 def _make_mask(volume, roi, radius): mask = numpy.zeros(volume.shape) for x in range(volume.shape[0]): for y in range(volume.shape[1]): for z in range(volume.shape[2]): mask[x, y, z] = _func((x, y, z), roi, radius) return mask
Where volume.shape
(182, 218, 200) and roi.shape
(3,) are both ndarray
types; and radius
is an int
Again, some have observed vectorize to be faster than normal for loops, but even the NumPy documentation states: “The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.”
Nested For LoopsLoops can be nested in Python, as they can with other programming languages. The program first encounters the outer loop, executing its first iteration. This first iteration triggers the inner, nested loop, which then runs to completion.
What is Vectorization ? Vectorization is used to speed up the Python code without using loop. Using such a function can help in minimizing the running time of code efficiently.
"Vectorization" (simplified) is the process of rewriting a loop so that instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.
Approach #1
Here's a vectorized approach -
m,n,r = volume.shape x,y,z = np.mgrid[0:m,0:n,0:r] X = x - roi[0] Y = y - roi[1] Z = z - roi[2] mask = X**2 + Y**2 + Z**2 < radius**2
Possible improvement : We can probably speedup the last step with numexpr
module -
import numexpr as ne mask = ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')
Approach #2
We can also gradually build the three ranges corresponding to the shape parameters and perform the subtraction against the three elements of roi
on the fly without actually creating the meshes as done earlier with np.mgrid
. This would be benefited by the use of broadcasting
for efficiency purposes. The implementation would look like this -
m,n,r = volume.shape vals = ((np.arange(m)-roi[0])**2)[:,None,None] + \ ((np.arange(n)-roi[1])**2)[:,None] + ((np.arange(r)-roi[2])**2) mask = vals < radius**2
Simplified version : Thanks to @Bi Rico for suggesting an improvement here as we can use np.ogrid
to perform those operations in a bit more concise manner, like so -
m,n,r = volume.shape x,y,z = np.ogrid[0:m,0:n,0:r]-roi mask = (x**2+y**2+z**2) < radius**2
Runtime test
Function definitions -
def vectorized_app1(volume, roi, radius): m,n,r = volume.shape x,y,z = np.mgrid[0:m,0:n,0:r] X = x - roi[0] Y = y - roi[1] Z = z - roi[2] return X**2 + Y**2 + Z**2 < radius**2 def vectorized_app1_improved(volume, roi, radius): m,n,r = volume.shape x,y,z = np.mgrid[0:m,0:n,0:r] X = x - roi[0] Y = y - roi[1] Z = z - roi[2] return ne.evaluate('X**2 + Y**2 + Z**2 < radius**2') def vectorized_app2(volume, roi, radius): m,n,r = volume.shape vals = ((np.arange(m)-roi[0])**2)[:,None,None] + \ ((np.arange(n)-roi[1])**2)[:,None] + ((np.arange(r)-roi[2])**2) return vals < radius**2 def vectorized_app2_simplified(volume, roi, radius): m,n,r = volume.shape x,y,z = np.ogrid[0:m,0:n,0:r]-roi return (x**2+y**2+z**2) < radius**2
Timings -
In [106]: # Setup input arrays ...: volume = np.random.rand(90,110,100) # Half of original input sizes ...: roi = np.random.rand(3) ...: radius = 3.4 ...: In [107]: %timeit _make_mask(volume, roi, radius) 1 loops, best of 3: 41.4 s per loop In [108]: %timeit vectorized_app1(volume, roi, radius) 10 loops, best of 3: 62.3 ms per loop In [109]: %timeit vectorized_app1_improved(volume, roi, radius) 10 loops, best of 3: 47 ms per loop In [110]: %timeit vectorized_app2(volume, roi, radius) 100 loops, best of 3: 4.26 ms per loop In [139]: %timeit vectorized_app2_simplified(volume, roi, radius) 100 loops, best of 3: 4.36 ms per loop
So, as always broadcasting
showing its magic for a crazy almost 10,000x
speedup over the original code and more than 10x
better than creating meshes by using on-the-fly broadcasted operations!
Say you first build an xyzy
array:
import itertools xyz = [np.array(p) for p in itertools.product(range(volume.shape[0]), range(volume.shape[1]), range(volume.shape[2]))]
Now, using numpy.linalg.norm
,
np.linalg.norm(xyz - roi, axis=1) < radius
checks whether the distance for each tuple from roi
is smaller than radius.
Finally, just reshape
the result to the dimensions you need.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With