I'd appreciate some help in finding and understanding a pythonic way to optimize the following array manipulations in nested for loops:
def _func(a, b, radius):     "Return 0 if a>b, otherwise return 1"     if distance.euclidean(a, b) < radius:         return 1     else:         return 0  def _make_mask(volume, roi, radius):     mask = numpy.zeros(volume.shape)     for x in range(volume.shape[0]):         for y in range(volume.shape[1]):             for z in range(volume.shape[2]):                 mask[x, y, z] = _func((x, y, z), roi, radius)     return mask   Where volume.shape (182, 218, 200) and roi.shape (3,) are both ndarray types; and radius is an int 
Again, some have observed vectorize to be faster than normal for loops, but even the NumPy documentation states: “The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.”
Nested For LoopsLoops can be nested in Python, as they can with other programming languages. The program first encounters the outer loop, executing its first iteration. This first iteration triggers the inner, nested loop, which then runs to completion.
What is Vectorization ? Vectorization is used to speed up the Python code without using loop. Using such a function can help in minimizing the running time of code efficiently.
"Vectorization" (simplified) is the process of rewriting a loop so that instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.
Approach #1
Here's a vectorized approach -
m,n,r = volume.shape x,y,z = np.mgrid[0:m,0:n,0:r] X = x - roi[0] Y = y - roi[1] Z = z - roi[2] mask = X**2 + Y**2 + Z**2 < radius**2   Possible improvement : We can probably speedup the last step with numexpr module -
import numexpr as ne  mask = ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')   Approach #2
We can also gradually build the three ranges corresponding to the shape parameters and perform the subtraction against the three elements of roi on the fly without actually creating the meshes as done earlier with np.mgrid. This would be benefited by the use of broadcasting for efficiency purposes. The implementation would look like this -
m,n,r = volume.shape vals = ((np.arange(m)-roi[0])**2)[:,None,None] + \        ((np.arange(n)-roi[1])**2)[:,None] + ((np.arange(r)-roi[2])**2) mask = vals < radius**2   Simplified version : Thanks to @Bi Rico for suggesting an improvement here as we can use np.ogrid to perform those operations in a bit more concise manner, like so -
m,n,r = volume.shape     x,y,z = np.ogrid[0:m,0:n,0:r]-roi mask = (x**2+y**2+z**2) < radius**2   Runtime test
Function definitions -
def vectorized_app1(volume, roi, radius):     m,n,r = volume.shape     x,y,z = np.mgrid[0:m,0:n,0:r]     X = x - roi[0]     Y = y - roi[1]     Z = z - roi[2]     return X**2 + Y**2 + Z**2 < radius**2  def vectorized_app1_improved(volume, roi, radius):     m,n,r = volume.shape     x,y,z = np.mgrid[0:m,0:n,0:r]     X = x - roi[0]     Y = y - roi[1]     Z = z - roi[2]     return ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')  def vectorized_app2(volume, roi, radius):     m,n,r = volume.shape     vals = ((np.arange(m)-roi[0])**2)[:,None,None] + \            ((np.arange(n)-roi[1])**2)[:,None] + ((np.arange(r)-roi[2])**2)     return vals < radius**2  def vectorized_app2_simplified(volume, roi, radius):     m,n,r = volume.shape         x,y,z = np.ogrid[0:m,0:n,0:r]-roi     return (x**2+y**2+z**2) < radius**2   Timings -
In [106]: # Setup input arrays        ...: volume = np.random.rand(90,110,100) # Half of original input sizes       ...: roi = np.random.rand(3)      ...: radius = 3.4      ...:   In [107]: %timeit _make_mask(volume, roi, radius) 1 loops, best of 3: 41.4 s per loop  In [108]: %timeit vectorized_app1(volume, roi, radius) 10 loops, best of 3: 62.3 ms per loop  In [109]: %timeit vectorized_app1_improved(volume, roi, radius) 10 loops, best of 3: 47 ms per loop  In [110]: %timeit vectorized_app2(volume, roi, radius) 100 loops, best of 3: 4.26 ms per loop  In [139]: %timeit vectorized_app2_simplified(volume, roi, radius) 100 loops, best of 3: 4.36 ms per loop   So, as always broadcasting showing its magic for a crazy almost 10,000x speedup over the original code and more than 10x better than creating meshes by using on-the-fly broadcasted operations!
Say you first build an xyzy array:
import itertools  xyz = [np.array(p) for p in itertools.product(range(volume.shape[0]), range(volume.shape[1]), range(volume.shape[2]))]   Now, using numpy.linalg.norm, 
np.linalg.norm(xyz - roi, axis=1) < radius   checks whether the distance for each tuple from roi is smaller than radius.
Finally, just reshape the result to the dimensions you need.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With