Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python vectorizing nested for loops

I'd appreciate some help in finding and understanding a pythonic way to optimize the following array manipulations in nested for loops:

def _func(a, b, radius):     "Return 0 if a>b, otherwise return 1"     if distance.euclidean(a, b) < radius:         return 1     else:         return 0  def _make_mask(volume, roi, radius):     mask = numpy.zeros(volume.shape)     for x in range(volume.shape[0]):         for y in range(volume.shape[1]):             for z in range(volume.shape[2]):                 mask[x, y, z] = _func((x, y, z), roi, radius)     return mask 

Where volume.shape (182, 218, 200) and roi.shape (3,) are both ndarray types; and radius is an int

like image 763
Kambiz Avatar asked Sep 23 '16 18:09

Kambiz


People also ask

Is NumPy vectorize faster than for loop?

Again, some have observed vectorize to be faster than normal for loops, but even the NumPy documentation states: “The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.”

Can you nest for loops in Python?

Nested For LoopsLoops can be nested in Python, as they can with other programming languages. The program first encounters the outer loop, executing its first iteration. This first iteration triggers the inner, nested loop, which then runs to completion.

What does vectorize mean in Python?

What is Vectorization ? Vectorization is used to speed up the Python code without using loop. Using such a function can help in minimizing the running time of code efficiently.

What is difference between vectorization and loops?

"Vectorization" (simplified) is the process of rewriting a loop so that instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.


2 Answers

Approach #1

Here's a vectorized approach -

m,n,r = volume.shape x,y,z = np.mgrid[0:m,0:n,0:r] X = x - roi[0] Y = y - roi[1] Z = z - roi[2] mask = X**2 + Y**2 + Z**2 < radius**2 

Possible improvement : We can probably speedup the last step with numexpr module -

import numexpr as ne  mask = ne.evaluate('X**2 + Y**2 + Z**2 < radius**2') 

Approach #2

We can also gradually build the three ranges corresponding to the shape parameters and perform the subtraction against the three elements of roi on the fly without actually creating the meshes as done earlier with np.mgrid. This would be benefited by the use of broadcasting for efficiency purposes. The implementation would look like this -

m,n,r = volume.shape vals = ((np.arange(m)-roi[0])**2)[:,None,None] + \        ((np.arange(n)-roi[1])**2)[:,None] + ((np.arange(r)-roi[2])**2) mask = vals < radius**2 

Simplified version : Thanks to @Bi Rico for suggesting an improvement here as we can use np.ogrid to perform those operations in a bit more concise manner, like so -

m,n,r = volume.shape     x,y,z = np.ogrid[0:m,0:n,0:r]-roi mask = (x**2+y**2+z**2) < radius**2 

Runtime test

Function definitions -

def vectorized_app1(volume, roi, radius):     m,n,r = volume.shape     x,y,z = np.mgrid[0:m,0:n,0:r]     X = x - roi[0]     Y = y - roi[1]     Z = z - roi[2]     return X**2 + Y**2 + Z**2 < radius**2  def vectorized_app1_improved(volume, roi, radius):     m,n,r = volume.shape     x,y,z = np.mgrid[0:m,0:n,0:r]     X = x - roi[0]     Y = y - roi[1]     Z = z - roi[2]     return ne.evaluate('X**2 + Y**2 + Z**2 < radius**2')  def vectorized_app2(volume, roi, radius):     m,n,r = volume.shape     vals = ((np.arange(m)-roi[0])**2)[:,None,None] + \            ((np.arange(n)-roi[1])**2)[:,None] + ((np.arange(r)-roi[2])**2)     return vals < radius**2  def vectorized_app2_simplified(volume, roi, radius):     m,n,r = volume.shape         x,y,z = np.ogrid[0:m,0:n,0:r]-roi     return (x**2+y**2+z**2) < radius**2 

Timings -

In [106]: # Setup input arrays        ...: volume = np.random.rand(90,110,100) # Half of original input sizes       ...: roi = np.random.rand(3)      ...: radius = 3.4      ...:   In [107]: %timeit _make_mask(volume, roi, radius) 1 loops, best of 3: 41.4 s per loop  In [108]: %timeit vectorized_app1(volume, roi, radius) 10 loops, best of 3: 62.3 ms per loop  In [109]: %timeit vectorized_app1_improved(volume, roi, radius) 10 loops, best of 3: 47 ms per loop  In [110]: %timeit vectorized_app2(volume, roi, radius) 100 loops, best of 3: 4.26 ms per loop  In [139]: %timeit vectorized_app2_simplified(volume, roi, radius) 100 loops, best of 3: 4.36 ms per loop 

So, as always broadcasting showing its magic for a crazy almost 10,000x speedup over the original code and more than 10x better than creating meshes by using on-the-fly broadcasted operations!

like image 115
Divakar Avatar answered Sep 19 '22 02:09

Divakar


Say you first build an xyzy array:

import itertools  xyz = [np.array(p) for p in itertools.product(range(volume.shape[0]), range(volume.shape[1]), range(volume.shape[2]))] 

Now, using numpy.linalg.norm,

np.linalg.norm(xyz - roi, axis=1) < radius 

checks whether the distance for each tuple from roi is smaller than radius.

Finally, just reshape the result to the dimensions you need.

like image 43
Ami Tavory Avatar answered Sep 20 '22 02:09

Ami Tavory