I want to avoid using for loops in the following code for performance reasons. Is vectorization suitable for this kind of problem?
import numpy as np

a = np.array([[0, 1, 2, 3, 4],
              [5, 6, 7, 8, 9],
              [0, 1, 2, 3, 4],
              [5, 6, 7, 8, 9],
              [0, 1, 2, 3, 4]], dtype=np.float32)
temp_a = np.copy(a)
for i in range(1, a.shape[0] - 1):
    for j in range(1, a.shape[1] - 1):
        if a[i, j] > 3:
            temp_a[i + 1, j] += a[i, j] / 5.
            temp_a[i - 1, j] += a[i, j] / 5.
            temp_a[i, j + 1] += a[i, j] / 5.
            temp_a[i, j - 1] += a[i, j] / 5.
            temp_a[i, j] -= a[i, j] * 4. / 5.
a = np.copy(temp_a)
Vectorization is the process of converting an algorithm from operating on a single value at a time to operating on a whole set of values (a vector) at once. It lets you think in terms of whole aggregates of data instead of explicit loops over individual scalar operations; at the hardware level this is made possible by Single Instruction, Multiple Data (SIMD) support, which most modern CPUs provide. (In machine learning, "vectorization" also names a feature-extraction step that converts text into numerical vectors for a model to train on, but that is a different sense of the word.)
For example, in one element-wise multiplication benchmark, the vectorized implementation was almost 500 times faster than the loop: a huge boost in performance.
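Here is a minimal sketch of such a benchmark (the array size is arbitrary and the exact speed-up depends on the length and on your hardware, so treat the 500x figure as illustrative):
import numpy as np
import timeit

x = np.random.rand(100_000).astype(np.float32)
y = np.random.rand(100_000).astype(np.float32)

def loop_multiply(x, y):
    # one scalar multiplication per Python-level iteration
    out = np.empty_like(x)
    for k in range(x.shape[0]):
        out[k] = x[k] * y[k]
    return out

t_loop = timeit.timeit(lambda: loop_multiply(x, y), number=10)
t_vec = timeit.timeit(lambda: np.multiply(x, y), number=10)
print(f"loop: {t_loop:.4f}s, vectorized: {t_vec:.4f}s, speed-up ~{t_loop / t_vec:.0f}x")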
You are basically doing a convolution, with some special treatment for the borders.
Try the following:
from scipy.signal import convolve2d
# define your filter
f = np.array([[0.0,  0.2, 0.0],
              [0.2, -0.8, 0.2],
              [0.0,  0.2, 0.0]])
# select parts of 'a' to be used for convolution
b = (a * (a > 3))[1:-1, 1:-1]
# convolve, padding with zeros ('same' mode)
c = convolve2d(b, f, mode='same')
# add the convolved result to 'a', excluding borders
a[1:-1, 1:-1] += c
# treat the special cases of the borders
a[0, 1:-1] += .2 * b[0, :]
a[-1, 1:-1] += .2 * b[-1, :]
a[1:-1, 0] += .2 * b[:, 0]
a[1:-1, -1] += .2 * b[:, -1]
It gives the following result, which is the same as your nested loops produce:
[[ 0.   2.2  3.4  4.6  4. ]
 [ 6.2  2.6  4.2  3.  10.6]
 [ 0.   3.4  4.8  6.2  4. ]
 [ 6.2  2.6  4.2  3.  10.6]
 [ 0.   2.2  3.4  4.6  4. ]]
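If you want to verify the equivalence yourself, here is a minimal sketch (reusing `a`, `f`, and `convolve2d` from above) that wraps both versions in functions and compares them with np.allclose:
def loop_version(a):
    # the original nested-loop update, returning the result
    temp_a = np.copy(a)
    for i in range(1, a.shape[0] - 1):
        for j in range(1, a.shape[1] - 1):
            if a[i, j] > 3:
                temp_a[i + 1, j] += a[i, j] / 5.
                temp_a[i - 1, j] += a[i, j] / 5.
                temp_a[i, j + 1] += a[i, j] / 5.
                temp_a[i, j - 1] += a[i, j] / 5.
                temp_a[i, j] -= a[i, j] * 4. / 5.
    return temp_a

def conv_version(a):
    # the convolution-based update from this answer
    a = np.copy(a)
    b = (a * (a > 3))[1:-1, 1:-1]
    c = convolve2d(b, f, mode='same')
    a[1:-1, 1:-1] += c
    a[0, 1:-1] += .2 * b[0, :]
    a[-1, 1:-1] += .2 * b[-1, :]
    a[1:-1, 0] += .2 * b[:, 0]
    a[1:-1, -1] += .2 * b[:, -1]
    return a

print(np.allclose(loop_version(a), conv_version(a)))  # True, up to float rounding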
My attempt uses 3 filters plus np.rot90, np.where, np.sum, and np.multiply. I am not sure which way of benchmarking is more reasonable, but if you do not take the time to create the filters into account, it is roughly 4 times faster.
# Each filter basically does what `op` tries to achieve in a loop
filter1 = np.array([[0,  1,  0, 0, 0],
                    [1, -4,  1, 0, 0],
                    [0,  1,  0, 0, 0],
                    [0,  0,  0, 0, 0],
                    [0,  0,  0, 0, 0]]) / 5.
filter2 = np.array([[0, 0,  1,  0, 0],
                    [0, 1, -4,  1, 0],
                    [0, 0,  1,  0, 0],
                    [0, 0,  0,  0, 0],
                    [0, 0,  0,  0, 0]]) / 5.
filter3 = np.array([[0, 0,  0,  0, 0],
                    [0, 0,  1,  0, 0],
                    [0, 1, -4,  1, 0],
                    [0, 0,  1,  0, 0],
                    [0, 0,  0,  0, 0]]) / 5.
# only loop over the center of the matrix `a`
center = np.array([[0, 0, 0, 0, 0],
                   [0, 1, 1, 1, 0],
                   [0, 1, 1, 1, 0],
                   [0, 1, 1, 1, 0],
                   [0, 0, 0, 0, 0]])
filter1 and filter2 can each be rotated to represent 4 filters.
filter1_90_rot = np.rot90(filter1, k=1)
filter1_180_rot = np.rot90(filter1, k=2)
filter1_270_rot = np.rot90(filter1, k=3)
filter2_90_rot = np.rot90(filter2, k=1)
filter2_180_rot = np.rot90(filter2, k=2)
filter2_270_rot = np.rot90(filter2, k=3)
# Based on the index (i, j) in `a`, return the matching filter
filter_dict = {
    (1, 1): filter1,
    (3, 1): filter1_90_rot,
    (3, 3): filter1_180_rot,
    (1, 3): filter1_270_rot,
    (1, 2): filter2,
    (2, 1): filter2_90_rot,
    (3, 2): filter2_180_rot,
    (2, 3): filter2_270_rot,
    (2, 2): filter3,
}
Main function
def get_new_a(a):
    # find the interior (i, j) pairs that satisfy the condition a[i, j] > 3
    x, y = np.where(((a > 3) * center) > 0)
    # sum the weighted stencils; a list (rather than a generator) is passed to
    # np.sum so the 5x5 matrices are stacked and reduced along axis 0
    return a + np.sum([np.multiply(filter_dict[i, j], a[i, j])
                       for (i, j) in zip(x, y)], axis=0)
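Usage for a single update step then looks like this (a sketch; `a`, `center`, and `filter_dict` are assumed to be defined as above):
new_a = get_new_a(a)  # returns the updated matrix; `a` itself is left unchanged
print(new_a)          # matches the nested-loop result, up to float rounding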
Note: there seem to be some floating-point rounding differences, such that np.equal() mostly returns False when comparing my result with the OP's, while np.allclose() returns True.
def op():
    temp_a = np.copy(a)
    for i in range(1, a.shape[0] - 1):
        for j in range(1, a.shape[1] - 1):
            if a[i, j] > 3:
                temp_a[i + 1, j] += a[i, j] / 5.
                temp_a[i - 1, j] += a[i, j] / 5.
                temp_a[i, j + 1] += a[i, j] / 5.
                temp_a[i, j - 1] += a[i, j] / 5.
                temp_a[i, j] -= a[i, j] * 4. / 5.
    a2 = np.copy(temp_a)
%timeit op()
167 µs ± 2.72 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit get_new_a(a)
37.2 µs ± 2.68 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Note again that we ignore the time to create the filters, as I think it would be a one-time thing. If you do include the filter-creation time, it is roughly two times faster. You might think the comparison is unfair because op's method contains two np.copy calls, but I think the bottleneck of op's method is the for loop.
numpy.multiply does an element-wise multiplication between two matrices.
np.rot90 does the rotation for us; the parameter k decides how many times to rotate.
np.isclose can be used to check whether two matrices are element-wise close within a tolerance that you can define.
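A tiny sketch of those three helpers in action:
import numpy as np

m = np.array([[1, 2],
              [3, 4]])
print(np.rot90(m, k=1))             # rotate 90 degrees counter-clockwise: [[2, 4], [1, 3]]
print(np.multiply(m, m))            # element-wise product: [[1, 4], [9, 16]]
print(np.isclose(0.1 + 0.2, 0.3))   # True: equal within floating-point tolerance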