Iterate through numpy array testing multiple elements efficiently

Question

I have the following code which iterates through 2d numpy array named "m". It works extremely slow. How can I transform this code using numpy functions so that I avoid using the for loops?

pairs = []
for i in range(size):
    for j in range(size):
        if(i >= j):
            continue
        if(m[i][j] + m[j][i] >= 0.75):
            pairs.append([i, j, m[i][j] + m[j][i]])

Sheldore · Accepted Answer

You can use vectorised approach using NumPy. The idea is:

First initialize a matrix m and then create m+m.T which is equivalent to m[i][j] + m[j][i] where m.T is the matrix transpose and call it summ
np.triu(summ) returns the upper triangular part of the matrix (This is equivalent to ignoring the lower part by using continue in your code). This avoids explicit if(i >= j): in your code. Here you have to use k=1 to exclude the diagonal elements. By default, k=0 which includes the diagonal elements as well.
Then you get the indices of points using np.argwhere where the sum m+m.T is greater than equal to 0.75
Then you store those indices and the corresponding values in a list for later processing/printing purposes.

Verifiable example (using a small 3x3 random dataset)

import numpy as np

np.random.seed(0)
m = np.random.rand(3,3)
summ = m + m.T

index = np.argwhere(np.triu(summ, k=1)>=0.75)

pairs = [(x,y, summ[x,y]) for x,y in index]
print (pairs)
# # [(0, 1, 1.2600725493693163), (0, 2, 1.0403505873343364), (1, 2, 1.537667113848736)]

Further performance improvement

I just worked out an even faster approach to generate the final pairs list avoiding explicit for loops as

pairs = list(zip(index[:, 0], index[:, 1], summ[index[:,0], index[:,1]]))

Eduardo Soares · Answer

One way to optimize your code is to avoid comparison if (i >= j). To traverse only the lower triangle of the array without that comparison, you have to make the inner loop start with the value of i of the outermost loop. That way, you avoid size x size if comparisons.

import numpy as np
size = 5000
m = np.random.rand(size, size)
pairs = []


for i in range(size):
    for j in range(i , size):

        if(m[i][j] + m[j][i] >= 0.75):
            pairs.append([i, j, m[i][j] + m[j][i]])

Iterate through numpy array testing multiple elements efficiently

Tags:

python

numpy

nota

2 Answers

Sheldore

Eduardo Soares

Recent Activity

Donate For Us

Iterate through numpy array testing multiple elements efficiently

Tags:

python

numpy

nota

2 Answers

Sheldore

Eduardo Soares

Related questions

Recent Activity

Donate For Us