Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Iterate through numpy array testing multiple elements efficiently

Tags:

python

numpy

I have the following code which iterates through 2d numpy array named "m". It works extremely slow. How can I transform this code using numpy functions so that I avoid using the for loops?

pairs = []
for i in range(size):
    for j in range(size):
        if(i >= j):
            continue
        if(m[i][j] + m[j][i] >= 0.75):
            pairs.append([i, j, m[i][j] + m[j][i]])
like image 966
nota Avatar asked Sep 18 '25 04:09

nota


2 Answers

You can use vectorised approach using NumPy. The idea is:

  • First initialize a matrix m and then create m+m.T which is equivalent to m[i][j] + m[j][i] where m.T is the matrix transpose and call it summ
  • np.triu(summ) returns the upper triangular part of the matrix (This is equivalent to ignoring the lower part by using continue in your code). This avoids explicit if(i >= j): in your code. Here you have to use k=1 to exclude the diagonal elements. By default, k=0 which includes the diagonal elements as well.
  • Then you get the indices of points using np.argwhere where the sum m+m.T is greater than equal to 0.75
  • Then you store those indices and the corresponding values in a list for later processing/printing purposes.

Verifiable example (using a small 3x3 random dataset)

import numpy as np

np.random.seed(0)
m = np.random.rand(3,3)
summ = m + m.T

index = np.argwhere(np.triu(summ, k=1)>=0.75)

pairs = [(x,y, summ[x,y]) for x,y in index]
print (pairs)
# # [(0, 1, 1.2600725493693163), (0, 2, 1.0403505873343364), (1, 2, 1.537667113848736)]

Further performance improvement

I just worked out an even faster approach to generate the final pairs list avoiding explicit for loops as

pairs = list(zip(index[:, 0], index[:, 1], summ[index[:,0], index[:,1]]))
like image 160
Sheldore Avatar answered Sep 19 '25 16:09

Sheldore


One way to optimize your code is to avoid comparison if (i >= j). To traverse only the lower triangle of the array without that comparison, you have to make the inner loop start with the value of i of the outermost loop. That way, you avoid size x size if comparisons.

import numpy as np
size = 5000
m = np.random.rand(size, size)
pairs = []


for i in range(size):
    for j in range(i , size):

        if(m[i][j] + m[j][i] >= 0.75):
            pairs.append([i, j, m[i][j] + m[j][i]])
like image 29
Eduardo Soares Avatar answered Sep 19 '25 17:09

Eduardo Soares