I have the following code which iterates through 2d numpy array named "m". It works extremely slow. How can I transform this code using numpy functions so that I avoid using the for loops?
pairs = []
for i in range(size):
for j in range(size):
if(i >= j):
continue
if(m[i][j] + m[j][i] >= 0.75):
pairs.append([i, j, m[i][j] + m[j][i]])
You can use vectorised approach using NumPy. The idea is:
m
and then create m+m.T
which is equivalent to m[i][j] + m[j][i]
where m.T
is the matrix transpose and call it summ
np.triu
(summ)
returns the upper triangular part of the matrix (This is equivalent to ignoring the lower part by using continue
in your code). This avoids explicit if(i >= j):
in your code. Here you have to use k=1
to exclude the diagonal elements. By default, k=0
which includes the diagonal elements as well. np.argwhere
where the sum m+m.T
is greater than equal to 0.75Verifiable example (using a small 3x3 random dataset)
import numpy as np
np.random.seed(0)
m = np.random.rand(3,3)
summ = m + m.T
index = np.argwhere(np.triu(summ, k=1)>=0.75)
pairs = [(x,y, summ[x,y]) for x,y in index]
print (pairs)
# # [(0, 1, 1.2600725493693163), (0, 2, 1.0403505873343364), (1, 2, 1.537667113848736)]
Further performance improvement
I just worked out an even faster approach to generate the final pairs
list avoiding explicit for loops as
pairs = list(zip(index[:, 0], index[:, 1], summ[index[:,0], index[:,1]]))
One way to optimize your code is to avoid comparison if (i >= j)
. To traverse only the lower triangle of the array without that comparison, you have to make the inner loop start with the value of i
of the outermost loop. That way, you avoid size x size
if
comparisons.
import numpy as np
size = 5000
m = np.random.rand(size, size)
pairs = []
for i in range(size):
for j in range(i , size):
if(m[i][j] + m[j][i] >= 0.75):
pairs.append([i, j, m[i][j] + m[j][i]])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With