V is (n,p) numpy array typically dimensions are n~10, p~20000
The code I have now looks like
A = np.zeros(p)
for i in xrange(n):
for j in xrange(i+1):
A += F[i,j] * V[i,:] * V[j,:]
How would I go about rewriting this to avoid the double python for loop?
While Isaac's answer seems promising, as it removes those two nested for loops, you are having to create an intermediate array M
which is n
times the size of your original V
array. Python for loops are not cheap, but memory access ain't free either:
n = 10
p = 20000
V = np.random.rand(n, p)
F = np.random.rand(n, n)
def op_code(V, F):
n, p = V.shape
A = np.zeros(p)
for i in xrange(n):
for j in xrange(i+1):
A += F[i,j] * V[i,:] * V[j,:]
return A
def isaac_code(V, F):
n, p = V.shape
F = F.copy()
F[np.triu_indices(n, 1)] = 0
M = (V.reshape(n, 1, p) * V.reshape(1, n, p)) * F.reshape(n, n, 1)
return M.sum((0, 1))
If you now take both for a test ride:
In [20]: np.allclose(isaac_code(V, F), op_code(V, F))
Out[20]: True
In [21]: %timeit op_code(V, F)
100 loops, best of 3: 3.18 ms per loop
In [22]: %timeit isaac_code(V, F)
10 loops, best of 3: 24.3 ms per loop
So removing the for loops is costing you an 8x slowdown. Not a very good thing... At this point you may even want to consider whether a function taking about 3ms to evaluate requires any further optimization. IN case you do, there's a small improvement which can be had by using np.einsum
:
def einsum_code(V, F):
n, p = V.shape
F = F.copy()
F[np.triu_indices(n, 1)] = 0
return np.einsum('ij,ik,jk->k', F, V, V)
And now:
In [23]: np.allclose(einsum_code(V, F), op_code(V, F))
Out[23]: True
In [24]: %timeit einsum_code(V, F)
100 loops, best of 3: 2.53 ms per loop
So that's roughly a 20% speed up that introduces code that may very well not be as readable as your for loops. I would say not worth it...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With