Consider two ndarrays
of length n
, arr1
and arr2
. I'm computing the following sum of products, and doing it num_runs
times to benchmark:
import numpy as np
import time
num_runs = 1000
n = 100
arr1 = np.random.rand(n)
arr2 = np.random.rand(n)
start_comp = time.clock()
for r in xrange(num_runs):
sum_prods = np.sum( [arr1[i]*arr2[j] for i in xrange(n)
for j in xrange(i+1, n)] )
print "total time for comprehension = ", time.clock() - start_comp
start_loop = time.clock()
for r in xrange(num_runs):
sum_prod = 0.0
for i in xrange(n):
for j in xrange(i+1, n):
sum_prod += arr1[i]*arr2[j]
print "total time for loop = ", time.clock() - start_loop
The output is
total time for comprehension = 3.23097066953
total time for comprehension = 3.9045544426
so using list comprehension appears faster.
Is there a much more efficient implementation, using Numpy routines perhaps, to calculate such a sum of products?
Rearrange the operation into an O(n) runtime algorithm instead of O(n^2), and take advantage of NumPy for the products and sums:
# arr1_weights[i] is the sum of all terms arr1[i] gets multiplied by in the
# original version
arr1_weights = arr2[::-1].cumsum()[::-1] - arr2
sum_prods = arr1.dot(arr1_weights)
Timing shows this to be about 200 times faster than the list comprehension for n == 100
.
In [21]: %%timeit
....: np.sum([arr1[i] * arr2[j] for i in range(n) for j in range(i+1, n)])
....:
100 loops, best of 3: 5.13 ms per loop
In [22]: %%timeit
....: arr1_weights = arr2[::-1].cumsum()[::-1] - arr2
....: sum_prods = arr1.dot(arr1_weights)
....:
10000 loops, best of 3: 22.8 µs per loop
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With