I'm doing a simple sparse matrix exponentiation, a**16, using scipy 0.17 (note: matrix exponentiation, not element-wise). However, on my machines (running Debian stable and Ubuntu LTS) this is ten times slower than using a for loop, or than doing something silly like a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a. That doesn't make sense, so I assume I'm doing something wrong, but what?
import scipy.sparse
from time import time

a = scipy.sparse.rand(2049, 2049, 0.002)

print("Trying exponentiation (a**16)")
t = time()
x = a**16
print(repr(x))
print("Exponentiation took %f seconds\n" % (time() - t))

print("Trying expansion (a*a*a*...*a*a)")
t = time()
y = a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a
print(repr(y))
print("Expansion took %f seconds\n" % (time() - t))

print("Trying a for loop (z=z*a)")
t = time()
z = scipy.sparse.eye(2049)
for i in range(16):
    z = z * a
print(repr(z))
print("Looping took %f seconds\n" % (time() - t))

# Sanity check: all three give approximately the same answer, right?
assert (abs(x - z) >= 1e-9).nnz == 0
assert (abs(x - y) >= 1e-9).nnz == 0
@hpaulj's comment about the number of nonzeros is important.
As you compute higher powers of a, the number of nonzero elements increases. For sparse matrices, the time to compute a matrix product increases with the number of nonzero elements in the factors.
The algorithm used to compute a**16 is, in effect:
a2 = a*a
a4 = a2*a2
a8 = a4*a4
a16 = a8*a8
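That is exponentiation by squaring, which uses O(log n) products instead of n - 1. A minimal sketch of the general idea (not scipy's actual implementation; the function name is made up for illustration) looks like:

def pow_by_squaring(a, n):
    # Compute a**n for n >= 1 using O(log n) matrix products.
    if n == 1:
        return a
    half = pow_by_squaring(a, n // 2)
    # An odd exponent needs one extra multiplication by a.
    return half * half * a if n % 2 else half * half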
Now look at the number of nonzero elements in those matrices for a = sparse.rand(2049, 2049, 0.002):

matrix      nnz    fraction nnz
a          8396          0.0020
a2        34325          0.0082
a4       521593          0.1240
a8      4029741          0.9598
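The exact counts depend on the random matrix, but a sketch like the following reproduces the trend (using the CSR format so the products take the fast sparse-multiplication path):

import scipy.sparse as sparse

a = sparse.rand(2049, 2049, 0.002, format='csr')
m = a
print('a   %8d  %.4f' % (m.nnz, m.nnz / 2049.0**2))
for name in ('a2', 'a4', 'a8'):
    m = m * m  # square to get the next power of two
    print('%-3s %8d  %.4f' % (name, m.nnz, m.nnz / 2049.0**2))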
In the last product, a16 = a8*a8, the factors are 96% nonzero. Computing that product using sparse matrix multiplication is slow; that last step takes up 97% of the time to compute a**16.
On the other hand, when you compute a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a, a sparse matrix multiplication is performed 15 times, but one of the factors in each product always has a small fraction (0.002) of nonzero values, so each product can be performed reasonably efficiently.
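You can check this directly by timing a single product where both factors are nearly dense against one where one factor keeps the original density (actual numbers will vary by machine):

from time import time
import scipy.sparse as sparse

a = sparse.rand(2049, 2049, 0.002, format='csr')
a2 = a * a
a4 = a2 * a2
a8 = a4 * a4  # ~96% nonzero

t = time(); _ = a8 * a8; print('a8*a8: %.2f s' % (time() - t))  # both factors dense-ish
t = time(); _ = a8 * a;  print('a8*a:  %.2f s' % (time() - t))  # one factor still sparse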
This suggests that there is probably an optimal strategy for computing the product, balancing the number of multiplications against the sparsity of the factors. For example, computing a2 = a*a; a16 = a2*a2*a2*a2*a2*a2*a2*a2 is faster than a16 = a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a:
In [232]: %timeit a2 = a*a; a4 = a2*a2; a8 = a4*a4; a16 = a8*a8
14.4 s ± 199 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [233]: %timeit a16 = a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a
1.77 s ± 4.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [234]: %timeit a2 = a*a; a16 = a2*a2*a2*a2*a2*a2*a2*a2
1.42 s ± 3.16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Or, since you know the final result will be dense, switch to standard numpy arrays, either from the beginning or at some intermediate step at which dense matrix multiplication is more efficient than sparse matrix multiplication.
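One sketch of that intermediate-switch idea (where the best crossover point depends on your matrix) is to square while the result is still sparse, then convert and finish with dense products:

import numpy as np
import scipy.sparse as sparse

a = sparse.rand(2049, 2049, 0.002, format='csr')
a2 = a * a
a4 = a2 * a2                         # ~12% nonzero, still worth keeping sparse
d4 = a4.toarray()                    # switch to a dense ndarray here
d16 = np.linalg.matrix_power(d4, 4)  # (a**4)**4 == a**16, done with dense squaring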