Scipy Sparse Matrix - Dense Vector Multiplication Performance - Blocks vs Large Matrix

I have a number of scipy sparse matrices (currently in CSR format) that I need to multiply with a dense numpy 1D vector. The vector is called G:

print G.shape, G.dtype
(2097152,) complex64

Each sparse matrix has shape (16384,2097152) and is very sparse. The density is approximately 4.0e-6. I have a list of 100 of these sparse matrices called spmats.

I can easily multiply each matrix with G like so:

res = [spmat.dot(G) for spmat in spmats]

This results in a list of dense vectors of shape (16384,) as expected.

My application is fairly performance-critical, so I tried an alternative approach: first concatenate all the sparse matrices into one large sparse matrix, and then use only one call to dot(), like so:

import scipy.sparse as sp
SPMAT = sp.vstack(spmats, format='csr')
RES = SPMAT.dot(G)

This results in one long vector RES which has shape (1638400,), and is a concatenated version of all the result vectors in res above, as expected. I've checked that the results are identical.
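The check itself is just a concatenate-and-compare, something like this (with res and RES as defined above):

import numpy as n

print n.allclose(n.concatenate(res), RES)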

Maybe I'm completely wrong, but I expected the second case to be faster than the first, since there are far fewer numpy calls, memory allocations, Python object creations, Python loops, etc. I don't care about the time required to concatenate the sparse matrices, only the time to compute the result. According to %timeit, however:

%timeit res = [spmat.dot(G) for spmat in spmats]
10 loops, best of 3: 91.5 ms per loop
%timeit RES = SPMAT.dot(G)
1 loops, best of 3: 389 ms per loop

I've checked that I'm not running out of memory in either operation, and nothing fishy seems to be going on. Am I crazy, or is this really strange? Does this mean that all sparse matrix-vector products should be done in blocks, a few rows at a time, to make them faster? As far as I understand, multiplying a sparse matrix by a dense vector should take time linear in the number of nonzero elements, which is unchanged between the two cases above. What could be making such a difference?
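Concretely, by computing "in blocks" I mean something like the sketch below (blocked_dot and block_rows are just an illustrative name and an arbitrary block size, not a tuned implementation):

import numpy as n

def blocked_dot(A_csr, x, block_rows=16384):
    # Multiply a large CSR matrix by a dense vector one row block at a time.
    # Row slicing a CSR matrix is cheap, and each slice is itself CSR.
    out = n.empty(A_csr.shape[0], dtype=n.result_type(A_csr.dtype, x.dtype))
    for start in range(0, A_csr.shape[0], block_rows):
        stop = min(start + block_rows, A_csr.shape[0])
        out[start:stop] = A_csr[start:stop].dot(x)
    return out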

I'm running on a single-core Linux machine with 4 GB of RAM, using EPD 7.3.

EDIT:

Here is a small example that reproduces the problem for me:

import scipy.sparse as sp
import numpy as n

G = n.random.rand(128**3) + 1.0j*n.random.rand(128**3)

spmats = [sp.rand(128**2, 128**3, density=4e-6, format='csr', dtype=n.float64) for i in range(100)]
SPMAT = sp.vstack(spmats, format='csr')

%timeit res = [spmat.dot(G) for spmat in spmats]
%timeit RES = SPMAT.dot(G)

I get:

1 loops, best of 3: 704 ms per loop
1 loops, best of 3: 1.34 s per loop

The performance difference in this case is not as large as with my own sparse matrices, which have some structure (probably because of caching), but it is still slower to concatenate the matrices.

I have tried with both scipy 0.10.1 and 0.12.0.

asked May 25 '13 by ali_p

1 Answer

I haven't found a reason for the strange behaviour mentioned in the question; however, I have found a way to speed up my computation significantly, which may be useful to other people.

Because in my particular case I'm computing the product of a float32 sparse matrix and a complex64 dense vector, I can multiply the real and imaginary components separately. This provides me with a 4x speedup.

This takes 2.35s with SPMAT.shape == (16384000, 2097152):

RES = SPMAT.dot(G)

While this takes only 541ms:

RES = n.zeros((SPMAT.shape[0],), dtype=n.complex64)
RES.real = SPMAT.dot(G.real)
RES.imag = SPMAT.dot(G.imag)

And the result is identical. The n.zeros preallocation may not be necessary, but I'm not sure of a better way to do this.
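One alternative I can think of is to recombine the two real-valued products directly. This is only a sketch: I haven't timed it, it creates extra temporaries, and it relies on numpy promoting 1j times a float32 array to complex64:

# Same idea without the explicit preallocation; may not be any faster.
RES = SPMAT.dot(G.real) + 1j * SPMAT.dot(G.imag)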

answered Nov 15 '22 by ali_p