I would like to multiply two large sparse matrices. The first is 150,000x300,000 and the second is 300,000x300,000. The first matrix has about 1,000,000 non-zero items and the second matrix has about 20,000,000 non-zero items. Is there a straightforward way to get the product of these matrices?
I'm currently storing the matrices in CSR or CSC format and trying matrix_a * matrix_b. This gives the error ValueError: array is too big.
I'm guessing I could store the separate matrices on disk with pytables, pull them apart into smaller blocks, and construct the final matrix product from the products of many blocks. But I'm hoping for something relatively simple to implement.
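Here is a minimal in-memory sketch of the block idea, assuming the second matrix fits in memory. It splits the first matrix into horizontal slabs of rows, multiplies each slab by the second matrix, and stacks the partial products. The helper name blockwise_matmul and the block_rows parameter are hypothetical, and the PyTables/disk part is not shown; only a comment marks where a slab result could be written out.

```python
import numpy as np
import scipy.sparse as sp


def blockwise_matmul(a, b, block_rows=1000):
    """Hypothetical helper: compute a @ b one horizontal slab of `a` at a time."""
    a = sp.csr_matrix(a)  # CSR makes slicing a range of rows cheap
    b = sp.csr_matrix(b)
    pieces = []
    for start in range(0, a.shape[0], block_rows):
        stop = min(start + block_rows, a.shape[0])
        # Rows start..stop-1 of the product depend only on the same rows of a.
        # In an out-of-core version, this slab result could be written to disk
        # (e.g. with PyTables) instead of being kept in memory.
        pieces.append(a[start:stop] @ b)
    return sp.vstack(pieces, format="csr")


a = sp.random(500, 800, density=0.01, format="csr", random_state=0)
b = sp.random(800, 600, density=0.01, format="csr", random_state=1)
c = blockwise_matmul(a, b, block_rows=128)
print(np.allclose(c.toarray(), (a @ b).toarray()))  # True
```

As written, the stacked result c still ends up in memory, so this only limits the size of the intermediate work; for a product too large to hold at once, each slab would have to be saved separately instead of appended to the list.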
EDIT: I'm hoping for a solution that works for arbitrarily large sparse matrices, while hiding (or avoiding) the bookkeeping involved in moving individual blocks back and forth between memory and disk.
To multiply two sparse matrices in SciPy, use the @ operator (or the dot() method), which both csc_matrix and csr_matrix provide. Note that the multiply() method of these classes computes the element-wise (Hadamard) product, not the matrix product. The two operands can have the same format (both CSC or both CSR) or different formats (one CSC and the other CSR).
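A small sketch contrasting the two operations on 2x2 matrices, using one CSR and one CSC operand to show that mixed formats are accepted:

```python
import numpy as np
import scipy.sparse as sp

x = sp.csr_matrix(np.array([[1, 0], [2, 3]]))
y = sp.csc_matrix(np.array([[4, 5], [0, 6]]))

elementwise = x.multiply(y)  # Hadamard product: [[4, 0], [0, 18]]
product = x @ y              # matrix product:   [[4, 5], [8, 28]]

print(elementwise.toarray())
print(product.toarray())
```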
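When sparse matrices are stored as sorted lists of their non-zero entries, one way to multiply them is to first transpose the second matrix. Both operands are then sorted by the shared inner index, which keeps the comparisons simple and preserves the sorted order. Each entry of the result is obtained by walking through the matching rows of both matrices and summing the products of entries whose inner indices agree.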
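A plain-Python sketch of this transpose-then-merge approach, assuming each matrix is given as a list of (row, col, value) triplets with no duplicate positions. The names rows_of and sparse_matmul_triplets are hypothetical. It is meant to illustrate the idea; in practice SciPy's compiled @ should be used instead.

```python
from collections import defaultdict


def rows_of(triplets):
    """Group (row, col, value) triplets into {row: [(col, value), ...]} sorted by col."""
    rows = defaultdict(list)
    for r, c, v in sorted(triplets):
        rows[r].append((c, v))
    return rows


def sparse_matmul_triplets(a, b):
    a_rows = rows_of(a)
    # Transposing b turns its columns into rows, so both operands are now
    # sorted by the shared inner index k.
    bt_rows = rows_of((c, r, v) for r, c, v in b)
    result = []
    for i, a_row in sorted(a_rows.items()):
        for j, bt_row in sorted(bt_rows.items()):
            # Merge two lists sorted by k, multiplying entries with equal k.
            total, p, q = 0, 0, 0
            while p < len(a_row) and q < len(bt_row):
                ka, va = a_row[p]
                kb, vb = bt_row[q]
                if ka == kb:
                    total += va * vb
                    p += 1
                    q += 1
                elif ka < kb:
                    p += 1
                else:
                    q += 1
            if total != 0:
                result.append((i, j, total))
    return result


a_trip = [(0, 0, 1), (1, 0, 2), (1, 1, 3)]  # [[1, 0], [2, 3]]
b_trip = [(0, 0, 4), (0, 1, 5), (1, 1, 6)]  # [[4, 5], [0, 6]]
print(sparse_matmul_triplets(a_trip, b_trip))
# [(0, 0, 4), (0, 1, 5), (1, 0, 8), (1, 1, 28)]
```

This version checks every (row of a, column of b) pair, so it is only reasonable for small inputs, not for matrices of the size in the question.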
For ordinary dense matrices, multiplication can be done with nested loops. For example, with two matrices x and y that each have 3 rows and 3 columns, the result z also has a 3x3 structure, and each element z[i][j] is the sum of the products of the elements in row i of x with the corresponding elements in column j of y.
Step 1: input the two matrices.
Step 2: use nested for loops to iterate over each row of the first matrix and each column of the second.
Step 3: keep a result matrix that initially contains all zeros; for each row/column pair, multiply the row's elements by the corresponding column elements and add the products into the matching result entry.
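The steps above, sketched in plain Python for two 3x3 matrices; the function name matmul and the sample values are only for illustration:

```python
def matmul(x, y):
    rows, inner, cols = len(x), len(y), len(y[0])
    # Result matrix that initially contains all zeros
    z = [[0] * cols for _ in range(rows)]
    for i in range(rows):          # each row of x
        for j in range(cols):      # each column of y
            for k in range(inner):
                # multiply row element by the corresponding column element and add
                z[i][j] += x[i][k] * y[k][j]
    return z


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
y = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
z = matmul(x, y)
print(z)  # [[30, 24, 18], [84, 69, 54], [138, 114, 90]]
```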
Strange, because the following worked for me:
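import scipy.sparse

# Random matrices with the shapes and approximate non-zero counts from the question
# (shapes must be integers; density = non-zeros / (rows * columns))
mat1 = scipy.sparse.rand(150_000, 300_000, density=1e6 / (150e3 * 300e3), format="csc")
mat2 = scipy.sparse.rand(300_000, 300_000, density=20e6 / (300e3 * 300e3), format="csc")
res = mat1 @ mat2  # matrix product; for scipy.sparse *_matrix objects, * does the same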
I'm using the latest SciPy, and Python used about 3 GB of RAM.
So maybe your matrices are such that their product is not very sparse? The product can have far more non-zeros than either input.
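One way to check this before calling @ is to count the scalar multiplications the product needs, sum over k of nnz(a[:, k]) * nnz(b[k, :]), which is an upper bound on nnz(a @ b). The helper name product_nnz_upper_bound is hypothetical; the sketch only uses the indptr arrays of the CSC and CSR formats.

```python
import numpy as np
import scipy.sparse as sp


def product_nnz_upper_bound(a, b):
    """Hypothetical helper: number of scalar multiplications in a @ b,
    i.e. sum over k of nnz(a[:, k]) * nnz(b[k, :]); nnz(a @ b) cannot exceed it."""
    a_col_counts = np.diff(sp.csc_matrix(a).indptr).astype(np.int64)  # nnz per column of a
    b_row_counts = np.diff(sp.csr_matrix(b).indptr).astype(np.int64)  # nnz per row of b
    return int(a_col_counts @ b_row_counts)


a = sp.csr_matrix(np.array([[1, 1], [0, 0]]))
b = sp.csr_matrix(np.array([[1, 0], [1, 0]]))
bound = product_nnz_upper_bound(a, b)
print(bound, (a @ b).nnz)  # 2 1
```

With the non-zero counts from the question spread uniformly, the bound is roughly 1e6 × (20e6 / 300e3) ≈ 6.7e7. If a few rows of the second matrix are very dense and line up with populated columns of the first, the bound (and possibly the real product) can be much larger.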