I am trying to implement the following equation using scipy's sparse package:
W = x[:,1] * y[:,1].T + x[:,2] * y[:,2].T + ...
where x & y are a nxm csc_matrix. Basically I'm trying to multiply each col of x by each col of y and sum the resulting nxn matrices together. I then want to make all non-zero elements 1.
This is my current implementation:
c = sparse.csc_matrix((n, n))
for i in xrange(0,m):
tmp = bam.id2sym_thal[:,i] * bam.id2sym_cort[:,i].T
minimum(tmp.data,ones_like(tmp.data),tmp.data)
maximum(tmp.data,ones_like(tmp.data),tmp.data)
c = c + tmp
This implementation has the following problems:
Memory usage seems to explode. As I understand it, memory should only increase as c becomes less sparse, but I am seeing that the loop starts eating up >20GB of memory with a n=10,000, m=100,000 (each row of x & y only has around 60 non-zero elements).
I'm using a python loop which is not very efficient.
My question: Is there a better way to do this? Controlling memory usage is my first concern, but it would be great to make it faster!
Thank you!
Sparse matrix is a matrix which contains very few non-zero elements. When a sparse matrix is represented with a 2-dimensional array, we waste a lot of space to represent that matrix. For example, consider a matrix of size 100 X 100 containing only 10 non-zero elements.
Two elements with the same row values are further sorted according to their column values. Now to Add the matrices, we simply traverse through both matrices element by element and insert the smaller element (one with smaller row and col value) into the resultant matrix.
Matrices (HSM Ch.2.4.1) Stored in a C++ 2 dimensional array. A sparse matrix object is a set of triples <row,column,value>, where each row-column combination is unique. Operations include input, output, transpose, add, multiply.
The product of two sparse matrices is sparse; The inverse of a sparse matrix is sparse; The product of two symmetric matrices is symmetric.
Note that a sum of outer products in the manner you describe is simply the same as multiplying two matrices together. In other words,
sum_i X[:,i]*Y[:,i].T == X*Y.T
So just multiply the matrices together.
Z = X*Y.T
For n=10000 and m=100000 and where each column has one nonzero element in both X and Y, it computes almost instantly on my laptop.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With