performing sum of outer products on sparse matrices

Tags: python, matrix, numpy, scipy, sparse-matrix

I am trying to implement the following equation using scipy's sparse package:

    W = x[:,1] * y[:,1].T + x[:,2] * y[:,2].T + ...

where x and y are n x m csc_matrix objects. Basically, I'm trying to multiply each column of x by the transpose of the corresponding column of y and sum the resulting n x n matrices together. I then want to set all non-zero elements to 1.

This is my current implementation:

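    from numpy import minimum, maximum, ones_like
    from scipy import sparse

    c = sparse.csc_matrix((n, n))
    for i in xrange(0, m):
        # outer product of column i of each matrix (a sparse n x n result)
        tmp = bam.id2sym_thal[:,i] * bam.id2sym_cort[:,i].T
        # clamp the stored values in place: cap at 1, then raise to 1
        minimum(tmp.data, ones_like(tmp.data), tmp.data)
        maximum(tmp.data, ones_like(tmp.data), tmp.data)
        c = c + tmp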

This implementation has the following problems:

1. Memory usage seems to explode. As I understand it, memory should only increase as c becomes less sparse, but the loop starts eating up >20GB of memory with n=10,000 and m=100,000 (each row of x and y has only around 60 non-zero elements).
2. I'm using a Python loop, which is not very efficient.

My question: Is there a better way to do this? Controlling memory usage is my first concern, but it would be great to make it faster!

Thank you!


asked Aug 04 '11 18:08

RussellM

1 Answer

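Note that a sum of outer products in the manner you describe is the same as multiplying two matrices together: entry (j, k) of X*Y.T is the sum over i of X[j,i]*Y[k,i], which is exactly entry (j, k) of the summed outer products. In other words,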

    sum_i X[:,i]*Y[:,i].T == X*Y.T

So just multiply the matrices together.

    Z = X*Y.T

For n=10000 and m=100000, with one nonzero element per column in both X and Y, this computes almost instantly on my laptop.
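The question also asks for every non-zero element of the result to become 1. Below is a minimal sketch of one way to do that on top of the single product. The helper name `binary_outer_sum` is hypothetical and not part of SciPy. It assumes that what is wanted is the 0/1 sparsity pattern of the summed outer products, so both inputs are reduced to 0/1 first; that way entries of opposite sign cannot cancel in the product. It uses the `@` operator, which for `csc_matrix` inputs performs the same matrix multiplication as `*` above.

```python
import numpy as np
from scipy import sparse


def binary_outer_sum(X, Y):
    """Return the 0/1 pattern of sum_i X[:, i] * Y[:, i].T as a CSC matrix.

    Hypothetical helper, not a SciPy function. X and Y are sparse
    matrices of the same n x m shape.
    """
    # Work on copies so the caller's matrices are left untouched.
    Xb = X.tocsc(copy=True)
    Yb = Y.tocsc(copy=True)

    # Drop explicitly stored zeros, then set every stored value to 1.
    # With only non-negative values, no cancellation can happen below.
    Xb.eliminate_zeros()
    Yb.eliminate_zeros()
    Xb.data[:] = 1
    Yb.data[:] = 1

    # One sparse product replaces the whole loop over columns.
    Z = (Xb @ Yb.T).tocsc()

    # Each stored entry is now a positive count; clamp all of them to 1.
    Z.eliminate_zeros()
    Z.data[:] = 1
    return Z


# Small example with random sparse inputs (sizes chosen arbitrarily).
X = sparse.random(50, 200, density=0.02, format="csc", random_state=0)
Y = sparse.random(50, 200, density=0.02, format="csc", random_state=1)
W = binary_outer_sum(X, Y)
```

Because no per-column n x n temporaries are created and added one at a time, the working memory should be governed mainly by the number of non-zeros in the final matrix, though the exact footprint depends on the data and the SciPy version.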


answered Oct 06 '22 09:10

Steve Tjoa
