How to elementwise-multiply a scipy.sparse matrix by a broadcasted dense 1d array?

Tags: python, numpy, scipy, sparse-matrix

Suppose I have a 2d sparse array. In my real use case both the number of rows and the number of columns are much bigger (say 20000 and 50000), so it cannot fit in memory when a dense representation is used:

>>> import numpy as np
>>> import scipy.sparse as ssp
>>> a = ssp.lil_matrix((5, 3))
>>> a[1, 2] = -1
>>> a[4, 1] = 2
>>> a.todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -1.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  2.,  0.]])

Now suppose I have a dense 1d array of size 3 (or 50000 in my real-life case) whose components are all non-zero:

>>> d = np.ones(3) * 3
>>> d
array([ 3.,  3.,  3.])

I would like to compute the elementwise multiplication of a and d using the usual broadcasting semantics of numpy. However, sparse matrices in scipy follow np.matrix semantics: the '*' operator is overloaded to perform matrix multiplication instead of elementwise multiplication:

>>> a * d
array([ 0., -3.,  0.,  0.,  6.])

One solution would be to make 'a' switch to the array semantics for the '*' operator, which would give the expected result:

>>> a.toarray() * d
array([[ 0.,  0.,  0.],
       [ 0.,  0., -3.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  6.,  0.]])

But I cannot do that, since the call to toarray() would materialize the dense version of 'a', which does not fit in memory (and the result would be dense too):

>>> ssp.issparse(a.toarray())
False
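To put a rough number on the memory concern, here is a back-of-the-envelope estimate. It assumes float64 entries (the dtype shown in the toy example above) and uses the 20000 x 50000 shape mentioned earlier. The variable names are illustrative only.

```python
import numpy as np

# Dimensions quoted in the question; float64 storage is an assumption
# based on the toy example, not a stated fact about the real data.
n_rows, n_cols = 20000, 50000
itemsize = np.dtype(np.float64).itemsize  # bytes per stored element

# A dense array stores every entry, zero or not.
dense_bytes = n_rows * n_cols * itemsize
dense_gb = dense_bytes / 1e9

print(f"dense array would need about {dense_gb:.1f} GB")
```

Under those assumptions the dense array alone comes to roughly 8 GB, before counting the equally large dense result.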

Any idea how to compute this while keeping only sparse data structures and without resorting to an inefficient Python loop over the columns of 'a'?


asked Jul 14 '10 15:07

ogrisel

1 Answer

I replied over at scipy.org as well, but I thought I should add an answer here, in case others find this page when searching.

You can turn the vector into a sparse diagonal matrix and then use matrix multiplication (with *) to do the same thing as broadcasting, but efficiently.

>>> d = ssp.lil_matrix((3,3))
>>> d.setdiag(np.ones(3)*3)
>>> a*d
<5x3 sparse matrix of type '<type 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>
>>> (a*d).todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -3.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  6.,  0.]])
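For readers on a recent SciPy, the same diagonal-matrix idea can be sketched a little more compactly. This is not the code from the answer above. It swaps the lil_matrix plus setdiag step for `ssp.diags`, uses the `@` operator for the matrix product, and gives `d` distinct values (an illustrative change) so that the per-column scaling is visible. It also tries the sparse `multiply` method, which is assumed here to broadcast a dense 1d array across rows. The return type of `multiply` has varied between SciPy versions, so the result is wrapped in `csr_matrix` to be safe.

```python
import numpy as np
import scipy.sparse as ssp

# Same toy matrix as in the question.
a = ssp.lil_matrix((5, 3))
a[1, 2] = -1
a[4, 1] = 2

# Illustrative values (not from the question): distinct entries make it
# obvious that column j is scaled by d[j].
d = np.array([3.0, 4.0, 5.0])

a_csr = a.tocsr()  # CSR is better suited to arithmetic than LIL

# Right-multiplying by diag(d) scales column j of `a` by d[j];
# both operands are sparse, so nothing dense is materialized.
scaled = a_csr @ ssp.diags(d)

# Assumed alternative: elementwise `multiply` with a broadcast 1d array.
# Wrapped in csr_matrix because the return type depends on the SciPy version.
scaled_alt = ssp.csr_matrix(a_csr.multiply(d))

# Dense reference, only affordable on toy-sized data.
expected = a.toarray() * d

print(ssp.issparse(scaled))
print(np.allclose(scaled.toarray(), expected))
print(np.allclose(scaled_alt.toarray(), expected))
```

The diagonal matrix stores only as many values as `d` has entries, so the approach should stay cheap even with 50000 columns.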

Hope that helps!


answered Sep 29 '22 19:09

mitmatt
