
How to elementwise-multiply a scipy.sparse matrix by a broadcasted dense 1d array?

Suppose I have a 2d sparse array. In my real use case both the number of rows and the number of columns are much bigger (say 20000 and 50000), so it cannot fit in memory when a dense representation is used:

>>> import numpy as np
>>> import scipy.sparse as ssp

>>> a = ssp.lil_matrix((5, 3))
>>> a[1, 2] = -1
>>> a[4, 1] = 2
>>> a.todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -1.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  2.,  0.]])

Now suppose I have a dense 1d array of size 3 (or 50000 in my real-life case) whose components are all non-zero:

>>> d = np.ones(3) * 3
>>> d
array([ 3.,  3.,  3.])

I would like to compute the elementwise multiplication of a and d using the usual broadcasting semantics of numpy. However, sparse matrices in scipy follow the np.matrix semantics: the '*' operator is overloaded to perform matrix multiplication instead of elementwise multiplication:

>>> a * d
array([ 0., -3.,  0.,  0.,  6.])

One solution would be to make 'a' switch to the array semantics for the '*' operator, which would give the expected result:

>>> a.toarray() * d
array([[ 0.,  0.,  0.],
       [ 0.,  0., -3.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  6.,  0.]])

But I cannot do that since the call to toarray() would materialize the dense version of 'a' which does not fit in memory (and the result will be dense too):

>>> ssp.issparse(a.toarray()) False 
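For a sense of scale, here is a rough estimate of the dense footprint. It is only a sketch: it assumes float64 entries (as in the small example above) and the 20000 x 50000 shape mentioned earlier, and it ignores any array overhead.

```python
import numpy as np

# Shape from the real use case described above (assumed exact for illustration).
n_rows, n_cols = 20_000, 50_000

# Bytes needed to hold every entry densely as float64 (8 bytes each).
dense_bytes = n_rows * n_cols * np.dtype(np.float64).itemsize

print(f"{dense_bytes / 1e9:.2f} GB ({dense_bytes / 2**30:.2f} GiB)")
# prints: 8.00 GB (7.45 GiB)
```

The dense input and the dense result would each need this much memory, which is why staying sparse matters here.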

Any idea how to compute this while keeping only sparse data structures and without having to do an inefficient Python loop over the columns of 'a'?

ogrisel asked Jul 14 '10 15:07




1 Answer

I replied over at scipy.org as well, but I thought I should add an answer here, in case others find this page when searching.

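You can turn the vector into a sparse diagonal matrix and then use matrix multiplication (with *) to get the same result as broadcasting, but efficiently. Right-multiplying 'a' by a diagonal matrix scales column j of 'a' by the j-th diagonal entry, which is exactly what broadcasting the 1d array across the rows would do.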

>>> d = ssp.lil_matrix((3,3))
>>> d.setdiag(np.ones(3)*3)
>>> a*d
<5x3 sparse matrix of type '<type 'numpy.float64'>'
 with 2 stored elements in Compressed Sparse Row format>
>>> (a*d).todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -3.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  6.,  0.]])
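The same idea can be sketched as a standalone script for more recent SciPy releases. Two things here are assumptions rather than part of the session above: `ssp.diags` is used to build the diagonal matrix in one call, and the `multiply` method is shown as an alternative that, in recent SciPy versions as far as I know, broadcasts a dense 1d array and keeps the result sparse. Check both against the version you have installed.

```python
import numpy as np
import scipy.sparse as ssp

# Rebuild the small example from the question.
a = ssp.lil_matrix((5, 3))
a[1, 2] = -1
a[4, 1] = 2
a = a.tocsr()  # CSR is better suited to arithmetic than LIL

d = np.ones(3) * 3

# Diagonal-matrix approach: column j of `a` is scaled by d[j].
# `@` is used for the matrix product, so it does not depend on what `*` means.
scaled = a @ ssp.diags(d)

# Alternative (assumes a recent SciPy): elementwise multiply with broadcasting.
scaled_alt = a.multiply(d)

print(ssp.issparse(scaled))
print(scaled.toarray())
```

Neither variant materializes a dense copy of 'a'. The diagonal matrix stores only its 3 diagonal entries (50000 in the real case).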

Hope that helps!

mitmatt answered Sep 29 '22 19:09
