Substitute for numpy broadcasting using scipy.sparse.csc_matrix

Tags: python, numpy, scipy, sparse-matrix

I have in my code the following expression:

a = (b / x[:, np.newaxis]).sum(axis=1)

where b is an ndarray of shape (M, N) and x is an ndarray of shape (M,). Now, b is actually sparse, so for memory efficiency I would like to substitute a scipy.sparse.csc_matrix or csr_matrix. However, broadcasting in this way is not implemented for sparse matrices and raises a NotImplementedError, even though multiplication, or division by the non-zero entries of x, is guaranteed to maintain sparsity. Is there a sparse function I'm not aware of that would do what I want? (dot() would sum along the wrong axis.)

asked Apr 16 '13 17:04

Juan

2 Answers

If b is in CSC format, then b.data has the non-zero entries of b, and b.indices has the row index of each of the non-zero entries, so you can do your division as:

b.data /= np.take(x, b.indices)
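To make the CSC layout concrete, here is a minimal sketch on a small 3x3 matrix. The matrix values and x are made up for illustration, and this version builds a new matrix instead of modifying b.data in place.

```python
import numpy as np
import scipy.sparse as sps

# Small illustrative matrix and divisor (values chosen arbitrarily).
b = sps.csc_matrix(np.array([[1., 0., 2.],
                             [0., 3., 0.],
                             [4., 0., 5.]]))
x = np.array([2., 3., 4.])

# CSC stores the non-zeros column by column.
print(b.data)     # stored values
print(b.indices)  # row index of each stored value
print(b.indptr)   # offset in data/indices where each column starts

# np.take(x, b.indices) pairs every stored value with the divisor of its row.
divisors = np.take(x, b.indices)

# Build a new matrix with the same sparsity pattern and the scaled values.
scaled = sps.csc_matrix((b.data / divisors, b.indices, b.indptr),
                        shape=b.shape)
print(scaled.toarray())
```

Because only the stored values are touched, the cost scales with the number of non-zeros, not with M * N.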

Operating on b.data directly is hackier than Warren's elegant solution, but it will probably also be faster in most settings:

import numpy as np
import scipy.sparse as sps

b = sps.rand(1000, 1000, density=0.01, format='csc')
x = np.random.rand(1000)

def row_divide_col_reduce(b, x):
    # b must be CSC: b.indices then holds the row index of each stored value.
    data = b.data.copy() / np.take(x, b.indices)
    ret = sps.csc_matrix((data, b.indices.copy(), b.indptr.copy()),
                         shape=b.shape)
    return ret.sum(axis=1)

def row_divide_col_reduce_bis(b, x):
    # Scale the rows by left-multiplying with diag(1/x).
    d = sps.spdiags(1.0/x, 0, len(x), len(x))
    return (d * b).sum(axis=1)

In [2]: %timeit row_divide_col_reduce(b, x)
1000 loops, best of 3: 210 us per loop

In [3]: %timeit row_divide_col_reduce_bis(b, x)
1000 loops, best of 3: 697 us per loop

In [4]: np.allclose(row_divide_col_reduce(b, x),
   ...:             row_divide_col_reduce_bis(b, x))
Out[4]: True

If you don't need to preserve b, you can cut the time almost in half in the above example by doing the division in place, which overwrites the stored values of b:

def row_divide_col_reduce(b, x):
    b.data /= np.take(x, b.indices)
    return b.sum(axis=1)

In [2]: %timeit row_divide_col_reduce(b, x)
10000 loops, best of 3: 131 us per loop
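The question also mentions csr_matrix, where b.indices holds column indices instead of row indices, so the np.take trick does not carry over directly. A possible sketch for CSR, assuming a hypothetical helper named row_divide_csr (not part of SciPy), recovers the row of each stored value from b.indptr:

```python
import numpy as np
import scipy.sparse as sps

def row_divide_csr(b, x):
    """Return a new CSR matrix equal to b with row i divided by x[i].

    Hypothetical helper for illustration; it does not modify b.
    """
    b = sps.csr_matrix(b)      # converts other formats to CSR
    x = np.asarray(x)
    # Number of stored values in each row, from consecutive indptr offsets.
    counts = np.diff(b.indptr)
    # Repeat each divisor once per stored value of its row.
    divisors = np.repeat(x, counts)
    return sps.csr_matrix((b.data / divisors, b.indices.copy(), b.indptr.copy()),
                          shape=b.shape)

# Made-up test data: a mostly-zero dense array converted to CSR.
rng = np.random.default_rng(0)
dense = rng.random((50, 40))
dense[dense < 0.9] = 0.0
x = rng.random(50) + 0.5       # strictly positive divisors

b = sps.csr_matrix(dense)
a = np.asarray(row_divide_csr(b, x).sum(axis=1)).ravel()
print(np.allclose(a, (dense / x[:, np.newaxis]).sum(axis=1)))
```

Like the CSC version, this only touches the stored values, and it leaves the input matrix unchanged.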

answered Oct 22 '22 03:10

Jaime

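To implement a = (b / x[:, np.newaxis]).sum(axis=1), you can use a = b.sum(axis=1).A1 / x. The sum returns an (M, 1) matrix, and its A1 attribute returns the flattened 1D ndarray, so the result is a 1D ndarray, not a matrix. This concise expression works because every entry of row i is divided by the same x[i], so dividing and then summing along axis 1 gives the same result as summing first and dividing once. For example: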

In [190]: b
Out[190]: 
<3x3 sparse matrix of type '<type 'numpy.float64'>'
        with 5 stored elements in Compressed Sparse Row format>

In [191]: b.A
Out[191]: 
array([[ 1.,  0.,  2.],
       [ 0.,  3.,  0.],
       [ 4.,  0.,  5.]])

In [192]: x
Out[192]: array([ 2.,  3.,  4.])

In [193]: b.sum(axis=1).A1 / x
Out[193]: array([ 1.5 ,  1.  ,  2.25])
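As a sanity check, the shortcut can be compared against the original dense expression. This is a minimal sketch on made-up random data, with the sparse matrix built from a thresholded dense array:

```python
import numpy as np
import scipy.sparse as sps

rng = np.random.default_rng(0)
dense = rng.random((6, 4))
dense[dense < 0.6] = 0.0        # zero out most entries so the matrix is sparse
x = rng.random(6) + 0.5         # strictly positive, so no division by zero

b = sps.csr_matrix(dense)

# Original dense expression: divide row i by x[i], then sum each row.
reference = (dense / x[:, np.newaxis]).sum(axis=1)
# Shortcut: sum each row of the sparse matrix, then divide once per row.
shortcut = b.sum(axis=1).A1 / x

print(np.allclose(reference, shortcut))
```

The two agree up to floating-point rounding, since the order of the division and the summation differs.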

More generally, if you want to scale the rows of a sparse matrix with a vector x, you could multiply b on the left with a sparse matrix containing 1.0/x on the diagonal. The function scipy.sparse.spdiags can be used to create such a matrix. For example:

In [71]: import numpy as np
    ...: from scipy.sparse import csc_matrix, spdiags

In [72]: b = csc_matrix([[1,0,2],[0,3,0],[4,0,5]], dtype=np.float64)

In [73]: b.A
Out[73]: 
array([[ 1.,  0.,  2.],
       [ 0.,  3.,  0.],
       [ 4.,  0.,  5.]])

In [74]: x = np.array([2., 3., 4.])

In [75]: d = spdiags(1.0/x, 0, len(x), len(x))

In [76]: d.A
Out[76]: 
array([[ 0.5       ,  0.        ,  0.        ],
       [ 0.        ,  0.33333333,  0.        ],
       [ 0.        ,  0.        ,  0.25      ]])

In [77]: p = d * b

In [78]: p.A
Out[78]: 
array([[ 0.5 ,  0.  ,  1.  ],
       [ 0.  ,  1.  ,  0.  ],
       [ 1.  ,  0.  ,  1.25]])

In [79]: a = p.sum(axis=1)

In [80]: a
Out[80]: 
matrix([[ 1.5 ],
        [ 1.  ],
        [ 2.25]])
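Two variations on the same idea that are not part of the session above: scipy.sparse.diags builds the diagonal matrix from 1.0/x alone, and the multiply method can do the row scaling element-wise. The sketch below assumes a SciPy version in which multiply broadcasts a dense (M, 1) column against the sparse matrix; check this against your installed version.

```python
import numpy as np
import scipy.sparse as sps

b = sps.csc_matrix(np.array([[1., 0., 2.],
                             [0., 3., 0.],
                             [4., 0., 5.]]))
x = np.array([2., 3., 4.])

# Diagonal matrix with 1/x on the main diagonal, then a sparse matrix product.
d = sps.diags(1.0 / x)
p_diag = d @ b

# ASSUMPTION: multiply() broadcasts the dense (M, 1) column over the rows of b.
p_mult = b.multiply(1.0 / x[:, np.newaxis])

# Flatten the row sums to 1D arrays, whatever container sum() returns.
a_diag = np.asarray(p_diag.sum(axis=1)).ravel()
a_mult = np.asarray(p_mult.sum(axis=1)).ravel()

print(a_diag)
print(a_mult)
```

Both results are flattened to 1D here, unlike the (M, 1) matrix that p.sum(axis=1) returns in the session above.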

answered Oct 22 '22 05:10

Warren Weckesser
