Why does the result of scipy.sparse.csc_matrix.sum() change its type to numpy matrix?

Tags: python, numpy, python-2.7, scipy, sparse-matrix

I want to generate a large sparse matrix and sum it, but I often run into a MemoryError. So I tried doing the sum with scipy.sparse.csc_matrix.sum instead, but found that the result is a numpy matrix rather than a sparse matrix.

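import numpy as np
from scipy import sparse

window = 10
np.random.seed(0)  # call seed(); assigning np.random.seed = 0 just overwrites the function
mat = sparse.csc_matrix(np.random.rand(100, 120) > 0.5, dtype='d')
print(type(mat))
# <class 'scipy.sparse.csc.csc_matrix'>

# sum the first `window` columns of every row
mat_head = mat[:, 0:window].sum(axis=1)
print(type(mat_head))
# <class 'numpy.matrixlib.defmatrix.matrix'>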

To see what happens when the sum is all zeros, I then created mat as an all-zeros sparse matrix:

mat = sparse.csc_matrix((100, 120))  # empty (all-zero) 100 x 120 sparse matrix
print(type(mat))
# <class 'scipy.sparse.csc.csc_matrix'>
mat_head = mat.sum(axis=1)
print(type(mat_head))
# <class 'numpy.matrixlib.defmatrix.matrix'>
print(np.count_nonzero(mat_head))
# 0

Why does this happen? Does this mean that summing via scipy.sparse saves no memory compared with numpy, since the result is converted back to a dense type anyway?

asked Jun 06 '18 06:06

Jan

2 Answers

As far as it is possible to give a hard reason for what is essentially a design choice, I'd make the following argument:

The csr and csc formats are designed for sparse but not extremely sparse matrices. In particular, for an n x n matrix that has significantly fewer than n nonzeros, these formats are rather wasteful, because on top of the data and indices they carry a field indptr (delineating rows or columns) of size n+1.
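To make that overhead concrete, here is a minimal sketch. The size n = 1000 and the 10 diagonal entries are arbitrary illustrative choices, not taken from the answer; the point is only that indptr grows with n even when almost nothing is stored.

```python
import numpy as np
from scipy import sparse

n = 1000
# Hypothetical extremely sparse n x n matrix: only 10 nonzeros (far fewer than n),
# placed on the diagonal for simplicity.
idx = np.arange(10)
mat = sparse.csc_matrix((np.ones(10), (idx, idx)), shape=(n, n))

print(mat.data.size)     # 10   -> stored values
print(mat.indices.size)  # 10   -> row index of each stored value
print(mat.indptr.size)   # 1001 -> one pointer per column, plus one
```

Here the bookkeeping array indptr is about 100 times larger than the actual data.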

Therefore, assuming a properly utilized csc or csr matrix, it is reasonable to expect row or column sums not to be sparse, so the corresponding method should return a dense vector.
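A rough illustration of this argument, assuming a hypothetical 2000 x 2000 matrix at 1% density generated with scipy.sparse.random (about 20 nonzeros per row): practically every row sum is nonzero, and the dense result holds only one value per row, so it is tiny next to the matrix itself. If a plain 1-D array is preferred over np.matrix, np.asarray(...).ravel() converts it.

```python
import numpy as np
from scipy import sparse

# Hypothetical "properly utilized" sparse matrix: 1% density, far more than n nonzeros.
mat = sparse.random(2000, 2000, density=0.01, format='csc', random_state=0)

row_sums = mat.sum(axis=1)                  # np.matrix of shape (2000, 1)
row_sums_1d = np.asarray(row_sums).ravel()  # plain 1-D ndarray

print(mat.nnz)                              # 40000 stored values
# Fraction of rows whose sum is nonzero (expected to be essentially 1 here).
print(np.count_nonzero(row_sums_1d) / row_sums_1d.size)
print(row_sums_1d.nbytes)                   # 2000 float64 values = 16000 bytes
```

The memory cost of the result is proportional to the number of rows, not to the size of the matrix.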

answered Nov 15 '22 05:11

Paul Panzer

I'm aware that your question of "why" mostly targets the motivation behind the design decision, but anyway I tracked down how the result of csc_matrix.sum(axis=1) actually becomes a numpy matrix.

The csc_matrix class inherits from the _cs_matrix class, which inherits from the _data_matrix class, which in turn inherits from the spmatrix base class. This last class implements .sum(axis) as follows:

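# Excerpt from spmatrix.sum(); earlier in the method:
#     m, n = self.shape
#     res_dtype is the accumulation dtype derived from self.dtype
if axis == 0:
    # sum over columns
    ret = np.asmatrix(np.ones(
        (1, m), dtype=res_dtype)) * self
else:
    # sum over rows
    ret = self * np.asmatrix(np.ones((n, 1), dtype=res_dtype))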

In other words, as also noted in a comment, the column sums and row sums are computed by multiplying the sparse matrix by a dense row or column matrix of ones, respectively. A sparse matrix multiplied by a dense np.matrix yields a dense np.matrix, which is what you see in the output.
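The multiply-by-ones approach from the quoted code is easy to reproduce by hand. A minimal sketch on a small hypothetical random matrix, showing that both products are dense np.matrix objects matching .sum(). Note that `*` means matrix multiplication for scipy.sparse matrix classes and np.matrix, unlike plain ndarrays.

```python
import numpy as np
from scipy import sparse

# Small hypothetical example matrix; any csc_matrix would do.
rng = np.random.RandomState(0)
mat = sparse.csc_matrix(rng.rand(5, 7) > 0.5, dtype='d')
m, n = mat.shape

# Row sums as in the quoted code: sparse (m x n) times dense (n x 1) matrix of ones.
row_sums_manual = mat * np.asmatrix(np.ones((n, 1)))
# Column sums: dense (1 x m) matrix of ones times sparse (m x n).
col_sums_manual = np.asmatrix(np.ones((1, m))) * mat

print(isinstance(row_sums_manual, np.matrix), row_sums_manual.shape)  # True (5, 1)
print(isinstance(col_sums_manual, np.matrix), col_sums_manual.shape)  # True (1, 7)
print(np.allclose(row_sums_manual, mat.sum(axis=1)))                  # True
print(np.allclose(col_sums_manual, mat.sum(axis=0)))                  # True
```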

While some of the subclasses override their .sum() method, as far as I could tell this only happens for the axis=None case, so the result which you see can be attributed to the above block of code.
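To check the class hierarchy and the overrides against the SciPy version you actually have installed, a small introspection sketch can help. It relies only on standard Python introspection (`__mro__`, `vars`) and public scipy.sparse names. The printed class list is version-dependent and may not match the hierarchy described above exactly, since SciPy has reorganized its sparse classes over time.

```python
import numpy as np
from scipy import sparse

# Walk csc_matrix's method-resolution order and flag classes that define their own `sum`.
for cls in sparse.csc_matrix.__mro__:
    marker = "defines sum" if "sum" in vars(cls) else ""
    print(cls.__module__ + "." + cls.__name__, marker)

# spmatrix is the public base class mentioned above.
print(issubclass(sparse.csc_matrix, sparse.spmatrix))  # True

# Whichever class handles it, axis=1 on a csc_matrix returns a dense np.matrix.
mat_head = sparse.csc_matrix((100, 120)).sum(axis=1)
print(isinstance(mat_head, np.matrix), mat_head.shape)  # True (100, 1)
```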

answered Nov 15 '22 06:11

Andras Deak -- Слава Україні
