How do you get the mean and std of a column in a csr_matrix?

I have a sparse 988x1 vector (a column in a csr_matrix) created through scipy.sparse. Is there a way to get its mean and standard deviation without having to convert the sparse matrix to a dense one?

numpy.mean seems to only work for dense vectors.

asked Mar 29 '13 10:03

IssamLaradji

1 Answer

Since you are performing column slicing, it may be better to store the matrix using CSC rather than CSR. But that would depend on what else you are doing with the matrix.

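To calculate the mean of a column in a CSC matrix, you can call the mean() method on the sliced column. The result accounts for the implicit zeros, since the sum is divided by the total number of entries and not by the number of stored ones.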
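As a minimal sketch of the two points above, assuming a small made-up matrix `A` and a hypothetical column index `j` (neither comes from the question), the conversion to CSC and the column mean might look like this:

```python
import numpy as np
from scipy import sparse

# Hypothetical 5x3 example data, stored as CSR like in the question.
A = sparse.csr_matrix(np.array([
    [0.0, 2.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0],
]))

# CSC stores each column contiguously, so column slicing is cheap.
A_csc = A.tocsc()

j = 1                   # hypothetical column index
col = A_csc[:, j]       # still sparse, shape (5, 1)
col_mean = col.mean()   # implicit zeros are counted: (2 + 4) / 5

# Means of all columns at once, returned as a 1 x n_cols matrix.
all_means = A_csc.mean(axis=0)
```

Whether the one-off `tocsc()` conversion pays for itself depends on how many columns you slice afterwards.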

Calculating the standard deviation efficiently takes a bit more effort. First of all, suppose you get your sparse column like this:

col = A.getcol(colindex)

Then calculate the variance as the mean of the squares minus the square of the mean:

N = col.shape[0]
sqr = col.copy() # take a copy of the col
sqr.data **= 2 # square the data, i.e. just the non-zero data
variance = sqr.sum()/N - col.mean()**2
std = variance**0.5 # the standard deviation is the square root of the variance
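The steps above can be wrapped into one function and checked against NumPy. This is only a sketch: `sparse_column_mean_std` is a hypothetical helper name, the test matrix is randomly generated for illustration, and the dense array exists solely to verify the result.

```python
import numpy as np
from scipy import sparse

def sparse_column_mean_std(A, j):
    """Return (mean, std) of column j of a sparse matrix without densifying it.

    Hypothetical helper. Uses the population variance (divide by N),
    which matches numpy.std with its default ddof=0.
    """
    col = A[:, j]                      # sparse column, shape (N, 1)
    n = col.shape[0]
    mean = col.sum() / n               # implicit zeros add nothing to the sum
    sum_sq = col.multiply(col).sum()   # element-wise square of stored entries
    # Clamp at zero: rounding can push the difference slightly negative.
    variance = max(sum_sq / n - mean ** 2, 0.0)
    return mean, np.sqrt(variance)

# Made-up test data: a 988x4 matrix with roughly 5% non-zero entries.
rng = np.random.default_rng(0)
dense = rng.random((988, 4))
dense[dense < 0.95] = 0.0
A = sparse.csr_matrix(dense)

mean, std = sparse_column_mean_std(A, 2)
print(mean, std)
print(dense[:, 2].mean(), dense[:, 2].std())  # dense reference values
```

For the sample standard deviation (the equivalent of ddof=1), multiply the variance by N / (N - 1) before taking the square root. Note that this formula can lose precision when the mean is large compared with the spread of the values.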

answered Sep 28 '22 01:09

David Heffernan

Tags: python, numpy, scipy, sparse-matrix

IssamLaradji

People also ask

1 Answers

David Heffernan

Recent Activity

Donate For Us