How do you get the mean and std of a column in a csr_matrix?

I have a sparse 988x1 vector (a column in a csr_matrix) created through scipy.sparse. Is there a way to get its mean and standard deviation without having to convert the sparse matrix to a dense one?

numpy.mean seems to only work for dense vectors.

asked Mar 29 '13 10:03 by IssamLaradji



1 Answer

Since you are performing column slicing, it may be better to store the matrix using CSC rather than CSR. But that would depend on what else you are doing with the matrix.

To calculate the mean of a column in a CSC matrix, you can call the matrix's mean() method. Implicit zeros count toward the mean, just as they would in a dense array.
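As a minimal sketch of the mean calculation, assuming a randomly generated 988x3 matrix as stand-in data (the names A, A_csc and col are illustrative, not from the question):

```python
import numpy as np
from scipy import sparse

# Hypothetical example data: a random 988x3 sparse matrix in CSR format.
A = sparse.random(988, 3, density=0.1, format="csr", random_state=0)

# Convert once to CSC so that column slicing is cheap.
A_csc = A.tocsc()

# Slice out one column; the result is still sparse (988x1).
col = A_csc[:, 1]

# mean() divides the sum of the stored values by the full length (988),
# so the implicit zeros are taken into account without a dense copy.
col_mean = col.mean()

# The means of all columns at once, flattened to a 1-D array.
all_means = np.asarray(A_csc.mean(axis=0)).ravel()
```

If only a single column is ever needed, the CSC conversion may cost more than it saves; slicing the CSR matrix directly works too, just less efficiently.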

Calculating the standard deviation efficiently takes a bit more effort. First, suppose you get your sparse column like this:

col = A.getcol(colindex)

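Then calculate the variance as E[x^2] - (E[x])^2, and take its square root to get the standard deviation:

import numpy as np

N = col.shape[0]
sqr = col.copy() # take a copy of the col
sqr.data **= 2 # square the data, i.e. just the non-zero data
variance = sqr.sum()/N - col.mean()**2 # E[x^2] - (E[x])^2, zeros included
std = np.sqrt(variance) # population standard deviation (ddof=0)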
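The steps above can be wrapped into a small helper and checked against the dense result. This is only a sketch: the function name sparse_column_mean_std and the random test matrix are assumptions made for illustration, and it uses multiply() instead of squaring .data in place, which should be equivalent here.

```python
import numpy as np
from scipy import sparse


def sparse_column_mean_std(A, colindex):
    """Return (mean, std) of one column of a SciPy sparse matrix.

    Implicit zeros are included, and the standard deviation is the
    population one (ddof=0), matching the default of numpy.std.
    """
    # Converting to CSC makes the column slice cheap; for repeated calls,
    # convert once outside the function instead.
    col = A.tocsc()[:, colindex]
    n = col.shape[0]

    mean = col.sum() / n
    # Element-wise square; only the stored entries are touched.
    mean_sq = col.multiply(col).sum() / n
    # Clamp at zero to guard against tiny negative values from round-off.
    variance = max(mean_sq - mean ** 2, 0.0)
    return mean, np.sqrt(variance)


# Hypothetical example data: a random 988x5 sparse matrix.
A = sparse.random(988, 5, density=0.05, format="csr", random_state=42)
mean, std = sparse_column_mean_std(A, 2)

# Dense reference, used here only to verify the sparse computation.
dense_col = A.toarray()[:, 2]
```

The E[x^2] - (E[x])^2 form can lose precision when the mean is large relative to the spread, so for such data a comparison against a dense computation on a sample is a reasonable sanity check.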

answered Sep 28 '22 01:09 by David Heffernan