I have a sparse 988x1 vector (a column in a csr_matrix) created through scipy.sparse. Is there a way to get its mean and standard deviation without having to convert the sparse matrix to a dense one?
numpy.mean seems to work only for dense vectors.
The function csr_matrix() creates a sparse matrix in compressed sparse row (CSR) format, whereas csc_matrix() creates one in compressed sparse column (CSC) format.
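As a minimal sketch (the small 3x3 array below is made-up example data), both constructors accept a dense NumPy array, and the resulting objects report their storage layout through the format attribute:

```python
import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

# Illustrative dense array with mostly zeros
dense = np.array([[1, 0, 0],
                  [0, 0, 2],
                  [0, 3, 0]])

A_csr = csr_matrix(dense)  # compressed sparse row layout
A_csc = csc_matrix(dense)  # compressed sparse column layout

print(A_csr.format, A_csc.format)  # storage format names
print(A_csr.nnz)                   # number of stored nonzeros
```

Both objects represent the same matrix; only the internal layout (and therefore which slicing direction is cheap) differs.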
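In words, indptr (index pointer) holds the offsets that partition data and indices (the column indices) into rows: the entries of row i are data[indptr[i]:indptr[i+1]]. To fill in the matrix with the (nonzero) data, knowing the column indices alone is not enough; indptr is what tells you which row each stored value belongs to.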
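To make the role of indptr concrete, here is a small sketch (the 3x3 matrix is illustrative data) that inspects the three CSR arrays and rebuilds the dense matrix from them by hand:

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 0]])
A = csr_matrix(dense)

print(A.data)     # stored nonzero values, row by row
print(A.indices)  # column index of each stored value
print(A.indptr)   # row i occupies data[indptr[i]:indptr[i+1]]

# Rebuild the dense matrix using indptr to find each row's slice
rebuilt = np.zeros(A.shape, dtype=A.dtype)
for i in range(A.shape[0]):
    start, end = A.indptr[i], A.indptr[i + 1]
    rebuilt[i, A.indices[start:end]] = A.data[start:end]
```

Without indptr, the indices array [0, 2, 2, 0, 1] alone gives no way to tell where one row ends and the next begins.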
A simple heuristic for checking whether an m x n matrix is sparse is to count the elements that are equal to zero: if this count is more than (m * n)/2, we treat the matrix as sparse and return true.
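The zero-count heuristic above can be sketched in a few lines of NumPy; the function name is_sparse is hypothetical, and the "more than half zeros" threshold is just this heuristic, not a SciPy rule:

```python
import numpy as np

def is_sparse(matrix):
    """Hypothetical helper: True if more than half of the entries are zero."""
    m, n = matrix.shape
    zero_count = np.count_nonzero(matrix == 0)  # count entries equal to zero
    return zero_count > (m * n) / 2
```

For example, a 3x3 matrix with six zeros passes (6 > 4.5), while a 2x2 matrix with exactly two zeros does not (2 is not more than 2).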
lil_matrix((M, N), [dtype]) constructs an empty matrix with shape (M, N); dtype is optional and defaults to dtype='d'. Sparse matrices can be used in arithmetic operations: they support addition, subtraction, multiplication, division, and matrix power.
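A minimal sketch of building a matrix with lil_matrix and then doing arithmetic on it (the entries are illustrative). It converts to CSR first, since LIL is mainly meant for incremental construction. Note that for the *_matrix classes, ** is matrix power, not elementwise power:

```python
import numpy as np
from scipy.sparse import lil_matrix

M = lil_matrix((3, 3))  # empty 3x3 matrix, dtype defaults to 'd' (float64)
M[0, 0] = 1.0
M[1, 2] = 2.0
M[2, 1] = 3.0

B = M.tocsr()  # CSR is better suited to arithmetic than LIL

S = B + B   # addition
D = B - B   # subtraction
P = B @ B   # matrix multiplication
H = B / 2   # division by a scalar
Q = B ** 2  # matrix power for csr_matrix (same as B @ B)
```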
Since you are performing column slicing, it may be better to store the matrix using CSC rather than CSR. But that would depend on what else you are doing with the matrix.
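Switching formats is a single method call. This sketch assumes a randomly filled 988x5 matrix with roughly 10% nonzeros as stand-in data; both formats return the same column, but CSC stores columns contiguously, so repeated column slicing is cheaper there:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Stand-in data: about 10% of entries kept as nonzeros
rng = np.random.default_rng(0)
dense = rng.random((988, 5))
dense[dense < 0.9] = 0.0

A_csr = csr_matrix(dense)
A_csc = A_csr.tocsc()  # one-off conversion to column-oriented storage

col_csr = A_csr[:, 2]  # column slice from CSR
col_csc = A_csc[:, 2]  # same column slice from CSC
```

Whether the conversion pays off depends on how often you slice columns versus rows.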
To calculate the mean of a column in a CSC matrix, you can use the matrix's mean() method.
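A short sketch of the mean() approach on a 988x1 column (the three nonzero values are made up). The mean is taken over all 988 entries, implicit zeros included, so it matches the dense mean. The same method with axis=0 gives every column mean at once:

```python
import numpy as np
from scipy.sparse import csc_matrix

dense = np.zeros((988, 1))
dense[[3, 100, 500], 0] = [2.0, 4.0, 6.0]  # illustrative nonzeros
A = csc_matrix(dense)

col = A.getcol(0)
col_mean = col.mean()         # 12 / 988, zeros are counted
all_means = A.mean(axis=0)    # 1 x ncols matrix of column means
```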
Calculating the standard deviation efficiently takes a bit more effort. First, suppose you get your sparse column like this:
col = A.getcol(colindex)
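Then calculate the variance, and from it the standard deviation, like so:
N = col.shape[0]  # number of rows, zeros included
sqr = col.copy()  # take a copy of the column
sqr.data **= 2    # square only the stored (non-zero) entries
variance = sqr.sum()/N - col.mean()**2  # E[x^2] - E[x]^2
std = variance ** 0.5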
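Wrapped into a function, the same E[x^2] - E[x]^2 approach can be checked against numpy.std on the dense data. The helper name sparse_col_mean_std and the random test matrix are assumptions for illustration. Like numpy.std's default (ddof=0), this gives the population standard deviation; the tiny clamp at zero only guards against floating-point round-off:

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_col_mean_std(col):
    """Hypothetical helper: mean and population std (ddof=0) of a sparse column."""
    N = col.shape[0]                  # all rows, including implicit zeros
    mean = col.sum() / N
    sqr = col.copy()
    sqr.data **= 2                    # square only the stored values
    variance = sqr.sum() / N - mean ** 2
    return mean, np.sqrt(max(variance, 0.0))

# Illustrative data: a 988x3 matrix with about 20% nonzeros
rng = np.random.default_rng(42)
dense = rng.random((988, 3))
dense[dense < 0.8] = 0.0
A = csr_matrix(dense)

mean, std = sparse_col_mean_std(A.getcol(1))
```

For the sample standard deviation (ddof=1), multiply the variance by N/(N - 1) before taking the square root.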