
How do I compute the variance of a column of a sparse matrix in Scipy?

I have a large scipy.sparse.csc_matrix and would like to normalize it. That is, subtract the column mean from each element and divide by the column standard deviation (std).

scipy.sparse.csc_matrix has a .mean() but is there an efficient way to compute the variance or std?

nickponline asked Aug 29 '12 01:08




1 Answer

You can calculate the variance yourself from the mean, using the following formula:

Var(X) = E[X^2] - (E[X])^2

E[X] stands for the mean. To get E[X^2], square the csc_matrix element-wise (not as a matrix product) and take the column mean of the result. To get (E[X])^2, square the column means of the original matrix. The standard deviation is the square root of the variance.
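The formula above can be sketched in code roughly as follows. This is a minimal illustration, not a definitive implementation: the helper name `column_variance` and the small test matrix are made up for the example, and the clipping step is an added safeguard against floating-point cancellation rather than part of the formula itself.

```python
import numpy as np
from scipy import sparse


def column_variance(X):
    """Per-column population variance of a sparse matrix (hypothetical helper)."""
    # E[X]: column means, flattened to a 1-D array
    mean = np.asarray(X.mean(axis=0)).ravel()
    # E[X^2]: element-wise square via X.multiply(X), then column means
    mean_sq = np.asarray(X.multiply(X).mean(axis=0)).ravel()
    # E[X^2] - (E[X])^2, clipped at zero to absorb rounding error
    return np.maximum(mean_sq - mean ** 2, 0.0)


# Small made-up matrix, used only to compare against the dense result
X = sparse.csc_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 0.0, 3.0],
                                [4.0, 5.0, 0.0]]))

var = column_variance(X)
std = np.sqrt(var)

# Reference value computed on the dense array (only feasible for small inputs)
dense_var = X.toarray().var(axis=0)
```

This gives the population variance (dividing by the number of rows), which matches NumPy's default. Note that dividing each column by its std keeps the matrix sparse, whereas subtracting the column mean turns the zero entries into non-zeros and so produces a dense result.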

Sicco answered Sep 21 '22 07:09
