I have a large scipy.sparse.csc_matrix and would like to normalize it, that is, subtract the column mean from each element and divide by the column standard deviation (std). scipy.sparse.csc_matrix has a .mean() method, but is there an efficient way to compute the variance or std?
The variance is the average of the squared deviations from the mean, i.e., var = mean(x), where x = abs(a - a.mean())**2. The mean is typically calculated as x.sum() / N, where N = len(x).
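As a quick check of that definition, here is a minimal sketch on a small dense array. The array values and variable names are made up for illustration; it only confirms that the manual formula agrees with NumPy's default (population) variance.

```python
import numpy as np

# Illustrative data, not from the original question.
a = np.array([1.0, 2.0, 4.0, 7.0])

# Squared deviations from the mean, as in the definition above.
x = np.abs(a - a.mean()) ** 2

# Mean of the squared deviations: x.sum() / N with N = len(x).
var_manual = x.sum() / len(x)

# NumPy's var() uses the same definition by default (ddof=0).
var_numpy = a.var()

print(var_manual, var_numpy)
```

Both values should agree up to floating-point rounding.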
csr_matrix() creates a sparse matrix in compressed sparse row (CSR) format, whereas csc_matrix() creates one in compressed sparse column (CSC) format.
Problem statement: Principal Component Analysis does not apply to a sparse matrix, so what approach should be taken to reduce cost and memory usage? A sparse matrix is a matrix that contains more zero-valued components than non-zero components.
Python's SciPy provides tools for creating sparse matrices using multiple data structures, as well as tools for converting a dense matrix to a sparse matrix. Printing a sparse matrix shows the (row, column) index of each non-zero entry along with its value.
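A small sketch of the conversion described above, assuming a tiny made-up dense array. It builds CSC and CSR versions and lists the (row, column, value) triples of the non-zero entries through the COO view.

```python
import numpy as np
from scipy import sparse

# Illustrative dense matrix with mostly zeros.
dense = np.array([[0, 0, 3],
                  [4, 0, 0],
                  [0, 0, 5]])

# Convert the dense array to compressed sparse column / row formats.
csc = sparse.csc_matrix(dense)
csr = sparse.csr_matrix(dense)

# The COO view exposes the row index, column index, and value
# of every stored (non-zero) entry.
coo = csc.tocoo()
triples = sorted(zip(coo.row.tolist(), coo.col.tolist(), coo.data.tolist()))

print(triples)
```

Only the three non-zero entries are stored; the zeros are implicit in both formats.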
You can calculate the variance yourself using the mean, with the following formula:
Var(X) = E[X^2] - (E[X])^2
Here E[X] stands for the mean. To calculate E[X^2], square the csc_matrix element-wise (for example with a.multiply(a) or a.power(2), not a matrix product) and then apply the mean function to the result. To get (E[X])^2, simply square the result of the mean function applied to the original input.
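The approach above can be sketched as follows for per-column statistics. The helper name column_mean_std is hypothetical, and the sketch assumes the population variance (dividing by the number of rows, implicit zeros included).

```python
import numpy as np
from scipy import sparse

def column_mean_std(a):
    """Per-column mean and std of a sparse matrix without densifying it.

    Hypothetical helper; uses Var(X) = E[X^2] - (E[X])^2 column by column.
    """
    # E[X] per column; .mean(axis=0) returns a 1 x n_cols matrix.
    mean = np.asarray(a.mean(axis=0)).ravel()
    # E[X^2] per column; multiply() squares element-wise and stays sparse.
    mean_sq = np.asarray(a.multiply(a).mean(axis=0)).ravel()
    # Clip tiny negative values caused by floating-point cancellation.
    var = np.maximum(mean_sq - mean ** 2, 0.0)
    return mean, np.sqrt(var)

# Illustrative random input.
a = sparse.random(50, 8, density=0.2, format="csc", random_state=0)
mean, std = column_mean_std(a)
```

Note that subtracting the column mean from every element turns the implicit zeros into non-zero values, so the centered matrix is generally dense. If memory is a concern, one option is to divide the columns by std only and keep the mean separately.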