I'm trying to calculate the mean of non-zero values in each row of a sparse row matrix. Using the matrix's mean method doesn't do it:
>>> from scipy.sparse import csr_matrix
>>> a = csr_matrix([[0, 0, 2], [1, 3, 8]])
>>> a.mean(axis=1)
matrix([[ 0.66666667],
        [ 4.        ]])
The following works but is slow for large matrices:
>>> import numpy as np
>>> b = np.zeros(a.shape[0])
>>> for i in range(a.shape[0]):
... b[i] = a.getrow(i).data.mean()
...
>>> b
array([ 2., 4.])
Could anyone please tell me if there is a faster method?
Location and Count of Nonzeros: Create a 10-by-10 random sparse matrix with 7% density of nonzeros using A = sprand(10,10,0.07). Use nonzeros to find the values of the nonzero elements, and nnz to count them.
Memory Management: The sparse attribute allows MATLAB to store only the nonzero elements of the matrix, together with their indices, and to reduce computation time by eliminating operations on zero elements.
Density of Sparse Matrix: The result indicates that only about 2% of the elements in the matrix are nonzero.
The nonzero() function computes the indices of the elements that are non-zero. It returns a tuple of arrays, one for each dimension of arr, containing the indices of the non-zero elements in that dimension. The corresponding non-zero values in the array can be obtained with arr[nonzero(arr)].
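As a small illustration of the description above, here is a sketch using NumPy's nonzero() on a dense array. The array contents are reused from the question purely as example data, and the names rows, cols and values are chosen for this sketch.

```python
import numpy as np

# Dense version of the example matrix from the question.
arr = np.array([[0, 0, 2], [1, 3, 8]])

# nonzero() returns one index array per dimension.
rows, cols = np.nonzero(arr)

# Indexing with those arrays picks out the non-zero values themselves.
values = arr[rows, cols]
```

Here rows holds the row index and cols the column index of each non-zero element, in row-major order.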
With a CSR format matrix, you can do this even more easily:
sums = a.sum(axis=1).A1
counts = np.diff(a.indptr)
averages = sums / counts
Row sums are directly supported, and the structure of the CSR format means that the differences between successive values in the indptr
array correspond exactly to the number of stored entries in each row. Note that a row with no stored entries has a count of zero, so its average comes out as nan.
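For completeness, here is a minimal sketch of the same idea wrapped in a function that also copes with rows containing no non-zero values. The function name nonzero_row_means and the choice to return NaN for such rows are assumptions made for this example, not part of SciPy.

```python
import numpy as np
from scipy.sparse import csr_matrix

def nonzero_row_means(a):
    """Mean of the non-zero values in each row; NaN for rows without any."""
    # Work on a copy so the caller's matrix is left untouched.
    a = csr_matrix(a, copy=True)
    # Drop explicitly stored zeros so they are not counted as non-zero entries.
    a.eliminate_zeros()
    sums = np.asarray(a.sum(axis=1)).ravel()
    # Successive differences of indptr give the number of stored entries per row.
    counts = np.diff(a.indptr)
    means = np.full(a.shape[0], np.nan)
    # Divide only where a row has at least one entry; other rows stay NaN.
    np.divide(sums, counts, out=means, where=counts > 0)
    return means

# The matrix from the question plus an all-zero third row.
a = csr_matrix([[0, 0, 2], [1, 3, 8], [0, 0, 0]])
b = nonzero_row_means(a)  # row means 2.0 and 4.0, then NaN for the empty row
```

Like the three-line version, this avoids a Python-level loop over rows, so the work stays inside NumPy and SciPy routines.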