I have a large sparse matrix and I want to get the maximum value for each row. In numpy, I can call numpy.max(mat, axis=1), but I cannot find a similar function for a scipy sparse matrix. Is there any efficient way to get the max of each row for a large sparse matrix?
If your matrix, let's call it a, is stored in CSR format, then a.data has all the non-zero entries ordered by rows, and a.indptr has the index into a.data of the first element of every row (plus one final entry equal to the total number of stored entries). You can use this to calculate what you are after as follows:
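import numpy as np

def sparse_max_row(csr_mat):
    # Max over each row's segment of csr_mat.data, starting at indptr[i]
    ret = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1])
    # reduceat returns a single stored value for empty rows; reset them to 0
    ret[np.diff(csr_mat.indptr) == 0] = 0
    return ret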
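To make the data/indptr layout and the reduceat call concrete, here is a small sketch on a made-up 3x4 matrix (the matrices a and b are illustrative only). The empty middle row shows up as a repeated value in indptr; for it, np.maximum.reduceat simply returns one stored value, which the fix-up line then overwrites with 0. The last part shows what happens when the final row is empty.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative 3x4 matrix; the middle row has no stored entries.
a = csr_matrix(np.array([[0, 3, 0, 1],
                         [0, 0, 0, 0],
                         [5, 0, 2, 0]]))

print(a.data)    # [3 1 5 2]  stored values, row by row
print(a.indptr)  # [0 2 2 4]  row i is a.data[a.indptr[i]:a.indptr[i + 1]]

# Rows 0 and 2 reduce a.data[0:2] and a.data[2:]. For the empty row 1,
# index 2 is not smaller than the next index (also 2), so reduceat
# just returns a.data[2].
raw = np.maximum.reduceat(a.data, a.indptr[:-1])
print(raw)       # [3 5 5]

raw[np.diff(a.indptr) == 0] = 0  # same fix-up as in sparse_max_row
print(raw)       # [3 0 5]

# If the last row is empty, its start offset equals len(b.data),
# which is out of range for reduceat and raises IndexError.
b = csr_matrix(np.array([[1, 2],
                         [0, 0]]))
try:
    np.maximum.reduceat(b.data, b.indptr[:-1])
except IndexError as exc:
    print("IndexError:", exc)
```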
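I just came across this same problem. Jaime's solution breaks when the matrix has completely empty rows: if the last row is empty, its start offset in indptr equals len(csr_mat.data), and np.maximum.reduceat raises an IndexError. Here's a workaround that only passes the start offsets of non-empty rows to reduceat: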
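import numpy as np

def sparse_max_row(csr_mat):
    ret = np.zeros(csr_mat.shape[0])
    nonempty = np.diff(csr_mat.indptr) > 0  # rows with at least one stored value
    # Only non-empty rows' start offsets are passed, so every index is < len(data)
    ret[nonempty] = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1][nonempty])
    return ret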
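Both functions above only look at the stored values, so for a row whose stored values are all negative but which also has implicit zeros, they report a negative maximum, whereas numpy.max on the dense matrix would give 0. Below is a sketch of a variant that also accounts for implicit zeros. The name sparse_max_row_with_zeros is hypothetical, and it assumes the matrix has no duplicate entries (call sum_duplicates() first if unsure). As a cross-check it uses the max method that SciPy sparse matrices provide: in reasonably recent SciPy versions, csr_mat.max(axis=1) returns the row maxima, including implicit zeros, as a sparse column.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sparse_max_row_with_zeros(csr_mat):
    """Row-wise max of a CSR matrix, counting implicit (unstored) zeros.

    Hypothetical helper; assumes csr_mat has no duplicate entries.
    """
    nnz_per_row = np.diff(csr_mat.indptr)
    nonempty = nnz_per_row > 0
    ret = np.zeros(csr_mat.shape[0], dtype=csr_mat.dtype)
    if nonempty.any():
        # Max over the stored values of each non-empty row
        ret[nonempty] = np.maximum.reduceat(csr_mat.data,
                                            csr_mat.indptr[:-1][nonempty])
    # A row with fewer stored values than columns contains an implicit 0,
    # so its maximum is at least 0 (this also covers empty rows).
    partial = nnz_per_row < csr_mat.shape[1]
    ret[partial] = np.maximum(ret[partial], 0)
    return ret


m = csr_matrix(np.array([[-1, -2, 0],
                         [0, 0, 0],
                         [4, -3, 7],
                         [-5, -6, -7],
                         [0, 0, 0]]))
# Expected row maxima: 0, 0, 7, -5, 0
print(sparse_max_row_with_zeros(m))
print(m.max(axis=1).toarray().ravel())  # SciPy's built-in, for comparison
```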