 

Efficient way to get the max of each row for a large sparse matrix

I have a large sparse matrix and I want to get the maximum value of each row. In NumPy I can call numpy.max(mat, axis=1), but I cannot find a similar function for SciPy sparse matrices. Is there an efficient way to get the max of each row of a large sparse matrix?
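For reference, SciPy releases newer than this question give sparse matrices a max method that accepts an axis argument and, according to its documentation, takes implicit (unstored) zeros into account just like the dense NumPy call. Below is a minimal sketch that assumes such a SciPy version is installed. The small matrix is invented for illustration. The result is densified defensively, and the hasattr check only guards against a version returning a dense object instead of a sparse one.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Made-up example: row 1 is all zeros, row 2 has only negative stored values
dense = np.array([[0, 3, 0, 1],
                  [0, 0, 0, 0],
                  [-2, 0, -5, 0]])
mat = csr_matrix(dense)

# Dense reference: NumPy's row-wise maximum
dense_max = np.max(dense, axis=1)

# Sparse row-wise maximum; for csr_matrix the result has shape (n, 1) and is
# normally sparse, so densify it if needed and flatten it to shape (n,)
res = mat.max(axis=1)
sparse_max = (res.toarray() if hasattr(res, "toarray") else np.asarray(res)).ravel()

print(dense_max)   # [3 0 0]
print(sparse_max)  # [3 0 0]
```

Row 2 comes out as 0 rather than -2 because its unstored entries are zeros. This is the same answer numpy.max gives on the dense array.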

hanqiang asked Apr 13 '13 20:04



2 Answers

If your matrix (let's call it a) is stored in CSR format, then a.data holds all the non-zero entries ordered by row, and a.indptr holds, for every row, the index into a.data of that row's first element. You can use this to calculate what you are after as follows:

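import numpy as np

def sparse_max_row(csr_mat):
    # max over each row's slice of data, starting at that row's indptr offset
    ret = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1])
    # for an empty row reduceat just copies a single element, so reset those rows to 0
    ret[np.diff(csr_mat.indptr) == 0] = 0
    return ret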
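The following hand-checked walkthrough makes the explanation concrete. It shows what a.data and a.indptr contain for a tiny matrix and what np.maximum.reduceat returns for them, including the empty-row case that the function's second line patches up. The matrix is invented for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative 3x4 matrix; row 1 has no stored entries
mat = csr_matrix(np.array([[0, 3, 0, 1],
                           [0, 0, 0, 0],
                           [4, 0, 2, 0]]))

print(mat.data)    # [3 1 4 2]  non-zeros in row order
print(mat.indptr)  # [0 2 2 4]  row i owns data[indptr[i]:indptr[i+1]]

# reduceat takes the max of each segment starting at indptr[i].
# For the empty row 1 (indptr[1] == indptr[2]) it simply returns data[2] == 4,
# which is why empty rows are overwritten with 0 afterwards.
raw = np.maximum.reduceat(mat.data, mat.indptr[:-1])
print(raw)         # [3 4 4]

raw[np.diff(mat.indptr) == 0] = 0
print(raw)         # [3 0 4]
```

If the last row were empty instead, indptr[:-1] would end with len(data). That start offset is out of bounds, so reduceat would raise an IndexError.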
Jaime answered Nov 10 '22 22:11



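I just came across this same problem. Jaime's solution breaks if the last row(s) of the matrix are completely empty: the final start offset in indptr[:-1] then equals len(csr_mat.data), and np.maximum.reduceat raises an IndexError. Here's a workaround that only reduces over the non-empty rows: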

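import numpy as np

def sparse_max_row(csr_mat):
    ret = np.zeros(csr_mat.shape[0])  # rows with no stored entries stay 0
    nonempty = np.diff(csr_mat.indptr) > 0
    # reduce from the start offset of each non-empty row only
    ret[nonempty] = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1][nonempty])
    return ret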
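Both functions above only look at stored values. Suppose a row's stored entries are all negative but the row also contains unstored zeros. These functions then return a negative number for that row, while numpy.max on the dense matrix would return 0. The sketch below shows one way to account for that while keeping the non-empty-row trick from this answer. The name sparse_max_row_with_zeros is hypothetical, and the function assumes the input can be converted to CSR.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_max_row_with_zeros(csr_mat):
    """Row-wise max that also counts implicit (unstored) zeros."""
    csr_mat = csr_mat.tocsr(copy=True)  # work on a copy, never mutate the input
    csr_mat.sum_duplicates()            # canonical form: one stored value per position
    n_rows, n_cols = csr_mat.shape
    nnz_per_row = np.diff(csr_mat.indptr)
    nonempty = nnz_per_row > 0

    ret = np.zeros(n_rows, dtype=csr_mat.dtype)  # empty rows stay 0
    if nonempty.any():
        # max of stored values, reducing only from non-empty row starts
        ret[nonempty] = np.maximum.reduceat(csr_mat.data,
                                            csr_mat.indptr[:-1][nonempty])
    # a row with fewer stored entries than columns contains at least one 0
    has_implicit_zero = nnz_per_row < n_cols
    ret[has_implicit_zero] = np.maximum(ret[has_implicit_zero], 0)
    return ret

mat = csr_matrix(np.array([[-1, 0, -3],
                           [0, 0, 0],
                           [2, 5, 1],
                           [-4, -2, -6]]))
print(sparse_max_row_with_zeros(mat))  # [ 0  0  5 -2]
```

On this matrix both functions above return -1 for the first row, while this version returns 0, matching mat.toarray().max(axis=1). For matrices known to be non-negative, the extra step changes nothing.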
JakeM answered Nov 10 '22 22:11
