Efficient way to get the max of each row for a large sparse matrix

Question

I have a large sparse matrix and I want to get the maximum value of each row. In numpy I can call numpy.max(mat, axis=1), but I cannot find a similar function for scipy sparse matrices. Is there an efficient way to get the max of each row of a large sparse matrix?
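For reference, here is a small sketch that puts the dense computation from the question next to the `max(axis=1)` method that recent SciPy releases provide on sparse matrices (worth checking against your installed version). The matrix values are made up for illustration. The sparse method returns a sparse (M, 1) result, so it is converted to a flat NumPy array here.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small example matrix (made-up data for illustration).
dense = np.array([[0, 3, 0],
                  [0, 0, 0],
                  [-2, 0, 5]])
mat = csr_matrix(dense)

# Dense baseline: the numpy call from the question.
dense_max = np.max(dense, axis=1)

# Sparse: csr_matrix.max(axis=1) returns a sparse (M, 1) matrix,
# so convert it to a flat 1-D ndarray for comparison.
sparse_max = mat.max(axis=1).toarray().ravel()

print(dense_max)   # [3 0 5]
print(sparse_max)  # [3 0 5]
```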

Jaime · Accepted Answer

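If your matrix, let's call it a, is stored in CSR format, then a.data holds all the stored (non-zero) entries ordered by row, and a.indptr holds the index into a.data of the first element of every row, followed by one final entry equal to the number of stored entries. Row i therefore occupies a.data[a.indptr[i]:a.indptr[i+1]], and np.maximum.reduceat can take the maximum over each of these slices: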

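import numpy as np

def sparse_max_row(csr_mat):
    # Max over each slice data[indptr[i]:indptr[i+1]], i.e. over each row's stored values
    ret = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1])
    # For an empty row reduceat returns an element of a later row instead, so reset it to 0
    ret[np.diff(csr_mat.indptr) == 0] = 0
    return ret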
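To make the layout concrete, this sketch prints data and indptr for a made-up matrix and runs the function above on it (repeated so the example runs on its own). An empty row in the middle is handled by the np.diff line, but an empty last row leaves an offset equal to nnz, one past the end of data, which reduceat rejects with an IndexError.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_max_row(csr_mat):
    # Same logic as the function above.
    ret = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1])
    ret[np.diff(csr_mat.indptr) == 0] = 0
    return ret

# Row 1 is empty, but it is not the last row.
a = csr_matrix(np.array([[1, 0, 4],
                         [0, 0, 0],
                         [0, 7, 2]]))
print(a.data)             # [1 4 7 2]  stored values, row by row
print(a.indptr)           # [0 2 2 4]  row i is data[indptr[i]:indptr[i+1]]
print(sparse_max_row(a))  # [4 0 7]

# Here the LAST row is empty: indptr[-2] == a.nnz, which is past the end
# of a.data, so reduceat raises IndexError.
b = csr_matrix(np.array([[1, 0, 4],
                         [0, 0, 0]]))
try:
    sparse_max_row(b)
except IndexError as exc:
    print("IndexError:", exc)
```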

JakeM · Answer

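I just came across the same problem. Jaime's solution breaks when the matrix ends with one or more completely empty rows: for those rows indptr points one past the end of data, so np.maximum.reduceat raises an IndexError. Here's a workaround that only passes the start offsets of non-empty rows to reduceat: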

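import numpy as np

def sparse_max_row(csr_mat):
    nonempty = np.diff(csr_mat.indptr) > 0  # rows with at least one stored value
    ret = np.zeros(csr_mat.shape[0], dtype=csr_mat.dtype)  # empty rows stay 0
    # Reduce only from the start offsets of non-empty rows, so no offset is out of bounds
    ret[nonempty] = np.maximum.reduceat(csr_mat.data, csr_mat.indptr[:-1][nonempty])
    return ret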
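Both functions take a row's maximum over its stored values only. That differs from numpy.max(mat, axis=1) on the dense matrix when a row holds only negative stored values but also has implicit zeros: the dense maximum is then 0. Below is a sketch, assuming you want exactly the dense result. `sparse_max_row_dense_equivalent` is a hypothetical name, and the `sum_duplicates()` call is there because a non-canonical CSR matrix may store the same position more than once.

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_max_row_dense_equivalent(csr_mat):
    """Row maxima matching np.max(csr_mat.toarray(), axis=1).

    Hypothetical helper (not from either answer): like the workaround above,
    but it also counts the implicit zeros of rows that are not completely filled.
    """
    csr_mat = csr_mat.tocsr(copy=True)
    csr_mat.sum_duplicates()  # duplicates must be summed before taking the max
    n_rows, n_cols = csr_mat.shape
    row_nnz = np.diff(csr_mat.indptr)
    nonempty = row_nnz > 0

    ret = np.zeros(n_rows, dtype=csr_mat.dtype)
    if nonempty.any():
        starts = csr_mat.indptr[:-1][nonempty]
        ret[nonempty] = np.maximum.reduceat(csr_mat.data, starts)

    # A row with fewer stored entries than columns also contains zeros.
    has_implicit_zero = row_nnz < n_cols
    ret[has_implicit_zero] = np.maximum(ret[has_implicit_zero], 0)
    return ret

m = csr_matrix(np.array([[-3, 0, -1],
                         [0, 0, 0],
                         [-2, -5, -4],
                         [0, 6, 0]]))
print(sparse_max_row_dense_equivalent(m))  # [ 0  0 -2  6]
print(np.max(m.toarray(), axis=1))         # [ 0  0 -2  6]
```

The extra step is one vectorised comparison per row, so it keeps the same overall approach as the answers above.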

Tags: python, scipy, sparse-matrix

hanqiang
