Fast vectorized way to convert a row vector to an indptr for a sparse matrix?

For sparse matrices in CSR format, we usually pass in column indices (indices) and an indptr vector that indexes the indices vector, so that indices[indptr[i]:indptr[i+1]] are the column indices of the elements in row i of the sparse matrix.

Is there a fast, vectorized, preferably numpy solution to convert a vector of consecutive row indices into an indptr in Python?

For example, if this is my row indices vector: [0,1,1,2,2,2,3,5]

The indptr vector would be [0,1,3,6,7,7,8] where the 7 repeats because the row vector is missing row 4.
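To make the relationship concrete, here is a minimal sketch of how that indptr slices a matching indices vector. The column indices below are made up for illustration and are not part of the question.

```python
import numpy as np

rows = np.array([0, 1, 1, 2, 2, 2, 3, 5])
indptr = np.array([0, 1, 3, 6, 7, 7, 8])

# Hypothetical column indices, one per stored element (not from the question).
indices = np.array([0, 2, 4, 1, 3, 5, 0, 2])

# Number of stored elements per row; row 4 is empty, so its count is 0.
counts = np.diff(indptr)

# Column indices of each row, recovered by slicing with consecutive indptr entries.
per_row = [indices[indptr[i]:indptr[i + 1]].tolist() for i in range(len(indptr) - 1)]
```

Here counts comes out as [1, 2, 3, 1, 0, 1], and the slice for row 4 is empty because indptr[4] and indptr[5] are both 7.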

I can do it using a simple loop:

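import numpy as np

indptr = np.zeros(max(rows) + 2, dtype=int)  # one slot per row, plus the leading 0
for i in range(len(rows)):
    indptr[rows[i]+1] += 1    # count the elements in each row
indptr = np.cumsum(indptr)    # running total, taken once after the loop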

But I was wondering if there's a faster, vectorized way to do it?

narcissa asked Mar 02 '23 01:03


2 Answers

I think what you are looking for is this:

np.bincount(rows).cumsum()
#[1 3 6 7 7 8]

And if there are rows at the bottom of your matrix that might be empty, pass the number of rows as the minlength argument to bincount (per @CJR's recommendation):

np.bincount(rows, minlength=num_rows).cumsum()
#[1 3 6 7 7 8]

You probably want to insert a 0 at the front as well. bincount counts the number of elements in each bin/row, and cumsum then adds those counts up. Missing bins/rows get a count of 0, so they are included as well.

Probably the best way to insert the 0 is to shift the row indices by one:

np.bincount(np.array(rows)+1).cumsum()
#[0 1 3 6 7 7 8]

Or you can insert it directly:

np.insert(np.bincount(rows).cumsum(),0,0)
#[0 1 3 6 7 7 8]
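The pieces above can be collected into one small helper. This is only a sketch: the name rows_to_indptr and its num_rows parameter are made up here, and it assumes the row indices are non-negative integers.

```python
import numpy as np

def rows_to_indptr(rows, num_rows=None):
    """Build a CSR-style indptr from row indices (hypothetical helper)."""
    rows = np.asarray(rows, dtype=np.intp)
    if num_rows is None:
        # Assumption: no empty rows after the last occupied one.
        num_rows = int(rows.max()) + 1 if rows.size else 0
    # Elements per row; rows with no elements get a count of 0.
    counts = np.bincount(rows, minlength=num_rows)
    # Running total of the counts, with the leading 0 prepended.
    return np.concatenate(([0], np.cumsum(counts)))

indptr = rows_to_indptr([0, 1, 1, 2, 2, 2, 3, 5])
padded = rows_to_indptr([0, 1, 1, 2, 2, 2, 3, 5], num_rows=8)
```

With the example from the question, indptr is [0 1 3 6 7 7 8]. Passing num_rows=8 repeats the final 8 for the two empty trailing rows, giving [0 1 3 6 7 7 8 8 8].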
Ehsan answered May 03 '23 18:05


Another idea would be

n = rows[-1] + 1  # number of rows in the matrix; rows must be sorted
indptr = np.searchsorted(rows, np.arange(-1, n), side='right')

Not sure which is faster or better.
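One way to compare the two approaches is to first confirm they agree on the same input. The sketch below does that on random data; num_rows and the array size are arbitrary choices for illustration, and the search values are taken to run from -1 to num_rows - 1 so that the result has num_rows + 1 entries.

```python
import numpy as np

rng = np.random.default_rng(0)
num_rows = 50  # arbitrary matrix height for this check
# searchsorted needs sorted input, so sort the random row indices.
rows = np.sort(rng.integers(0, num_rows, size=1000))

# bincount approach: per-row counts, running total, leading 0.
via_bincount = np.concatenate(([0], np.bincount(rows, minlength=num_rows).cumsum()))

# searchsorted approach: for each row number r (and -1), count the entries <= r.
via_searchsorted = np.searchsorted(rows, np.arange(-1, num_rows), side='right')

same = np.array_equal(via_bincount, via_searchsorted)
```

As for speed, bincount makes a single pass over rows, while searchsorted runs one binary search per row, so which one wins likely depends on how the number of rows compares with the number of stored elements. Timing both with timeit on representative data is the safest way to decide.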

Miguel answered May 03 '23 18:05