For sparse matrices, we usually pass in column indices (indices) and an indptr vector that indexes the indices vector so that indices[indptr[i]:indptr[i+1]] are the elements of row i in the sparse matrix.
Is there a fast, vectorized, preferably numpy solution to convert a vector of consecutive row indices into an indptr in Python?
For example, if this is my row index vector rows: [0,1,1,2,2,2,3,5]
...
The indptr vector would be [0,1,3,6,7,7,8], where the 7 repeats because the row vector is missing row 4.
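To make the slicing convention concrete, here is a small sketch using the example above. Only rows and indptr come from the question; the column values in indices are made up for illustration.

```python
import numpy as np

rows = np.array([0, 1, 1, 2, 2, 2, 3, 5])
indptr = np.array([0, 1, 3, 6, 7, 7, 8])
# Hypothetical column indices, one per stored element (not from the question).
indices = np.array([2, 0, 3, 1, 2, 4, 0, 3])

# Row i owns the slice indices[indptr[i]:indptr[i+1]]; row 4 gets an empty slice.
row_slices = [indices[indptr[i]:indptr[i + 1]].tolist()
              for i in range(len(indptr) - 1)]
print(row_slices)

# The number of elements in each row is the difference of consecutive indptr entries.
row_lengths = np.diff(indptr)
print(row_lengths)
```

Repeating each row number by its length (np.repeat(np.arange(6), row_lengths)) gives back the original rows vector, which is a handy sanity check for any conversion.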
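I can do it using a simple loop:
import numpy as np
indptr = np.zeros(max(rows) + 2, dtype=int)  # one slot per row, plus a leading 0
for i in range(len(rows)):
    indptr[rows[i] + 1] += 1  # count the entries of each row, shifted by one
indptr = np.cumsum(indptr)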
But I was wondering if there's a faster, vectorized way to do it?
I think what you are looking for is this:
np.bincount(rows).cumsum()
#[1 3 6 7 7 8]
And if there are rows at the bottom of your matrix that might be empty, simply pass the total number of rows as the minlength argument to bincount (per @CJR's recommendation):
np.bincount(rows, minlength=num_rows).cumsum()
#[1 3 6 7 7 8]
You probably want to insert a 0 at the front as well. bincount counts the number of elements in each bin/row, and cumsum then adds the counts up. This way missing bins/rows are included as well.
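As a quick illustration of the two steps, this sketch prints the intermediate arrays for the example rows vector from the question. The names counts and running are just illustrative.

```python
import numpy as np

rows = np.array([0, 1, 1, 2, 2, 2, 3, 5])

counts = np.bincount(rows)   # entries per row; the missing row 4 gets a count of 0
running = counts.cumsum()    # running total, i.e. the end offset of each row
print(counts)   # [1 2 3 1 0 1]
print(running)  # [1 3 6 7 7 8]
```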
The best way to insert a 0 is probably to shift the row indices by one before counting:
np.bincount(np.array(rows)+1).cumsum()
#[0 1 3 6 7 7 8]
Or you can prepend it directly:
np.insert(np.bincount(rows).cumsum(),0,0)
#[0 1 3 6 7 7 8]
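If you need both the leading 0 and trailing empty rows, the pieces above can be combined. The following is a minimal sketch, not a library function: rows_to_indptr and its num_rows parameter are names I made up, and it assumes rows holds non-negative integers smaller than num_rows.

```python
import numpy as np

def rows_to_indptr(rows, num_rows=None):
    """Hypothetical helper: build a CSR-style indptr from row indices."""
    rows = np.asarray(rows, dtype=np.intp)
    if num_rows is None:
        # Assumption: with no explicit size, the last populated row is the last row.
        num_rows = int(rows.max()) + 1 if rows.size else 0
    counts = np.bincount(rows, minlength=num_rows)   # entries per row
    return np.concatenate(([0], np.cumsum(counts)))  # prepend the leading 0

print(rows_to_indptr([0, 1, 1, 2, 2, 2, 3, 5]))
print(rows_to_indptr([0, 1, 1, 2, 2, 2, 3, 5], num_rows=8))
```

With num_rows=8 the last offset is simply repeated for the two empty trailing rows.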
Another idea would be
n = rows[-1] + 1  # number of rows (not len(rows)); rows must be sorted
indptr = np.searchsorted(rows, np.arange(-1, n), side='right')
Not sure which is faster or better.
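One way to settle the speed question on your own data is to time both approaches. This is only a rough sketch: the array sizes are arbitrary assumptions, the function names via_bincount and via_searchsorted are mine, and both use the number of rows (not the number of entries) as the upper bound. Results will vary with the ratio of rows to entries.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
num_rows = 50_000  # assumed matrix height, for illustration only
rows = np.sort(rng.integers(0, num_rows, size=200_000))  # sorted row indices

def via_bincount(rows, num_rows):
    return np.insert(np.bincount(rows, minlength=num_rows).cumsum(), 0, 0)

def via_searchsorted(rows, num_rows):
    # Requires rows to be sorted; entry k counts the elements with row < k.
    return np.searchsorted(rows, np.arange(-1, num_rows), side='right')

# Both approaches should agree before their timings are compared.
same = np.array_equal(via_bincount(rows, num_rows), via_searchsorted(rows, num_rows))
print("results match:", same)

for fn in (via_bincount, via_searchsorted):
    seconds = timeit.timeit(lambda: fn(rows, num_rows), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```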