Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find Start / Stop Index Range For Values in NumPy Array Greater Than N

Tags:

python

numpy

Let's say I have a NumPy array:

x = np.array([2, 3, 4, 0, 0, 1, 1, 4, 6, 5, 8, 9, 9, 4, 2, 0, 3])

For all values in x >= 2, I need to find the start / stop indices where consecutive values of x >=2 (i.e., a run of one single value greater than or equal to 2 is not counted). Then, I repeat this for x >= 3, x >=4, ..., x >= x.max(). The output should be a NumPy array three columns (first column is the minimum value, the second column is the inclusive start index, and the third column is the stop index) and will look like:

[[2,  0,  2],
 [2,  7, 14],
 [3,  1,  2],
 [3,  7, 13],
 [4,  7, 13],
 [5,  8, 12],
 [6, 10, 12],
 [8, 10, 12],
 [9, 11, 12]
]

Naively, I could look through each unique value and then search for the start/stop indices. However, this requires doing multiple passes over x. What's the best NumPy vectorized way to accomplish this task? Is there a solution that doesn't require multiple passes over the data?

Update

I realized that I also need to count the single instances as well. So, my output should be:

[[2,  0,  2],
 [2,  7, 14],
 [2, 16, 16],  # New line needed
 [3,  1,  2],
 [3,  7, 13],
 [3, 16, 16],  # New line needed
 [4,  2,  2],  # New line needed
 [4,  7, 13],
 [5,  8, 12],
 [6,  8,  8],  # New line needed
 [6, 10, 12],
 [8, 10, 12],
 [9, 11, 12]
]
like image 701
slaw Avatar asked Oct 16 '22 05:10

slaw


1 Answers

Here's another solution (which I believe can be improved):

import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.array([2, 3, 4, 0, 0, 1, 1, 4, 6, 5, 8, 9, 9, 4, 2, 0, 3])

# array of unique values of x bigger than 1
a = np.unique(x[x>=2])

step = len(a)  # if you encounter memory problems, try a smaller step
result = []
for i in range(0, len(a), step):
    ai = a[i:i + step]
    c = np.argwhere(x >= ai[:, None])
    c[:,0] = ai[c[:,0]]
    c =  np.pad(c, ((1,1), (0,0)), 'symmetric')

    d = np.where(np.diff(c[:,1]) !=1)[0]

    e = as_strided(d, shape=(len(d)-1, 2), strides=d.strides*2).copy()
    # e = e[(np.diff(e, axis=1) > 1).flatten()]
    e[:,0] = e[:,0] + 1 

    result.append(np.hstack([c[:,0][e[:,0, None]], c[:,1][e]]))

result = np.concatenate(result)

# array([[ 2,  0,  2],
#        [ 2,  7, 14],
#        [ 2, 16, 16],
#        [ 3,  1,  2],
#        [ 3,  7, 13],
#        [ 3, 16, 16],
#        [ 4,  2,  2],
#        [ 4,  7, 13],
#        [ 5,  8, 12],
#        [ 6,  8,  8],
#        [ 6, 10, 12],
#        [ 8, 10, 12],
#        [ 9, 11, 12]])

Sorry for not commenting what each step does -- if later I will find time I will fix it.

like image 119
Andreas K. Avatar answered Oct 19 '22 00:10

Andreas K.