Given a binary sequence, say: 000110111100011, I wish to mark all sections of contiguous 1s.
000110111100011
0123456789abcdef
I'll use hex notation, as 16 is a good length for an example sequence.
This should return (3 5) (6 a) (d f)
The first region occupies positions 3 and 4, therefore (using Python's convention), (3,5) denotes the region.
However occasionally a bit is corrupted, so I wish to add a tolerance so that it ignores up to k consecutive 0s.
So, setting min_gap=2 in the above example, I would like to get: (3 a) (d f).
i.e. the 5) (6 is removed, as that gap comprises just one zero, and is 'patched over'.
Seems simple, but I struggled to fully vectorize a solution in numpy.
def get_segments(segment_of_sums, mingap=1):
nonempty_padded = np.concatenate(( [0], segment_of_sums, [0] ))
b = nonempty_padded > 0
edges = b[:-1] ^ b[1:]
indices = np.argwhere(edges)[:,0]
# indices[0] will always be the first on-ramp, and indices[-1] will be the last off-ramp
# if segment ends with a 1, indices[-1] will be len(segment)
# len(indices) will ALWAYS be an even number
if mingap>1 and len(indices) > 2:
gap_lengths = indices[2::2] - indices[1::2][:-1]
gap_keeps = gap_lengths >= mingap
index_keeps = np.zeros_like(indices, dtype=np.bool8)
index_keeps[[0, -1]] = True
for i, flag in enumerate(gap_keeps):
if flag:
index_keeps[[2*i+1, 2*i+2]] = True
indices = indices[np.argwhere(index_keeps)[:,0]]
return indices.reshape((-1,2))
TEST = True
if TEST:
for L in [
[0,1,0, 1,1,1, 0,0,1, 1,1,0],
[0,1,0, 1,1,1, 0,0,1, 1,1,1],
[1,1,0, 1,1,1, 0,0,1, 1,1,1]
]:
A = np.array(L, dtype=np.bool8)
print(
get_segments(A, mingap=2)
)
This code correctly prints:
[[ 1 6]
[ 8 11]]
[[ 1 6]
[ 8 12]]
[[ 0 6]
[ 8 12]]
However the for loop feels clumsy.
Can anyone see a cleaner technique?
Not sure you can easily vectorize this, but here is a solution using itertools.groupby that should be quite fast:
s = '000110111100011'
from itertools import groupby
[('%x' % (x:=list(g))[0][0], '%x' % (x[-1][0]+1))
for k,g in groupby(enumerate(s), lambda x: x[1])
if k == '1']
Output:
[('3', '5'), ('6', 'a'), ('d', 'f')]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With