Assume, I have a NumPy-array of integers, as:
[34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
I want to find the start and end indices of the array, where a value is more than x-times (say 5-times) repeated. So in the case above, it is the value 22 and 6. Start index of the repeated 22 is 3 and end-index is 8. Same for the repeatening 6. Is there a special tool in Python that is helpful? Otherwise, I would loop through the array index for index and compare the actual value with the previous.
Regards.
Using np.diff
and the method given here by @WarrenWeckesser for finding runs of zeros in an array:
import numpy as np
def zero_runs(a): # from link
iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
absdiff = np.abs(np.diff(iszero))
ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
return ranges
a = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
zero_runs(np.diff(a))
Out[87]:
array([[ 3, 8],
[15, 22]], dtype=int32)
This can then be filtered on the difference between the start & end of the run:
runs = zero_runs(np.diff(a))
runs[runs[:, 1]-runs[:, 0]>5] # runs of 7 or more, to illustrate filter
Out[96]: array([[15, 22]], dtype=int32)
Here is a solution using Python's native itertools
.
Code
import itertools as it
def find_ranges(lst, n=2):
"""Return ranges for `n` or more repeated values."""
groups = ((k, tuple(g)) for k, g in it.groupby(enumerate(lst), lambda x: x[-1]))
repeated = (idx_g for k, idx_g in groups if len(idx_g) >=n)
return ((sub[0][0], sub[-1][0]) for sub in repeated)
lst = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]
list(find_ranges(lst, 5))
# [(3, 8), (15, 22)]
Tests
import nose.tools as nt
def test_ranges(f):
"""Verify list results identifying ranges."""
nt.eq_(list(f([])), [])
nt.eq_(list(f([0, 1,1,1,1,1,1, 2], 5)), [(1, 6)])
nt.eq_(list(f([1,1,1,1,1,1, 2,2, 1, 3, 1,1,1,1,1,1], 5)), [(0, 5), (10, 15)])
nt.eq_(list(f([1,1, 2, 1,1,1,1, 2, 1,1,1], 3)), [(3, 6), (8, 10)])
nt.eq_(list(f([1,1,1,1, 2, 1,1,1, 2, 1,1,1,1], 3)), [(0, 3), (5, 7), (9, 12)])
test_ranges(find_ranges)
This example captures (index, element) pairs in lst
, and then groups them by element. Only repeated pairs are retained. Finally, first and last pairs are sliced, yielding (start, end) indices from each repeated group.
See also this post for finding ranges of indices using itertools.groupby
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With