Find indexes of repeated elements in an array (Python, NumPy)

Question

Assume, I have a NumPy-array of integers, as:

[34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]

I want to find the start and end indices of the array, where a value is more than x-times (say 5-times) repeated. So in the case above, it is the value 22 and 6. Start index of the repeated 22 is 3 and end-index is 8. Same for the repeatening 6. Is there a special tool in Python that is helpful? Otherwise, I would loop through the array index for index and compare the actual value with the previous.

Regards.

EFT · Accepted Answer

Using np.diff and the method given here by @WarrenWeckesser for finding runs of zeros in an array:

import numpy as np

def zero_runs(a):  # from link
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges

a = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]

zero_runs(np.diff(a))
Out[87]: 
array([[ 3,  8],
       [15, 22]], dtype=int32)

This can then be filtered on the difference between the start & end of the run:

runs = zero_runs(np.diff(a))

runs[runs[:, 1]-runs[:, 0]>5]  # runs of 7 or more, to illustrate filter
Out[96]: array([[15, 22]], dtype=int32)

pylang · Answer

Here is a solution using Python's native itertools.

Code

import itertools as it


def find_ranges(lst, n=2):
    """Return ranges for `n` or more repeated values."""
    groups = ((k, tuple(g)) for k, g in it.groupby(enumerate(lst), lambda x: x[-1]))
    repeated = (idx_g for k, idx_g in groups if len(idx_g) >=n)
    return ((sub[0][0], sub[-1][0]) for sub in repeated)

lst = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]    
list(find_ranges(lst, 5))
# [(3, 8), (15, 22)]

Tests

import nose.tools as nt


def test_ranges(f):
    """Verify list results identifying ranges."""
    nt.eq_(list(f([])), [])
    nt.eq_(list(f([0, 1,1,1,1,1,1, 2], 5)), [(1, 6)])
    nt.eq_(list(f([1,1,1,1,1,1, 2,2, 1, 3, 1,1,1,1,1,1], 5)), [(0, 5), (10, 15)])
    nt.eq_(list(f([1,1, 2, 1,1,1,1, 2, 1,1,1], 3)), [(3, 6), (8, 10)])    
    nt.eq_(list(f([1,1,1,1, 2, 1,1,1, 2, 1,1,1,1], 3)), [(0, 3), (5, 7), (9, 12)])

test_ranges(find_ranges)

This example captures (index, element) pairs in lst, and then groups them by element. Only repeated pairs are retained. Finally, first and last pairs are sliced, yielding (start, end) indices from each repeated group.

See also this post for finding ranges of indices using itertools.groupby.

Find indexes of repeated elements in an array (Python, NumPy)

Tags:

python

arrays

numpy

mcatis

2 Answers

EFT

pylang

Recent Activity

Donate For Us

Find indexes of repeated elements in an array (Python, NumPy)

Tags:

python

arrays

numpy

mcatis

2 Answers

EFT

pylang

Related questions

Recent Activity

Donate For Us