Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python find duplicates which occur more than 3 times

Tags:

python

list

I am trying to find an efficient way to search three or more consecutive duplicates and replace them for only one in a Python list.

list_before = [1, 1, 1, 2, 3, 4, 5, 5, 5, 6, 6, 7, 7, 7, 8, 8, 8, 8, 8]

# expected
list_after = [1, 2, 3, 4, 5, 6, 6, 7, 8]

def replace(list_to_replace):
    for idx, val in enumerate(list_to_replace):
        if idx + 3 < len(list_to_replace):
            if val == list_to_replace[idx+1] == list_to_replace[idx+2]:
                del list_to_replace[idx+1]
                del list_to_replace[idx+2]
    return list_to_replace

>>> replace(list_before)
[1, 1, 3, 4, 5, 5, 6, 7, 7, 8, 8, 8]

What seems to be the problem here? Is there a more efficient way?

like image 438
J. Doe Avatar asked Jan 24 '19 14:01

J. Doe


3 Answers

I good use case for itertools.groupby:

>>> from itertools import groupby
>>> list_before = [1, 1, 1, 2, 3, 4, 5, 5, 5, 6, 6, 7, 7, 7, 8, 8, 8, 8, 8]
>>> list_after = []
>>> for k, group in groupby(list_before):
...     lst = list(group)
...     if len(lst) >= 3:
...         list_after.append(k)
...     else:
...         list_after.extend(lst)
>>> list_after
[1, 2, 3, 4, 5, 6, 6, 7, 8]

It would be possible make a one-liner with itertools.chain but the for loop is almost certainly more readable and similarly performant.

like image 155
Chris_Rands Avatar answered Oct 01 '22 13:10

Chris_Rands


>>> from itertools import groupby
>>> nums = [1, 1, 1, 2, 3, 4, 5, 5, 5, 6, 6, 7, 7, 7, 8, 8, 8, 8, 8]
>>> [k for k, g in groupby(nums) for i in range(1 + (len(list(g)) == 2))] 
[1, 2, 3, 4, 5, 6, 6, 7, 8]
like image 23
jamylak Avatar answered Sep 30 '22 13:09

jamylak


As pointed out by Chris in his answer, a one-liner is possible but it's not pretty at all.

In [88]: list(chain.from_iterable([(x,) if len(y) >= 3 else y for x, y in [(k, tuple(g)) for k, g in groupby(list_before)]]))
Out[88]: [1, 2, 3, 4, 5, 6, 6, 7, 8]

I think there should be a better way but chain is hacky enough to deal with when dealing with non-iterables.

like image 23
NullDev Avatar answered Oct 03 '22 13:10

NullDev