Removing elements that have consecutive partial duplicates in Python

My question is similar to this, but instead of removing full duplicates I'd like to remove consecutive partial "duplicates" from a list in Python.

For my particular use case, I want to remove every run of two or more consecutive words that start with the same character, and I want to be able to define that character. For this example it's #, so

['#python', 'is', '#great', 'for', 'handling', 
'text', '#python', '#text', '#nonsense', '#morenonsense', '.']

should become

['#python', 'is', '#great', 'for', 'handling', 'text', '.']
Moritz asked Jul 17 '18 11:07


2 Answers

You could use itertools.groupby:

>>> from itertools import groupby
>>> lst = ['#python', 'is', '#great', 'for', 'handling', 'text', '#python', '#text', '#nonsense', '#morenonsense', '.']    
>>> [s for k, g in ((k, list(g)) for k, g in groupby(lst, key=lambda s: s.startswith("#")))
...    if not k or len(g) == 1 for s in g]
...
['#python', 'is', '#great', 'for', 'handling', 'text', '.']

This groups consecutive elements by whether they start with a #, then keeps only the groups whose elements do not start with # or that contain a single element.
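To make the grouping step easier to follow, here is a small sketch that prints the intermediate groups produced by groupby for the list from the question. The variable name `groups` is only for illustration and is not part of the answer above.

```python
from itertools import groupby

lst = ['#python', 'is', '#great', 'for', 'handling', 'text',
       '#python', '#text', '#nonsense', '#morenonsense', '.']

# Materialize each run of consecutive items that share the same key
groups = [(k, list(g)) for k, g in groupby(lst, key=lambda s: s.startswith("#"))]

for k, g in groups:
    print(k, g)
# True ['#python']
# False ['is']
# True ['#great']
# False ['for', 'handling', 'text']
# True ['#python', '#text', '#nonsense', '#morenonsense']
# False ['.']
```

Only the fifth group is both keyed True and longer than one element, so it is the only one the comprehension drops.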

tobias_k answered Oct 05 '22 23:10


Here's one solution using itertools.groupby. The idea is to group consecutive items by whether their first character equals a given k. A group is dropped only if it meets both criteria: its items start with k, and it contains more than one item. Every other group is yielded unchanged.

from itertools import groupby

L = ['#python', 'is', '#great', 'for', 'handling', 'text',
     '#python', '#text', '#nonsense', '#morenonsense', '.']

def list_filter(L, k):
    # Group consecutive items by whether their first character equals k
    grouper = groupby(L, key=lambda x: x[0] == k)
    for starts_with_k, group in grouper:
        items = list(group)
        # Drop only runs of two or more items that start with k
        if not (starts_with_k and len(items) > 1):
            yield from items

res = list_filter(L, '#')

print(list(res))

['#python', 'is', '#great', 'for', 'handling', 'text', '.']
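One edge case worth noting: indexing the first character with x[0] raises IndexError on an empty string. If the input might contain empty strings, or if the marker could be longer than one character, a variant based on str.startswith is one way to handle both. This is only a sketch under those assumptions; the name `drop_prefixed_runs` and the sample data are hypothetical.

```python
from itertools import groupby

def drop_prefixed_runs(words, prefix):
    """Yield words, skipping runs of 2+ consecutive words that start with prefix."""
    # str.startswith is safe on empty strings and accepts multi-character prefixes
    for has_prefix, group in groupby(words, key=lambda w: w.startswith(prefix)):
        items = list(group)
        # Keep non-prefixed groups and prefixed groups of exactly one word
        if not has_prefix or len(items) == 1:
            yield from items

words = ['', '@a', '@b', 'x', '@c', '', '##d', '##e']

print(list(drop_prefixed_runs(words, '@')))
# ['', 'x', '@c', '', '##d', '##e']

print(list(drop_prefixed_runs(words, '##')))
# ['', '@a', '@b', 'x', '@c', '']
```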
jpp answered Oct 06 '22 01:10