My question is similar to this, but instead of removing full duplicates I'd like to remove consecutive partial "duplicates" from a list in Python.
For my particular use case, I want to remove runs of consecutive words that all start with the same character, and I want to be able to define that character. For this example it's #, so
['#python', 'is', '#great', 'for', 'handling',
'text', '#python', '#text', '#nonsense', '#morenonsense', '.']
should become
['#python', 'is', '#great', 'for', 'handling', 'text', '.']
You could use itertools.groupby:
>>> from itertools import groupby
>>> lst = ['#python', 'is', '#great', 'for', 'handling', 'text', '#python', '#text', '#nonsense', '#morenonsense', '.']
>>> [s for k, g in ((k, list(g)) for k, g in groupby(lst, key=lambda s: s.startswith("#")))
... if not k or len(g) == 1 for s in g]
...
['#python', 'is', '#great', 'for', 'handling', 'text', '.']
This groups consecutive elements by whether they start with #, then keeps only the elements that do not, or whose group contains a single element.
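To see why this works, it can help to look at the intermediate runs that groupby produces before any filtering. The following is only an illustrative sketch; the names `words` and `runs` are not from the answer above.

```python
from itertools import groupby

words = ['#python', 'is', '#great', 'for', 'handling', 'text',
         '#python', '#text', '#nonsense', '#morenonsense', '.']

# groupby only merges *adjacent* items that share the same key value,
# so each entry here is one consecutive run of the input list.
runs = [(flag, list(group))
        for flag, group in groupby(words, key=lambda s: s.startswith('#'))]

for flag, run in runs:
    print(flag, run)
# True ['#python']
# False ['is']
# True ['#great']
# False ['for', 'handling', 'text']
# True ['#python', '#text', '#nonsense', '#morenonsense']
# False ['.']
```

Only the fifth run is both flagged True and longer than one element, so it is the only one dropped.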
Here's one solution using itertools.groupby. The idea is to group consecutive items depending on whether the first character is equal to a given k. Then apply your two criteria (the group's items start with k, and the group has more than one item); if they are not both satisfied, yield the items.
from itertools import groupby

L = ['#python', 'is', '#great', 'for', 'handling', 'text',
     '#python', '#text', '#nonsense', '#morenonsense', '.']

def list_filter(L, k):
    # Group adjacent items by whether their first character equals k.
    grouper = groupby(L, key=lambda x: x[0] == k)
    for i, j in grouper:
        items = list(j)
        # Skip runs of two or more consecutive items that start with k.
        if not (i and len(items) > 1):
            yield from items

res = list_filter(L, '#')
print(list(res))
['#python', 'is', '#great', 'for', 'handling', 'text', '.']
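One caveat: indexing with x[0] raises IndexError if the list contains an empty string. If that can happen in your data, a variant based on str.startswith avoids it and also accepts a prefix longer than one character. This is a sketch under that assumption, not part of the answer above; the name `drop_prefixed_runs` is made up for illustration.

```python
from itertools import groupby

def drop_prefixed_runs(words, prefix):
    """Yield words, skipping any run of 2+ adjacent words that start with prefix."""
    # str.startswith returns False for an empty string and a non-empty prefix,
    # so empty strings are treated as ordinary (unprefixed) words.
    for is_prefixed, group in groupby(words, key=lambda w: w.startswith(prefix)):
        run = list(group)
        if is_prefixed and len(run) > 1:
            continue  # drop the whole run of consecutive prefixed words
        yield from run

sample = ['', '#a', '#b', 'x', '#c', '']
result = list(drop_prefixed_runs(sample, '#'))
print(result)
# ['', 'x', '#c', '']
```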