I have a list like this:
['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
I want to remove every string that comes after an earlier string starting with the same 4 characters. For example, 'a b e' would be removed because 'a b d' occurs before it.
The new list should look like this:
['a b d', 'c d j', 'w x y']
How can I do this?
(NOTE: The list is sorted, as per @Martijn Pieters' comment)
Using a generator function to remember the starts:
def remove_starts(lst):
    seen = []
    for elem in lst:
        if elem.startswith(tuple(seen)):
            continue
        yield elem
        seen.append(elem[:4])
So the function skips anything that starts with one of the strings in seen, and appends the first 4 characters of anything it does allow through to that list.
Demo:
>>> lst = ['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
>>> def remove_starts(lst):
...     seen = []
...     for elem in lst:
...         if elem.startswith(tuple(seen)):
...             continue
...         yield elem
...         seen.append(elem[:4])
...
>>> list(remove_starts(lst))
['a b d', 'c d j', 'w x y']
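The version above rebuilds a tuple of every prefix on each iteration, so the work grows with the number of distinct prefixes. A minimal sketch of a set-based variant follows. The name remove_starts_set and the width parameter are mine, not part of the original answer, and it assumes every string is at least 4 characters long (for shorter strings it can behave differently from the startswith test):

```python
def remove_starts_set(lst, width=4):
    """Yield each string whose first `width` characters have not been seen yet."""
    seen = set()  # prefixes of the strings already yielded
    for elem in lst:
        prefix = elem[:width]
        if prefix in seen:
            continue  # an earlier string already claimed this prefix
        seen.add(prefix)
        yield elem


lst = ['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
print(list(remove_starts_set(lst)))  # ['a b d', 'c d j', 'w x y']
```

Set membership is a constant-time lookup on average, so this should scale better on long lists, and it does not need the input to be sorted.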
If your input is sorted, this can be simplified to:
def remove_starts(lst):
    seen = ()
    for elem in lst:
        if elem.startswith(seen):
            continue
        yield elem
        seen = elem[:4]
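This saves on prefix testing by checking only the prefix of the most recently kept string. It relies on strings that share a prefix being adjacent, which sorted input guarantees.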
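When equal prefixes are adjacent, the same idea can also be sketched with itertools.groupby, taking the first string of each run. The name remove_starts_grouped and the width parameter are assumptions for illustration, not part of the original answer:

```python
from itertools import groupby


def remove_starts_grouped(lst, width=4):
    """Yield the first string of each run of adjacent strings sharing a prefix."""
    # groupby only merges *adjacent* items with equal keys,
    # so this is only correct for sorted (or otherwise grouped) input.
    for _, group in groupby(lst, key=lambda s: s[:width]):
        yield next(group)  # every group holds at least one string


lst = ['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
print(list(remove_starts_grouped(lst)))  # ['a b d', 'c d j', 'w x y']
```

If the same prefix shows up again later in an unsorted list, groupby starts a new group and the duplicate slips through, so this is not a substitute for the first version on arbitrary input.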
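You could also use an OrderedDict. The keys are the first four characters, and setdefault stores a value only the first time a key appears, so each value is the first string with that prefix. (Building the dict directly from (s[:4], s) pairs would keep the last string for each prefix instead, giving ['a b e', 'c d j', 'w x k'].)
from collections import OrderedDict
lst = ['a b d', 'a b e', 'c d j', 'w x y', 'w x z', 'w x k']
firsts = OrderedDict()
for s in lst:
    # setdefault leaves the existing value alone if the prefix is already a key
    firsts.setdefault(s[:4], s)
print(list(firsts.values()))
['a b d', 'c d j', 'w x y']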