So starting with a list of strings, as below
string_list = ['rest', 'resting', 'look', 'looked', 'it', 'spit']
I want to remove any element from the list that is a substring of another element, giving the result for instance...
string_list = ['resting', 'looked', 'spit']
I have some code that acheives this but it's embarrassingly ugly and probably needlessly complex. Is there a simple way to do this in Python?
The remove() method removes the first matching element (which is passed as an argument) from the list. The pop() method removes an element at a given index, and will also return the removed item. You can also use the del keyword in Python to remove an element or slice from a list.
First building block: substring.
You can use in
to check:
>>> 'rest' in 'resting'
True
>>> 'sing' in 'resting'
False
Next, we're going to choose the naive method of creating a new list. We'll add items one by one into the new list, checking if they are a substring or not.
def substringSieve(string_list):
out = []
for s in string_list:
if not any([s in r for r in string_list if s != r]):
out.append(s)
return out
You can speed it up by sorting to reduce the number of comparisons (after all, a longer string can never be a substring of a shorter/equal length string):
def substringSieve(string_list):
string_list.sort(key=lambda s: len(s), reverse=True)
out = []
for s in string_list:
if not any([s in o for o in out]):
out.append(s)
return out
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With