Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python - Remove any element from a list of strings that is a substring of another element

So starting with a list of strings, as below

string_list = ['rest', 'resting', 'look', 'looked', 'it', 'spit']

I want to remove any element from the list that is a substring of another element, giving the result for instance...

string_list = ['resting', 'looked', 'spit']

I have some code that acheives this but it's embarrassingly ugly and probably needlessly complex. Is there a simple way to do this in Python?

like image 244
Matt Fisher Avatar asked Feb 12 '14 06:02

Matt Fisher


People also ask

How do you remove an element from a list to another list in Python?

The remove() method removes the first matching element (which is passed as an argument) from the list. The pop() method removes an element at a given index, and will also return the removed item. You can also use the del keyword in Python to remove an element or slice from a list.


1 Answers

First building block: substring.

You can use in to check:

>>> 'rest' in 'resting'
True
>>> 'sing' in 'resting'
False

Next, we're going to choose the naive method of creating a new list. We'll add items one by one into the new list, checking if they are a substring or not.

def substringSieve(string_list):
    out = []
    for s in string_list:
        if not any([s in r for r in string_list if s != r]):
            out.append(s)
    return out

You can speed it up by sorting to reduce the number of comparisons (after all, a longer string can never be a substring of a shorter/equal length string):

def substringSieve(string_list):
    string_list.sort(key=lambda s: len(s), reverse=True)
    out = []
    for s in string_list:
        if not any([s in o for o in out]):
            out.append(s)
    return out
like image 192
Liyan Chang Avatar answered Oct 14 '22 22:10

Liyan Chang