Remove string element in a list of strings if the first characters match with another string element in the list

Tags: python, list

I want to efficiently look up and compare the string elements in a list, and then remove those that are prefixes of other string elements in the list (that is, they match from the same starting point).

list1 = [ 'a boy ran' , 'green apples are worse' , 'a boy ran towards the mill' ,  ' this is another sentence ' , 'a boy ran towards the mill and fell',.....]

I intend to get a list which looks like this:

list2 = [  'green apples are worse' , ' this is another sentence ' , 'a boy ran towards the mill and fell',.....]

In other words, I want to keep the longest string element from those elements which start with the same first characters.

No Holidays asked Jun 24 '19 12:06


2 Answers

As suggested by John Coleman in the comments, you can first sort the sentences and then compare consecutive sentences. If one sentence is a prefix of another, the sentence immediately after it in the sorted list also starts with it, so it is enough to compare each sentence with its successor. To preserve the original order, you can use a set for quickly looking up the filtered elements.

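list1 = ['a boy ran', 'green apples are worse',
         'a boy ran towards the mill', ' this is another sentence ',
         'a boy ran towards the mill and fell']

# Sort the distinct sentences; deduplicating first keeps remove() from
# raising KeyError when the same sentence occurs more than once.
srtd = sorted(set(list1))
filtered = set(list1)
for a, b in zip(srtd, srtd[1:]):
    if b.startswith(a):
        filtered.remove(a)  # a is a prefix of its successor, so drop it

# Rebuild in the original order, keeping only the surviving sentences.
list2 = [x for x in list1 if x in filtered]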

Afterwards, list2 is the following:

['green apples are worse',
 ' this is another sentence ',
 'a boy ran towards the mill and fell']

At O(n log n) comparisons for the sort, this is considerably faster than comparing all pairs of sentences in O(n²), but if the list is not too long, the much simpler solution by Vicrobot will work just as well.
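For reuse, the sort-and-compare idea can be wrapped in a function. This is only a sketch: the name `drop_prefixes` is made up for illustration, and it assumes that exact duplicates should be kept unless some longer string starts with them.

```python
def drop_prefixes(strings):
    """Return the elements that are not a proper prefix of another element,
    preserving the original order. (Hypothetical helper, not from the answer.)"""
    # Work on distinct values so equal neighbours never compare as prefixes.
    unique_sorted = sorted(set(strings))
    # In sorted order, a string that is a prefix of anything is a prefix
    # of its immediate successor, so checking neighbours is sufficient.
    prefixes = {a for a, b in zip(unique_sorted, unique_sorted[1:])
                if b.startswith(a)}
    return [s for s in strings if s not in prefixes]


list1 = ['a boy ran', 'green apples are worse',
         'a boy ran towards the mill', ' this is another sentence ',
         'a boy ran towards the mill and fell']
list2 = drop_prefixes(list1)
print(list2)
```

Collecting the prefixes into a set and filtering afterwards avoids mutating a set while iterating over the sorted pairs.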

tobias_k answered Sep 30 '22 19:09


Here is one way to achieve that:

list1 = ['a boy ran', 'green apples are worse', 'a boy ran towards the mill',
         ' this is another sentence ', 'a boy ran towards the mill and fell']
list2 = []
for i in list1:
    keep = True  # renamed from "bool" to avoid shadowing the built-in
    for j in list1:
        # Compare by value rather than id(), which depends on object identity.
        if j != i and j.startswith(i):
            keep = False
            break
    if keep:
        list2.append(i)

print(list2)
# ['green apples are worse', ' this is another sentence ', 'a boy ran towards the mill and fell']
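The nested loops above can also be written as a single comprehension. This is a sketch rather than the answer's own code: it assumes a value comparison (`t != s`) in place of the `id()` check, so the result does not depend on whether equal strings happen to be the same object.

```python
list1 = ['a boy ran', 'green apples are worse', 'a boy ran towards the mill',
         ' this is another sentence ', 'a boy ran towards the mill and fell']

# Keep s only if no different string in the list starts with s.
list2 = [s for s in list1
         if not any(t != s and t.startswith(s) for t in list1)]
print(list2)
```

It still compares every pair, so it remains O(n²) in the number of strings; `any()` merely stops scanning as soon as a match is found.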
Vicrobot answered Sep 30 '22 19:09