Remove string element in a list of strings if the first characters match with another string element in the list

Tags:

python

list

I want to efficiently compare the string elements in a list and remove those that are a prefix of another element in the list (that is, the other element starts with them):

list1 = [ 'a boy ran' , 'green apples are worse' , 'a boy ran towards the mill' ,  ' this is another sentence ' , 'a boy ran towards the mill and fell',.....]

I intend to get a list which looks like this:

list2 = [  'green apples are worse' , ' this is another sentence ' , 'a boy ran towards the mill and fell',.....]

In other words, among the elements that begin with the same characters, I want to keep only the longest one.

asked Jun 24 '19 12:06

2 Answers

As suggested by John Coleman in the comments, you can first sort the sentences and then compare consecutive sentences. If one sentence is a prefix of another, then the sentence immediately after it in the sorted list also starts with it, so comparing neighbours is enough. To preserve the original order, keep the surviving elements in a set for fast lookup and then filter the original list against it.

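list1 = ['a boy ran', 'green apples are worse',
         'a boy ran towards the mill', ' this is another sentence ',
         'a boy ran towards the mill and fell']

srtd = sorted(list1)       # prefixes sort directly before the strings they start
filtered = set(list1)      # elements that survive the filtering
for a, b in zip(srtd, srtd[1:]):
    if b.startswith(a):
        # discard() rather than remove(), so a repeated element in list1
        # cannot raise KeyError when it is dropped a second time
        filtered.discard(a)

# keep the survivors in their original order
list2 = [x for x in list1 if x in filtered]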

Afterwards, list2 is the following:

['green apples are worse',
 ' this is another sentence ',
 'a boy ran towards the mill and fell']

At O(n log n) comparisons, this is considerably faster than comparing all pairs of sentences in O(n²), but if the list is not too long, the much simpler solution by Vicrobot will work just as well.
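
The sort-and-compare idea above could be wrapped in a function along these lines. This is only a sketch: the name drop_prefixes is hypothetical, and the choice to de-duplicate before sorting (so that exact duplicates are not treated as prefixes of each other and are all kept) is an assumption, since the question does not say how duplicates should be handled.

```python
def drop_prefixes(items):
    """Return items without any string that is a proper prefix of another.

    Hypothetical helper: the original order is preserved, and exact
    duplicates are all kept (an assumption, not stated in the question).
    """
    unique = sorted(set(items))  # de-duplicate, then sort lexicographically
    keep = set(unique)
    for a, b in zip(unique, unique[1:]):
        # After sorting, a string that is a prefix of any other string is
        # directly followed by one that starts with it, so checking
        # neighbours is enough.
        if b.startswith(a):
            keep.remove(a)
    return [x for x in items if x in keep]


list1 = ['a boy ran', 'green apples are worse',
         'a boy ran towards the mill', ' this is another sentence ',
         'a boy ran towards the mill and fell']
print(drop_prefixes(list1))
# ['green apples are worse', ' this is another sentence ', 'a boy ran towards the mill and fell']
```

Because each unique string appears only once in the sorted list, it can be removed from the set at most once, so remove() is safe here.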

answered Sep 30 '22 19:09

tobias_k

Here is one way to achieve that:

list1 = ['a boy ran', 'green apples are worse', 'a boy ran towards the mill',
         ' this is another sentence ', 'a boy ran towards the mill and fell']
list2 = []
for i in list1:
    keep = True  # stays True unless another element starts with i
    for j in list1:
        # skip the element itself (identity check), then test for a prefix
        if i is not j and j.startswith(i):
            keep = False
    if keep:
        list2.append(i)
print(list2)
# ['green apples are worse', ' this is another sentence ', 'a boy ran towards the mill and fell']
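
The same pairwise check could also be written more compactly with any(). This is a sketch rather than the code above verbatim: the function name drop_prefixes_quadratic is hypothetical, and it compares by value (t != s) instead of by identity, which assumes that exact duplicates should be kept rather than treated as prefixes of each other.

```python
def drop_prefixes_quadratic(items):
    """Keep each string unless a different string in items starts with it.

    Hypothetical helper; compares every pair, so it takes O(n²) startswith
    calls, which is fine for short lists.
    """
    return [s for s in items
            if not any(t != s and t.startswith(s) for t in items)]


list1 = ['a boy ran', 'green apples are worse', 'a boy ran towards the mill',
         ' this is another sentence ', 'a boy ran towards the mill and fell']
print(drop_prefixes_quadratic(list1))
# ['green apples are worse', ' this is another sentence ', 'a boy ran towards the mill and fell']
```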

answered Sep 30 '22 19:09

Vicrobot
