I have a string, shown below, from which I need to remove consecutive duplicate words.
mystring = "my friend's new new new new and old old cats are running running in the street"
My output should look as follows.
myoutput = "my friend's new and old cats are running in the street"
I am using the following Python code to do it.
mylist = []
for i, w in enumerate(mystring.split()):
    for n, l in enumerate(mystring.split()):
        if l != w and i == n - 1:
            mylist.append(w)
mylist.append(mystring.split()[-1])
myoutput = " ".join(mylist)
However, my code is O(n²) and really inefficient as I have a huge dataset. I am wondering if there is a more efficient way of doing this in Python.
I am happy to provide more details if needed.
You can remove duplicates from a Python list with dict.fromkeys(), which builds a dictionary whose keys are the unique values, or by converting the list to a set; in either case you convert back to a list afterwards. Note, however, that both approaches remove every duplicate regardless of position, whereas this problem only requires removing consecutive repeats.
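To make that distinction concrete, here is a minimal sketch (the example string and variable names are my own, not from the question) comparing full deduplication with consecutive-only deduplication:

words = "old old cats and old dogs".split()

# dict.fromkeys() drops every duplicate, so the second "old" is lost
print(list(dict.fromkeys(words)))  # ['old', 'cats', 'and', 'dogs']

# keeping a word only when it differs from the previous one drops just the consecutive repeat
deduped = [w for i, w in enumerate(words) if i == 0 or w != words[i - 1]]
print(deduped)  # ['old', 'cats', 'and', 'old', 'dogs']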
Short regex magic:
import re

mystring = "my friend's new new new new and old old cats are running running in the street"

# replace each run of a repeated word with a single occurrence of that word
res = re.sub(r'\b(\w+\s*)\1{1,}', r'\1', mystring)
print(res)
regex pattern details:
\b - a word boundary
(\w+\s*) - one or more word chars \w+ followed by any number of whitespace characters \s*, enclosed in a captured group (...)
\1{1,} - a backreference to the 1st captured group, occurring one or more times {1,}
The output:
my friend's new and old cats are running in the street
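If the substitution has to run over many strings, it may be worth pre-compiling the pattern. A minimal sketch, assuming the same pattern as above (the helper name collapse_repeats is mine, not from the answer):

import re

# \b(\w+\s*)\1+ matches a word (with trailing whitespace) immediately repeated one or more times
_REPEAT = re.compile(r'\b(\w+\s*)\1+')

def collapse_repeats(text):
    # collapse each run of a repeated word into a single occurrence
    return _REPEAT.sub(r'\1', text)

print(collapse_repeats(mystring))
# my friend's new and old cats are running in the street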
Using itertools.groupby:
>>> import itertools
>>> ' '.join(k for k, _ in itertools.groupby(mystring.split()))
"my friend's new and old cats are running in the street"
mystring.split() splits mystring into a list of words. itertools.groupby groups consecutive equal words, so keeping only each group key k drops the repeats. The complexity is linear in the size of the input string.
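To see what groupby produces before the keys are joined, here is a small illustration (the shortened example string is mine):

import itertools

words = "old old cats are running running in".split()

# groupby yields a (key, group-iterator) pair for each run of equal consecutive items
for key, group in itertools.groupby(words):
    print(key, list(group))
# old ['old', 'old']
# cats ['cats']
# are ['are']
# running ['running', 'running']
# in ['in']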