
Python regular expression to remove repeated words

Tags: python, regex

I am very new to Python.

I want to change a sentence if it contains the same word repeated in a row.

Correct

  • Ex. "this just so so so nice" --> "this just so nice"
  • Ex. "this is just is is" --> "this is just is"

Right now I am using this regex, but it also matches on single letters. Ex. "My friend and i is happy" --> "My friend and is happy" (it removes the "i" and the space), which is wrong.

text = re.sub(r'(\w+) \1', r'\1', text)  # remove duplicated words in a row

How can I make the same change so that it compares whole words instead of letters?

boje asked Jun 21 '13 15:06



2 Answers

text = re.sub(r'\b(\w+)( \1\b)+', r'\1', text)  # remove duplicated words in a row

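The \b matches the empty string, but only at the beginning or end of a word. Anchoring both the captured word and each repeat with \b means only whole words can match, so the "i" in "i is" is left alone. The trailing + collapses any number of repeats in a single pass.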
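To make the difference concrete, here is a minimal sketch that wraps the pattern above in a function and compares it with a boundary-free pattern. The function names dedupe_words and dedupe_naive are made up for illustration, and the naive pattern r'(\w+) \1' is an assumption about what produces the behaviour described in the question.

```python
import re

def dedupe_words(text):
    # \b(\w+) captures one whole word; ( \1\b)+ matches one or more
    # immediate repeats of that word, each preceded by a single space.
    return re.sub(r'\b(\w+)( \1\b)+', r'\1', text)

def dedupe_naive(text):
    # Hypothetical boundary-free pattern: the backreference may match
    # the first letters of the next word, not only a whole word.
    return re.sub(r'(\w+) \1', r'\1', text)

print(dedupe_words("this just so so so nice"))   # this just so nice
print(dedupe_words("this is just is is"))        # this is just is
print(dedupe_words("My friend and i is happy"))  # My friend and i is happy
print(dedupe_naive("My friend and i is happy"))  # My friend and is happy
```

Note that the pattern only handles repeats separated by exactly one space and compares words case-sensitively, so "So so" or "so  so" (two spaces) would be left as they are.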

tom answered Nov 10 '22 22:11


Non-regex solution using itertools.groupby:

>>> strs = "this is just is is"
>>> from itertools import groupby
>>> " ".join([k for k,v in groupby(strs.split())])
'this is just is'
>>> strs = "this just so so so nice" 
>>> " ".join([k for k,v in groupby(strs.split())])
'this just so nice'
Ashwini Chaudhary answered Nov 10 '22 22:11