
regex for repeating words in a string in Python

Tags:

python

regex

I have a good regexp for replacing repeating characters in a string. Now I also need to replace repeating words: three or more consecutive occurrences of the same word should be reduced to two.

For example,

bye! bye! bye!

should become

bye! bye!

My code so far:

import re

def replaceThreeOrMoreCharactersWithTwoCharacters(string):
    # pattern to look for three or more repetitions of any character, including newlines.
    pattern = re.compile(r"(.)\1{2,}", re.DOTALL)
    return pattern.sub(r"\1\1", string)
Bjorn asked Aug 24 '14 17:08



3 Answers

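You could also try the regex below. Note that it needs the third-party regex module: the standard re module rejects the lookbehind (?<= |^) because its alternatives have different widths.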

(?<= |^)(\S+)(?: \1){2,}(?= |$)

Sample code:

>>> import regex
>>> s = "hi hi hi hi some words words words which'll repeat repeat repeat repeat repeat"
>>> m = regex.sub(r'(?<= |^)(\S+)(?: \1){2,}(?= |$)', r'\1 \1', s)
>>> m
"hi hi some words words which'll repeat repeat"


Avinash Raj answered Sep 24 '22 10:09



I know you were after a regular expression but you could use a simple loop to achieve the same thing:

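def max_repeats(s, max=2):
    last = ''
    same = 0  # how many times the current word has already been repeated
    out = []
    for word in s.split():
        # reset the counter whenever the word changes
        same = 0 if word != last else same + 1
        if same < max:
            out.append(word)
        last = word
    return ' '.join(out)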

As a bonus, I have allowed a different maximum number of repeats to be specified (the default is 2). Because the string is split on whitespace and rejoined with single spaces, any run of several spaces (or tabs or newlines) between words is collapsed to one space. It's up to you whether you consider that to be a bug or a feature :)
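For comparison, the same run-limiting idea can be sketched with itertools.groupby, which clusters consecutive equal words. The names limit_repeats and max_count are illustrative and not part of the answer above; the whitespace behaviour is the same, since the text is still split and rejoined.

```python
from itertools import groupby, islice

def limit_repeats(text, max_count=2):
    kept = []
    # groupby yields one group per run of consecutive identical words
    for _, run in groupby(text.split()):
        # keep at most max_count words from each run
        kept.extend(islice(run, max_count))
    return ' '.join(kept)

print(limit_repeats("bye! bye! bye!"))                  # bye! bye!
print(limit_repeats("no no no  no way", max_count=1))   # no way
```

Only consecutive repeats are limited, so a word that reappears later in the string starts a new run.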

Tom Fenech answered Sep 24 '22 10:09



Assuming that a "word" in your requirements is one or more non-whitespace characters surrounded by whitespace or the string boundaries, you can try this pattern:

re.sub(r'(?<!\S)((\S+)(?:\s+\2))(?:\s+\2)+(?!\S)', r'\1', s)
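A quick way to see what this pattern does is to wrap it in a small function and try it on a few strings. The wrapper name keep_two is hypothetical. Group 1 captures the first two occurrences together with the whitespace between them, so that whitespace is preserved in the result.

```python
import re

# Same pattern as above, compiled once for reuse.
PATTERN = re.compile(r'(?<!\S)((\S+)(?:\s+\2))(?:\s+\2)+(?!\S)')

def keep_two(text):
    # \1 is "word<whitespace>word"; the extra repeats after it are dropped.
    return PATTERN.sub(r'\1', text)

print(keep_two("bye! bye! bye!"))          # bye! bye!
print(repr(keep_two("so  so\tso so good")))  # 'so  so good'
print(keep_two("hi hi hi hike"))           # hi hi hike
```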
Casimir et Hippolyte answered Sep 22 '22 10:09
