
How to get rid of duplicates in regex

Suppose I had a string, "cats cats cats and dogs dogs dogs."

What regular expression would I use to replace that string with "cats and dogs.", i.e. to remove the duplicates? The expression must only remove duplicates that directly follow one another. For instance:

"cats cats cats and dogs dogs dogs and cats cats and dogs dogs"

Would return:

"cats and dogs and cats and dogs"

Immanu'el Smith asked Jun 10 '10 13:06



1 Answer

resultString = Regex.Replace(subjectString, @"\b(\w+)(?:\s+\1\b)+", "$1");

will do all replacements in a single call.
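As a quick check, here is the same pattern ported to Python's `re` module. This is only a sketch: the answer itself uses .NET's `Regex.Replace`, and the function name `collapse_repeats` is made up for this example. The one syntax difference is that Python writes the replacement backreference as `\1` rather than `$1`.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
REPEATED_WORD = re.compile(r"\b(\w+)(?:\s+\1\b)+")


def collapse_repeats(text: str) -> str:
    """Collapse each run of the same consecutive word into one occurrence."""
    # r"\1" is Python's spelling of the .NET replacement "$1".
    return REPEATED_WORD.sub(r"\1", text)


print(collapse_repeats("cats cats cats and dogs dogs dogs."))
# cats and dogs.
print(collapse_repeats("cats cats cats and dogs dogs dogs and cats cats and dogs dogs"))
# cats and dogs and cats and dogs
```

Note that only adjacent repeats are collapsed, so the second "cats" and "dogs" in the longer string survive, as the question asks.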

Explanation:

\b                 # assert that we are at a word boundary
                   # (we only want to match whole words)
(\w+)              # match one word, capture into backreference #1
(?:                # start of non-capturing, repeating group
   \s+             # match one or more whitespace characters
   \1              # match the same word as previously captured
   \b              # but only if it ends at a word boundary (a whole word)
)+                 # do this at least once
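To see why the trailing `\b` matters, the following Python sketch compares the pattern with and without it. Python's `re` is used here purely for illustration (the answer is .NET), and the names `with_boundary` and `without_boundary` are invented for the example.

```python
import re

# The pattern from the answer.
with_boundary = re.compile(r"\b(\w+)(?:\s+\1\b)+")
# Same pattern with the trailing \b dropped on purpose, to show the failure mode.
without_boundary = re.compile(r"\b(\w+)(?:\s+\1)+")

text = "the theory of cat cats"

print(with_boundary.sub(r"\1", text))
# the theory of cat cats
print(without_boundary.sub(r"\1", text))
# theory of cats
```

Without the boundary check, "the" is treated as a repeat of the start of "theory" and "cat" as a repeat of the start of "cats", so words that merely share a prefix get mangled.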
Tim Pietzcker answered Oct 01 '22 02:10
