 

Regex 'or' operator avoid repetition

Tags:

c#

.net

regex

How can I use the or operator while not allowing repetition? For example, the regex:

(word1|word2|word3)+

will match word1word2, but it will also match word1word1, which I don't want because word1 is repeated. How can I avoid repetition?
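A quick way to see the problem, sketched with Python's re module (an assumption: the question targets .NET, but this simple pattern behaves the same in both flavors). The + repeats the whole alternation, and nothing stops it from choosing the same branch twice:

```python
import re

# The pattern from the question: one or more of the three words, in any order.
naive = re.compile(r"(word1|word2|word3)+")

# fullmatch requires the pattern to cover the entire subject string.
print(bool(naive.fullmatch("word1word2")))  # True: no repetition
print(bool(naive.fullmatch("word1word1")))  # True: the repetition is accepted too
```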

In summary, I would like the following subjects to match:

word1word2word3
word1
word2
word3word2

Note that all of them match because no word is repeated. I would like the following subjects to fail:

word1word2word1
word2word2
word3word1word2word2

Edit

Thanks to @Mark, I now have:

(?xi)

(?:  
        (?<A>word1|word2)(?!  .*  \k<A> )      # match word1 or word2, but only if the captured word does not occur again later in the input
    |   (?<B>word3|word4)(?!  .*  \k<B> )
)+

because I am interested in seeing if something was captured in group A or B.
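For anyone trying the edited pattern outside .NET, here is a rough Python translation (a sketch, not the asker's code). Python's re module writes named groups as (?P<A>...) and named backreferences as (?P=A) instead of \k<A>, and the inline (?xi) flags become re.X | re.I. In a repeated group only the last capture survives, so group("A") reports the most recent word captured there:

```python
import re

# Python translation of the pattern above; re.X allows whitespace and comments,
# re.I makes the match case-insensitive, mirroring the inline (?xi) flags.
grouped = re.compile(
    r"""
    (?:
        (?P<A>word1|word2)(?! .* (?P=A) )   # word1 or word2, only if that word does not occur again later
      | (?P<B>word3|word4)(?! .* (?P=B) )   # word3 or word4, same rule
    )+
    """,
    re.X | re.I,
)

m = grouped.fullmatch("word2word1word3")
if m:
    # Each group holds only its most recent capture.
    print(m.group("A"), m.group("B"))  # word1 word3

print(grouped.fullmatch("word1word1"))  # None: word1 is repeated
```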

Tono Nam asked Feb 06 '13 23:02


2 Answers

You could use negative lookaheads:

^(?:word1(?!.*word1)|word2(?!.*word2)|word3(?!.*word3))+$

See it working online: rubular
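The lookahead pattern can also be checked against the subjects from the question with a short Python sketch (an assumption: Python's re module stands in for the .NET engine; these constructs behave the same way here):

```python
import re

# Mark's pattern: a word may be used only if it does not occur again anywhere
# later in the subject (negative lookahead); ^ and $ anchor the whole string.
lookahead = re.compile(r"^(?:word1(?!.*word1)|word2(?!.*word2)|word3(?!.*word3))+$")

should_match = ["word1word2word3", "word1", "word2", "word3word2"]
should_fail = ["word1word2word1", "word2word2", "word3word1word2word2"]

for s in should_match:
    assert lookahead.search(s), f"expected a match: {s}"
for s in should_fail:
    assert not lookahead.search(s), f"expected no match: {s}"
print("lookahead pattern: all subjects behave as expected")
```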

Mark Byers answered Nov 15 '22 18:11


The lookahead solutions will not work in several cases. You can solve this properly, without scanning ahead for later occurrences of each word, by using a construct like this:

(?:(?(1)(?!))(word1)|(?(2)(?!))(word2)|(?(3)(?!))(word3))+

This works even if some words are substrings of others, and it also works if you want to find matching substrings within a larger string (rather than only matching the whole string).
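Both claims can be checked with a small Python sketch. The words ab and abc (one a prefix of the other) and the sample text are made up for illustration; each case pits the lookahead style against the conditional construct:

```python
import re

# Case 1: one word is a prefix of the other ("ab" and "abc", made up here).
lookahead = re.compile(r"^(?:ab(?!.*ab)|abc(?!.*abc))+$")
conditional = re.compile(r"^(?:(?(1)(?!))(ab)|(?(2)(?!))(abc))+$")

# "ababc" is "ab" followed by "abc", so no word repeats.
print(bool(lookahead.search("ababc")))    # False: the lookahead sees the "ab" inside "abc"
print(bool(conditional.search("ababc")))  # True

# Case 2: find a run of distinct words inside a larger (made-up) text, unanchored.
words_la = re.compile(r"(?:word1(?!.*word1)|word2(?!.*word2)|word3(?!.*word3))+")
words_cond = re.compile(r"(?:(?(1)(?!))(word1)|(?(2)(?!))(word2)|(?(3)(?!))(word3))+")

text = "word1word2 word1"
print(words_la.search(text).group())    # word2
print(words_cond.search(text).group())  # word1word2
```

In the second case the unanchored lookahead version skips the leading word1 because another word1 appears later in the text, while the conditional version only forbids repeats inside the run it is currently matching.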


It works by making an alternative fail if its group has already matched, which is done by (?(1)(?!)). (?(1)foo) is a conditional: it matches foo if group 1 has previously matched, and matches the empty string otherwise. (?!) always fails.
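The same construct can be generated for any list of words. The helper below, no_repeat_pattern, is hypothetical (it is not part of the answer) and uses Python's re module, which supports the same (?(n)yes) conditional syntax as .NET; re.escape keeps words that contain regex metacharacters literal:

```python
import re

def no_repeat_pattern(words):
    """Hypothetical helper: build the conditional construct for any list of words.

    Branch i is (?(i)(?!))(word_i): if group i already matched in an earlier
    repetition, (?!) makes the branch fail; otherwise the conditional matches
    the empty string and the word is captured into group i.
    """
    branches = [
        f"(?({i})(?!))({re.escape(word)})"
        for i, word in enumerate(words, start=1)
    ]
    return re.compile("(?:" + "|".join(branches) + ")+")

pattern = no_repeat_pattern(["word1", "word2", "word3"])
print(pattern.pattern)  # (?:(?(1)(?!))(word1)|(?(2)(?!))(word2)|(?(3)(?!))(word3))+

should_match = ["word1word2word3", "word1", "word2", "word3word2"]
should_fail = ["word1word2word1", "word2word2", "word3word1word2word2"]

# fullmatch plays the role of ^...$ here.
for s in should_match:
    assert pattern.fullmatch(s), f"expected a match: {s}"
for s in should_fail:
    assert not pattern.fullmatch(s), f"expected no match: {s}"
```

Because the groups are numbered by position, the pattern grows linearly with the word list, and each word can be used at most once per match.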

Qtax answered Nov 15 '22 17:11