
Check that regex subpattern does not contain a previous subpattern?

I am wondering if there is a way to check a subpattern match for a given sequence so I can block it.

For example, let's say I want to capture everything except a repeat of an earlier capture. Given the sentence [word plus word], the following should capture everything (word plus) up to the second occurrence of word.

(\w+)[^\1]+

The first (\w+) captures word. The second part, the character class [^...], tries to exclude it (it being the \1 captured earlier), but a character class only works on individual characters, not on subpattern captures.
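
To see why the attempt above blocks nothing, here is a minimal sketch assuming Python's re module (the question does not name a regex flavor, so that choice is an assumption). In that module, \1 inside square brackets is read as an octal escape for the character U+0001 rather than as a backreference, so the class accepts every character in the sentence.

```python
import re

text = "word plus word"

# Inside [...] Python's re treats \1 as the octal escape for the
# character U+0001, not as a backreference to group 1.
m = re.match(r"(\w+)[^\1]+", text)

whole_match = m.group(0)   # everything the pattern consumed
first_group = m.group(1)   # what (\w+) captured

# The class never "sees" the captured word, so the repeat is not excluded.
print(repr(first_group), repr(whole_match))
```

Other flavors may parse \1 inside a class differently, but none of the common ones treat it as "anything except the captured text".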

So is there any way to do this?

Xeoncross asked May 10 '26 08:05


2 Answers

You can use patterns like this:

(\w+)(?:(?!\1).)*

This uses a negative lookahead to assert, before each character is consumed, that the previously matched word does not start at that position. The match therefore stops just before the first repeat of the word.
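
As a quick check of the pattern above, here is a small sketch assuming Python's re module. The helper name match_until_repeat is made up for illustration and is not part of the answer.

```python
import re

# (\w+) captures a word; (?:(?!\1).)* then consumes one character at a
# time, but only while the captured word does not start at that position.
pattern = re.compile(r"(\w+)(?:(?!\1).)*")

def match_until_repeat(text):
    """Hypothetical helper: return the text matched by the pattern, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

result = match_until_repeat("word plus word")   # stops before the second "word"
no_repeat = match_until_repeat("alpha beta")    # no repeat, so it runs to the end

print(repr(result), repr(no_repeat))
```

Note that this pattern still matches when the word never repeats; it simply consumes the rest of the line.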

Qtax answered May 12 '26 01:05


You could use lazy quantifiers and lookaround, like this:

(\w+).*?(?=\1)

You may also want to surround \w+ with word boundaries, like this:

\b(\w+)\b.*?(?=\1)

so that you don't match inside a single word such as hello, where the first l would match because another l follows it.
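
The difference between the two variants can be sketched as follows, assuming Python's re module; the variable names are illustrative only.

```python
import re

# Lazy match up to (but not including) the next occurrence of group 1.
plain = re.compile(r"(\w+).*?(?=\1)")
# Same idea, but the captured word must be a whole word.
bounded = re.compile(r"\b(\w+)\b.*?(?=\1)")

plain_hello = plain.search("hello")        # matches inside the word
bounded_hello = bounded.search("hello")    # no whole word repeats, so no match
sentence = bounded.search("word plus word")

print(plain_hello.group(0), bounded_hello, repr(sentence.group(0)))
```

Unlike the negative-lookahead pattern, both of these variants require the word to occur again; if there is no repeat, there is no match at all.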

Sophie answered May 12 '26 00:05


