I am wondering if there is a way to check a subpattern match for a given sequence so I can block it.
For example, let's say that I wanted to capture everything except a repeat of an earlier capture. So if I had the sentence [word plus word], the following should capture everything (word plus) up to the second occurrence of word.
(\w+)[^\1]+
The first (\w+) captures word. The [^...] character class that follows tries to exclude it (it being the \1 captured earlier), but a character class only works on individual characters, not on subpattern captures.
So is there any way to do this?
You can use patterns like this:
(\w+)(?:(?!\1).)*
This uses a negative lookahead to assert, at every character, that the previously captured word does not start at that position, so the match stops just before the repeat.
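As a quick check, here is a minimal sketch of that pattern in Python's re module. The choice of Python is an assumption, since the question does not name a regex flavor, and the sample sentence is the one from the question.

```python
import re

# Pattern from the answer: after capturing a word, consume one character
# at a time, but only at positions where the captured word does not start.
pattern = re.compile(r"(\w+)(?:(?!\1).)*")

match = pattern.match("word plus word")

print(repr(match.group(1)))  # the captured word: 'word'
print(repr(match.group(0)))  # everything up to the repeat: 'word plus '
```

The whole match ends right before the second word, including the trailing space.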
You could use lazy quantifiers and lookaround, like this:
(\w+).*?(?=\1)
You may also want to surround \w+ with word boundaries, like this:
\b(\w+)\b.*?(?=\1)
so that you don't match inside a single word. In hello, for example, the pattern without boundaries would match the first l, because it is followed by a second l.
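To see the difference the word boundaries make, here is a small sketch comparing the two lazy patterns, again assuming Python's re module. The variable names are only for illustration.

```python
import re

# Lazy quantifier plus lookahead, without and with word boundaries.
loose = re.compile(r"(\w+).*?(?=\1)")
bounded = re.compile(r"\b(\w+)\b.*?(?=\1)")

# Without boundaries, a single letter inside "hello" is captured
# because the same letter follows it.
loose_hit = loose.search("hello")
print(repr(loose_hit.group(0)))  # 'l'

# With boundaries, the capture must be a whole word, so "hello" gives no match.
bounded_hit = bounded.search("hello")
print(bounded_hit)  # None

# The sentence from the question still matches up to the repeated word.
sentence_hit = bounded.search("word plus word")
print(repr(sentence_hit.group(0)))  # 'word plus '
```

Note that only the capture is bounded here. The lookahead (?=\1) itself has no boundaries, so a longer word that merely starts with the captured one would still count as a repeat.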