
Is there a way to match Regex based on previous capture group, not captured previously?

Tags: regex

Okay, so the task is that there is a string that can look like post, or post put, or even get put post. All of these must be matched. Preferably, deviations like [space]post or get[space] should not be matched.

So far I have come up with this:

^(post|put|delete|get)(( )(post|put|delete|get))*$

However, I'm not satisfied with it, because I had to specify (post|put|delete|get) twice. It also matches duplicates like post post.

I'd like to somehow use a backreference(?) to the first group so that I don't have to specify the same condition twice.

However, a backreference \1 would only help me match post post, for example, which is the opposite of what I want. I'd like to match a word from the first capture group's alternatives that has NOT already appeared in the string.

Is this even possible? I've been looking through SO questions, but my Google-fu is failing me.

EpicPandaForce asked Dec 01 '25 04:12


1 Answer

If you are using a PCRE-based regex engine, you can use a subroutine call of the form (?n), where n is a group number, to reuse that group's subpattern.

^(post|put|delete|get)( (?!\1)(?1))*$
                              ^^^^

See the regex demo

Expression details:

  • ^ - start of string
  • (post|put|delete|get) - Group 1 matching one of the alternatives as literal substrings
  • ( (?!\1)(?1))* - zero or more sequences of:
    • " " - a literal space character
    • (?!\1) - a negative lookahead that fails the match if the text at the current position is identical to the text captured into Group 1 (compared via the backreference \1)
    • (?1) - a subroutine call to the first capture group (i.e. it uses the same pattern used in Group 1)
  • $ - end of string
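
For engines without subroutine calls, the breakdown above can be approximated by building the pattern from a shared string. The following is only a sketch, assuming Python: its standard re module does not support (?1), so the alternation is interpolated twice from one variable instead of being called as a subroutine. The names WORDS, FIRST_PATTERN and matches_first are hypothetical and are not part of the original answer.

```python
import re

# Hypothetical name: the shared alternation, written only once.
WORDS = "post|put|delete|get"

# Port of ^(post|put|delete|get)( (?!\1)(?1))*$ for Python's re module.
# The (?1) subroutine call is replaced by interpolating WORDS a second time.
FIRST_PATTERN = re.compile(rf"^({WORDS})(?: (?!\1)(?:{WORDS}))*$")


def matches_first(text: str) -> bool:
    """Return True if text is accepted by the ported first pattern."""
    return FIRST_PATTERN.match(text) is not None


if __name__ == "__main__":
    samples = ["post", "post put", "get put post",
               "post post", " post", "get ", "get post post"]
    for sample in samples:
        print(repr(sample), matches_first(sample))
```

Because the lookahead only compares each later word against the first one, this version rejects post post but still accepts get post post.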

UPDATE

To avoid matching strings like get post post, you also need to add a negative lookahead inside Group 1, so that every subroutine call also rejects a word that occurs again later in the string.

^((post|put|delete|get)(?!.*\2))( (?1))*$

See the regex demo

The difference is that we capture the alternation into Group 2 and add the negative lookahead (?!.*\2) to disallow any further occurrence of the captured word later in the string. The ( (?1))* part stays the same: the subroutine call now re-runs the whole Group 1 subpattern, lookahead included.
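
The same idea can be sketched for an engine without subroutine calls. This is an assumption-laden port to Python's standard re module, not the answer's own pattern: the word plus its "not again later" lookahead is spelled out once for the first word and once inside the repeated group, with the alternation taken from a shared variable. The names WORDS, UNIQUE_PATTERN and matches_unique are hypothetical.

```python
import re

# Hypothetical name: the shared alternation, written only once.
WORDS = "post|put|delete|get"

# Each word is captured and immediately followed by a negative lookahead
# that rejects the match if the same text occurs again later in the string.
# Group 1 is the first word; group 2 is re-captured on every repetition.
UNIQUE_PATTERN = re.compile(rf"^({WORDS})(?!.*\1)(?: ({WORDS})(?!.*\2))*$")


def matches_unique(text: str) -> bool:
    """Return True if text is a space-separated list of distinct words."""
    return UNIQUE_PATTERN.match(text) is not None


if __name__ == "__main__":
    samples = ["post", "get put post", "delete get put post",
               "post post", "get post post", "put get put", "get "]
    for sample in samples:
        print(repr(sample), matches_unique(sample))
```

The lookahead searches for the captured text anywhere later in the string, which is safe here because none of the four words contains another. A word list where one entry is a substring of another would likely need word boundaries around the backreference.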

Wiktor Stribiżew answered Dec 05 '25 06:12


