Is it possible for a regex to match based on other parts of the same regex?
For example, how would I match lines that begin and end with the same sequence of 3 characters, regardless of what the characters are?
Matches:
abcabc
xyz abc xyz
Doesn't Match:
abc123
Undefined: (Can match or not, whichever is easiest)
ababa
a
Ideally, I'd like something in the Perl regex flavor. If that's not possible, I'd be interested to know if there are any flavors that can do it.
Use capture groups and backreferences.
/^(.{3}).*\1$/
The \1 refers back to whatever is matched by the contents of the first capture group (the contents of the ()). Regexes in most languages allow something like this.
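As a quick check of that pattern against the examples from the question, here is a minimal sketch in Python, whose re module accepts the same \1 backreference syntax. Python is my choice for the demonstration rather than something from the question, and the name same_ends is made up for illustration.

```python
import re

# Same pattern as above, written as a Python raw string.
SAME_ENDS = re.compile(r"^(.{3}).*\1$")

def same_ends(line: str) -> bool:
    """Return True if the line starts and ends with the same 3 characters."""
    return SAME_ENDS.search(line) is not None

# Examples taken from the question, including the two "undefined" cases.
for line in ["abcabc", "xyz abc xyz", "abc123", "ababa", "a"]:
    print(f"{line!r}: {same_ends(line)}")
```

With this pattern the two "undefined" cases do not match: in ababa the two copies of aba would have to overlap, and a is shorter than three characters.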
You need backreferences. The idea is to use a capturing group for the first bit, and then refer back to it when you're trying to match the last bit. Here's an example of matching a pair of HTML start and end tags (from the link given earlier):
<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>
This regex contains only one pair of parentheses, which capture the string matched by [A-Z][A-Z0-9]* into the first backreference. This backreference is reused with \1 (backslash one). The / before it is simply the forward slash in the closing HTML tag that we are trying to match.
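To see the tag regex in action, here is a small sketch, again in Python as an assumed stand-in for any flavor with backreferences. The helper name find_tag_pairs and the sample strings are invented for illustration.

```python
import re

# The tag-matching regex from above, unchanged (it only matches upper-case tag names).
TAG_PAIR = re.compile(r"<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\1>")

def find_tag_pairs(html: str) -> list[tuple[str, str]]:
    """Return (tag name, full matched text) for each matched start/end pair."""
    return [(m.group(1), m.group(0)) for m in TAG_PAIR.finditer(html)]

print(find_tag_pairs("<B>bold</B> and <I>italic</I>"))
# The closing tag differs from what group 1 captured, so nothing matches.
print(find_tag_pairs("<B>bold</I>"))
```

The second call returns an empty list because \1 must reproduce exactly the text captured by the group, not just anything the group's sub-pattern could match.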
Applying this to your case:
/^(.{3}).*\1$/
(Yes, that's the regex that Brian Carper posted. There just aren't that many ways to do this.)
A detailed explanation for posterity's sake (please don't be insulted if it's beneath you):
1. ^ matches the start of the line.
2. (.{3}) grabs three characters of any type and saves them in a group for later reference.
3. .* matches anything for as long as possible. (You don't care what's in the middle of the line.)
4. \1 matches the group that was captured in step 2.
5. $ matches the end of the line.
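The breakdown above can be mirrored as an annotated pattern. This is a sketch in Python, assuming its re.VERBOSE flag for inline comments and re.MULTILINE so that ^ and $ apply to each line of a larger text; the function name lines_with_same_ends and the sample text are hypothetical.

```python
import re

# VERBOSE lets the pattern carry comments; MULTILINE makes ^ and $ work per line.
PATTERN = re.compile(
    r"""
    ^        # 1. start of the line
    (.{3})   # 2. capture any three characters as group 1
    .*       # 3. anything in the middle, as much as possible
    \1       # 4. the same text that group 1 captured
    $        # 5. end of the line
    """,
    re.VERBOSE | re.MULTILINE,
)

def lines_with_same_ends(text: str) -> list[str]:
    """Return every line of text that begins and ends with the same 3 characters."""
    return [m.group(0) for m in PATTERN.finditer(text)]

sample = "abcabc\nxyz abc xyz\nabc123\nababa\na"
print(lines_with_same_ends(sample))
```

Because the dot does not match a newline by default, each match stays within a single line.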