In Perl regexes, expressions like \1, \2, etc. are usually interpreted as "backreferences" to previously captured groups, but not so when the \1, \2, etc. appear within a character class. In the latter case, the \ introduces an ordinary character escape rather than a backreference (so \1 is read as an octal escape for the character with code 1, never as the captured text).
Therefore, if (for example) one wanted to match a string (of length greater than 1) whose first character matches its last character, but does not appear anywhere else in the string, the following regex will not do:
/\A # match beginning of string;
(.) # match and capture first character (referred to subsequently by \1);
[^\1]* # (WRONG) match zero or more characters different from character in \1;
\1 # match \1;
\z # match the end of the string;
/sx # s: let . match newline; x: ignore whitespace, allow comments
This regex does not work, since it matches (for example) the string 'a1a2a':
DB<1> ( 'a1a2a' =~ /\A(.)[^\1]*\1\z/ and print "fail!" ) or print "success!"
fail!
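To see why the character class ignores the capture, it can help to test [\1] in isolation. The following is only a sketch: it assumes, as perlrebackslash describes, that a backslash followed by digits inside a bracketed class is read as an octal escape, and the variable names are purely illustrative.

```perl
use strict;
use warnings;

# Inside a bracketed class, \1 is not a backreference. Perl reads it as an
# octal escape, i.e. the single character with code point 1.
my $class_matches_soh   = "\x01" =~ /\A[\1]\z/ ? 1 : 0;
my $class_matches_digit = "1"    =~ /\A[\1]\z/ ? 1 : 0;

# So [^\1] means "anything except chr(1)", and the flawed pattern accepts
# 'a1a2a' even though 'a' reappears in the middle of the string.
my $flawed_accepts = 'a1a2a' =~ /\A(.)[^\1]*\1\z/s ? 1 : 0;

print "[\\1] matches chr(1): $class_matches_soh\n";
print "[\\1] matches '1': $class_matches_digit\n";
print "flawed pattern accepts 'a1a2a': $flawed_accepts\n";
```

If the assumption holds, the first and third checks succeed and the second fails, which is consistent with the debugger output above.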
I can usually manage to find some workaround [1], but it's always rather problem-specific, and usually far more complicated-looking than what I would do if I could use backreferences within a character class.
Is there a general (and hopefully straightforward) workaround?
[1] For example, for the problem in the example above, I'd use something like
/\A
(.) # match and capture first character (referred to subsequently
# by \1);
(?!.*\1.+\z) # a negative lookahead assertion for "a suffix containing \1";
.* # substring not containing \1 (as guaranteed by the preceding
# negative lookahead assertion);
\1\z # match last character only if it is equal to the first one
/sx
...where I've replaced the reasonably straightforward (though, alas, incorrect) subexpression [^\1]* in the earlier regex with the somewhat more forbidding negative lookahead assertion (?!.*\1.+\z). This assertion basically says "give up if \1 appears anywhere beyond this point (other than at the last position)." Incidentally, I give this solution just to illustrate the sort of workarounds I referred to in the question. I don't claim that it is a particularly good one.
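As a quick check of the footnote's workaround, the regex can be wrapped in a small helper and run against a few strings. This is only a sketch; the helper name first_last_only_lookahead and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the first and last characters are equal
# and that character occurs nowhere in between, 0 otherwise.
sub first_last_only_lookahead {
    my ($str) = @_;
    return $str =~ /\A
        (.)             # capture the first character
        (?!.*\1.+\z)    # fail if \1 occurs again before the final position
        .*              # the middle of the string
        \1\z            # the last character must equal the first
    /sx ? 1 : 0;
}

for my $s ('aba', 'aa', 'a1a2a', 'ab', 'a') {
    printf "%-8s %s\n", "'$s'",
        first_last_only_lookahead($s) ? 'match' : 'no match';
}
```

The strings 'aba' and 'aa' should be accepted, while 'a1a2a', 'ab', and the single character 'a' should be rejected.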
This can be accomplished with a negative lookahead within a repeated group:
/\A # match beginning of string;
(.) # match and capture first character (referred to subsequently by \1);
((?!\1).)* # match zero or more characters different from character in \1;
\1 # match \1;
\z # match the end of the string;
/sx
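This pattern can be used even if the group contains more than one character, since (?!\1) checks for the whole captured text at each position before the . consumes a single character.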
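A minimal sketch of the repeated-lookahead idiom, covering both a one-character and a two-character capture. The helper name delimited_once and its length parameter are hypothetical, and a non-capturing group (?:...) is swapped in for the inner parentheses only to avoid creating a second capture.

```perl
use strict;
use warnings;

# Hypothetical helper: the second argument is the length of the leading
# delimiter. Returns 1 when the string starts and ends with the same
# delimiter and the delimiter does not start anywhere in between.
sub delimited_once {
    my ($str, $n) = @_;
    return $str =~ /\A
        (.{$n})         # capture the leading delimiter
        (?:(?!\1).)*    # consume one character at a time, never at a spot
                        # where the captured delimiter begins
        \1\z            # the string must end with the delimiter
    /sx ? 1 : 0;
}

for my $case (['aba', 1], ['a1a2a', 1], ['abXYab', 2], ['ababab', 2]) {
    my ($s, $n) = @$case;
    printf "%-9s (len %d): %s\n", "'$s'", $n,
        delimited_once($s, $n) ? 'match' : 'no match';
}
```

Here 'aba' and 'abXYab' should match, while 'a1a2a' and 'ababab' should not, because the delimiter reappears before the end of the string.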