This question comes in an attempt to understand one of the answer in : How to check that a string is a palindrome using regular expressions?
Answer given by Markus Jarderot is :
/^((.)(?1)\2|.?)$/
Can someone please explain, whats exactly happening here....i need to do similar in Perl
, but not able to understand this solution!!!
PS : I am not very good in perl so please go easy ....and also "this can't be considered a regular expression if you want to be strict" - i read this line, so i am aware that this not regex strictly
No. As others have mentioned, regular expressions cannot match palindromes of arbitrary length.
Are palindromes regular? No, the language of all (and only) palindromic strings on some alphabet (of at least 2 symbols) is not a regular language.
By the pumping lemma, xz must be in the language. But xz can't be a palindrome. This means that the set of palindromes doesn't satisfy the pumping lemma and, thus, the set of palindromes cannot be regular.
Most characters, including all letters ( a-z and A-Z ) and digits ( 0-9 ), match itself. For example, the regex x matches substring "x" ; z matches "z" ; and 9 matches "9" . Non-alphanumeric characters without special meaning in regex also matches itself. For example, = matches "=" ; @ matches "@" .
^
- matches beginning of string(
- starts capture group #1(.)
- matches any single character except a newline, save it in capture group #2(?1)
- recurse = replace this group with the entire regexp capture group #1\2
- matches the same thing as capture group #2. This requires the first and last characters of the string to match each other|
- creates an alternative.?
- optionally matches any one character that isn't a newline - This handles the end of the recursion, by matching an empty string (when the whole string is an even length) or a single character (when it's an odd length))
- ends capture group #1$
- matches end of string or before a newline at the end of the string.The recursion (?1)
is the key. A palindrome is an empty string, a 1-character string, or a string whose first and last characters are the same and the substring between them is also a palindrome.
It might be easier to understand with this analogous function, that does the same thing for arrays:
sub palindrome {
if (scalar(@_) >= 2) {
my $first_dot = shift;
my $slash_two = pop;
return $first_dot eq $slash_two && palindrome(@_);
} else {
# zero or one items
return 1;
}
}
print "yes!\n" if palindrome(qw(one two three two one));
print "really?\n" if palindrome(qw(one two three two two one));
The (?1)
notation is a recursive reference to the start of the first parenthesis in the regex, the \2
is a backreference in the current recursion to the (.)
. Those two are anchored at the start and end of 'whatever is matching at the current recursion depth', so everything else is matched at the next depth down.
ikegami suspects this is faster:
sub palindrome {
my $next = 0;
my %symbols;
my $s = join '', map chr( $symbols{$_} ||= $next++ ), @_;
return $s =~ /^((.)(?1)\2|.?)\z/s;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With