Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding palindrome using regex

This question comes in an attempt to understand one of the answer in : How to check that a string is a palindrome using regular expressions?

Answer given by Markus Jarderot is :

/^((.)(?1)\2|.?)$/

Can someone please explain, whats exactly happening here....i need to do similar in Perl, but not able to understand this solution!!!

PS : I am not very good in perl so please go easy ....and also "this can't be considered a regular expression if you want to be strict" - i read this line, so i am aware that this not regex strictly

like image 996
NoobEditor Avatar asked Mar 12 '14 10:03

NoobEditor


People also ask

Can regular expressions be used to match for palindromes of arbitrary length?

No. As others have mentioned, regular expressions cannot match palindromes of arbitrary length.

Why palindrome is not a regular expression?

Are palindromes regular? No, the language of all (and only) palindromic strings on some alphabet (of at least 2 symbols) is not a regular language.

Are palindromes regular?

By the pumping lemma, xz must be in the language. But xz can't be a palindrome. This means that the set of palindromes doesn't satisfy the pumping lemma and, thus, the set of palindromes cannot be regular.

How do I match a pattern in regex?

Most characters, including all letters ( a-z and A-Z ) and digits ( 0-9 ), match itself. For example, the regex x matches substring "x" ; z matches "z" ; and 9 matches "9" . Non-alphanumeric characters without special meaning in regex also matches itself. For example, = matches "=" ; @ matches "@" .


2 Answers

  • ^ - matches beginning of string
  • ( - starts capture group #1
  • (.) - matches any single character except a newline, save it in capture group #2
  • (?1) - recurse = replace this group with the entire regexp capture group #1
  • \2 - matches the same thing as capture group #2. This requires the first and last characters of the string to match each other
  • | - creates an alternative
  • .? - optionally matches any one character that isn't a newline - This handles the end of the recursion, by matching an empty string (when the whole string is an even length) or a single character (when it's an odd length)
  • ) - ends capture group #1
  • $ - matches end of string or before a newline at the end of the string.

The recursion (?1) is the key. A palindrome is an empty string, a 1-character string, or a string whose first and last characters are the same and the substring between them is also a palindrome.

like image 136
Barmar Avatar answered Oct 23 '22 15:10

Barmar


It might be easier to understand with this analogous function, that does the same thing for arrays:

sub palindrome {
  if (scalar(@_) >= 2) {
    my $first_dot = shift;
    my $slash_two = pop;
    return $first_dot eq $slash_two && palindrome(@_);
  } else {
    # zero or one items
    return 1;
  }
}

print "yes!\n" if palindrome(qw(one two three two one));
print "really?\n" if palindrome(qw(one two three two two one));

The (?1) notation is a recursive reference to the start of the first parenthesis in the regex, the \2 is a backreference in the current recursion to the (.). Those two are anchored at the start and end of 'whatever is matching at the current recursion depth', so everything else is matched at the next depth down.


ikegami suspects this is faster:

sub palindrome {
   my $next = 0;
   my %symbols;
   my $s = join '', map chr( $symbols{$_} ||= $next++ ), @_;
   return $s =~ /^((.)(?1)\2|.?)\z/s;
}
like image 31
bazzargh Avatar answered Oct 23 '22 15:10

bazzargh