This regular expression matches palindromes:

^((.)(?1)\2|.?)$

I can't wrap my head around how it works. When does the recursion end, and when does the regex break out of the recursive subpattern and go to the |.? part? Thanks.

Edit: sorry, I didn't explain \2 and (?1):

(?1) - refers to the first subpattern (i.e. to itself)
\2 - back-reference to the match of the second subpattern, which is (.)

The example above is written in PHP. It matches both "abba" (no middle character) and "abcba" (which has a middle, non-reflected character).



How does regular expression engine parse regex with recursive subpatterns?


2 Answers

^((.)(?1)\2|.?)$

The ^ and $ assert the beginning and the end of the string, respectively. Let us look at the content in between, which is more interesting:

((.)(?1)\2|.?)
1------------1 // Capturing group 1
 2-2           // Capturing group 2
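A quick way to see what the two groups hold is to run the pattern through PHP's preg_match(). This is a sketch assuming PHP 7.3 or later; the test strings are arbitrary. Capture values set inside a recursion are restored when the recursion returns, so at the top level group 2 holds only the outermost character. Dropping the anchors also shows why ^ and $ matter.

```php
<?php
$pattern = '/^((.)(?1)\2|.?)$/';

preg_match($pattern, 'abcba', $m);
// $m[1] is the whole palindrome: group 1 spans the entire match.
// $m[2] is the outermost character only, because captures made inside
// the recursion revert when each (?1) call returns.
print_r($m);

// Without ^ and $ the engine accepts the first position that works.
// At offset 0 no later 'x' can close (.)(?1)\2, so .? matches 'x' alone.
preg_match('/((.)(?1)\2|.?)/', 'xabay', $u);
echo $u[0], "\n"; // x
```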

Look at the first part, (.)(?1)\2: it tries to match any character, and then that same character at the end (the back-reference \2 refers to the character matched by (.)). In the middle, it recursively matches the whole of capturing group 1. Note that there is an implicit assertion (caused by (.) matching one character at the beginning and \2 matching the same character at the end) that requires the string to be at least 2 characters long. The purpose of the first part is to chop identical characters off both ends of the string, recursively.
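The chopping can be mimicked without a regex. The helper below, chopSteps() (a hypothetical name, not part of either answer), strips one pair of identical end characters per step, the way each (.)(?1)\2 level does, and records what is left over for the next level.

```php
<?php
// Hypothetical helper: returns the string followed by each leftover
// after removing a matching pair of end characters.
function chopSteps(string $s): array
{
    $steps = [$s];
    // (.) ... \2 needs at least two characters whose ends are equal.
    while (strlen($s) >= 2 && $s[0] === $s[strlen($s) - 1]) {
        // The middle is what the recursive (?1) call has to match.
        $s = substr($s, 1, strlen($s) - 2);
        $steps[] = $s;
    }
    return $steps;
}

echo implode(' -> ', chopSteps('abcba')), "\n"; // abcba -> bcb -> c
```

For "abca" the steps are "abca" and then "bc": the ends of "bc" differ and two characters remain, so neither alternative can match that leftover.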

Look at the second part, .?: it matches zero or one character. In a successful match, it is used only at the innermost level: either the string has length 0 or 1 to begin with, or 0 or 1 characters are left over after the recursive chopping. The purpose of the second part is to match the empty string or the single lonely character that remains after the string has been chopped from both ends.
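The second alternative can be observed on its own with very short inputs. This is a sketch; $results is just an illustrative name. For a string of length 0 or 1 only .? can succeed, so group 2 never takes part, and preg_match() leaves trailing groups that did not participate out of its matches array. A 2-character string that is not a palindrome fails outright.

```php
<?php
$pattern = '/^((.)(?1)\2|.?)$/';
$results = [];
foreach (['', 'x', 'xy'] as $input) {
    $ok = preg_match($pattern, $input, $m);
    // count($m) is 2 when group 1 matched but group 2 did not take part,
    // and 0 when the whole match failed.
    $results[$input] = [$ok, count($m)];
    printf("%s: matched=%d, entries in \$m=%d\n", var_export($input, true), $ok, count($m));
}
```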

The recursive matching works as follows:

- The whole string must be a palindrome to pass, as asserted by ^ and $. Matching cannot start or stop at an arbitrary position.
- If the string has at most 1 character, it passes.
- If the string has at least 2 characters, whether it is accepted is decided by the first part only, and if it matches, one character is chopped off each end.
- The leftover is matched by the same rules: it can only be chopped at both ends again, or it passes if its length is at most 1.
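To make the "when does it go to .?" part concrete, here is a hypothetical model of the backtracking; group1() and simulatePattern() are illustrative names, not PCRE internals. Each level tries alternative 1 first, recursing one position further, and falls back to .? only when alternative 1 yields no acceptable end position. The recursion bottoms out at the end of the string, where (.) cannot match and only the empty .? is left. The model lets a caller backtrack into a finished recursion and take its next result, which is how Perl and PCRE2 10.30+ (PHP 7.3+) behave; older PCRE versions treated a recursive call as atomic, so a deeper level that had matched one character could not be re-entered to match the empty string instead, and even-length inputs such as "abba" could then be rejected.

```php
<?php
// Yields every position where group 1 can end when it starts at $pos,
// in the order a backtracking engine tries them. A caller that rejects
// one result simply asks for the next one: that is the backtracking.
function group1(string $s, int $pos): Generator
{
    $n = strlen($s);
    // Alternative 1: (.)(?1)\2 needs a character to capture.
    if ($pos < $n) {
        $c = $s[$pos];                               // (.) captures one character
        foreach (group1($s, $pos + 1) as $end) {     // (?1) recurses one position further
            if ($end < $n && $s[$end] === $c) {      // \2 must repeat this level's character
                yield $end + 1;
            }
        }
    }
    // Alternative 2: .? is greedy, so one character first, then none.
    // At the end of the string only the empty match remains.
    if ($pos < $n) {
        yield $pos + 1;
    }
    yield $pos;
}

// ^ pins group 1 to offset 0; $ accepts only an end equal to the length.
function simulatePattern(string $s): bool
{
    foreach (group1($s, 0) as $end) {
        if ($end === strlen($s)) {
            return true;
        }
    }
    return false;
}

foreach (['abba', 'abcba', 'abca', 'a', ''] as $w) {
    echo var_export($w, true), ' => ', simulatePattern($w) ? 'match' : 'no match', "\n";
}
```

Tracing "abba" with this model: the recursion runs all the way to offset 4, the empty .? there fails the \2 check for 'a', the level at offset 3 falls back to .? matching 'a', and the level at offset 2 then has to backtrack into that call again before the inner 'bb' and outer 'a...a' pairs line up.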


answered Sep 19 '22 17:09

nhahtdh

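The regex is essentially equivalent to the following function, written in PHP:

function palin(string $str): bool {
    if (strlen($str) >= 2) {
        // (.) ... \2: the first and last characters must be equal
        $first = $str[0];
        $last = $str[strlen($str) - 1];
        // (?1): the middle part must itself be a palindrome
        return $first === $last && palin(substr($str, 1, strlen($str) - 2));
    } else {
        // .?: empty and single-character strings are trivially palindromes
        return true;
    }
}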
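The claimed equivalence can be spot-checked by brute force. This sketch assumes a PCRE2 10.30+ build (PHP 7.3 or later) and uses a plain strrev() comparison as the reference instead of the recursive function; the alphabet {a, b} and the maximum length of 6 are arbitrary choices.

```php
<?php
$pattern = '/^((.)(?1)\2|.?)$/';
$mismatches = [];
$words = [''];
for ($len = 0; $len <= 6; $len++) {
    // Compare the regex with the reversal test for every word of this length.
    foreach ($words as $w) {
        $byRegex = preg_match($pattern, $w) === 1;
        $byReverse = $w === strrev($w);
        if ($byRegex !== $byReverse) {
            $mismatches[] = $w;
        }
    }
    // Build all words of the next length by appending each letter.
    $next = [];
    foreach ($words as $w) {
        $next[] = $w . 'a';
        $next[] = $w . 'b';
    }
    $words = $next;
}
echo $mismatches === [] ? "regex and strrev agree\n" : 'mismatches: ' . implode(', ', $mismatches) . "\n";
```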

answered Sep 20 '22 17:09

Barmar

Tags: regex, php, recursion, pcre, palindrome

alexy2k
