
Writing an expression to recursively extract data between parentheses

I'm trying to write a regular expression that splits a string into the separate elements enclosed in matching curly braces. First, it needs to be recursive; second, it has to return the offsets (as with PREG_OFFSET_CAPTURE).

I suspect a regex is not the most efficient way to process this data, but I don't know of an easier, more performance-oriented technique. (If you've got one, I would love to hear it!)

So, the input can be in this format:

Hello {#name}! I'm a {%string|sentence|bit of {#random} text}

Processing the data is easy enough if it's in this format:

Hello {#name}! I'm a {%string|sentence|bit of random text}

The problem is the curly braces nested inside another set of curly braces. I'm using the following code to split the string:

preg_match_all("/(?<={)[^}]*(?=})/m", $string, $braces, PREG_OFFSET_CAPTURE);

As mentioned, this works well for the simple form, but not for the nested form. The intention (and I have it working in a non-recursive form) is to replace each braced region with its content as processed by functions, working upwards from the innermost region.

Ideally, I'd like to be able to write Hello {#name}! I'm a {%string|sentence|bit of {?(random == "strange") ? {#random} : "strange"}} text} and for it to be manageable.

Any help would be very much appreciated.

Forest asked Oct 31 '22 03:10



1 Answer

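You can use PCRE's support for capturing groups inside lookaheads, together with subroutine calls (recursion), to get the nested {...} substrings. Because the lookahead consumes no characters, matches may overlap, so inner groups are found as well as the outer ones.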


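// Group 1 recurses into itself via (?1); the lookahead lets nested matches overlap.
$re = "#(?=(\{(?>[^{}]|(?1))*+\}))#";
$str = "Hello {#name}! I'm a {%string|sentence|bit of {#random} text}";
preg_match_all($re, $str, $matches, PREG_OFFSET_CAPTURE);
// $matches[1] holds [substring, offset] pairs for every balanced {...} group.
print_r($matches[1]);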
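
For readers less familiar with the syntax, the same pattern can be spelled out in extended (x) mode, where each piece carries a comment. This is only a sketch for explanation: the `~` delimiter (chosen because `#` starts a comment in x mode) and the nowdoc layout are my own choices, not part of the original snippet.

```php
<?php
// Same pattern as the one-liner in this answer, written in extended (x) mode.
$re = <<<'REGEX'
~
(?=                 # lookahead: consumes nothing, so the scan resumes one
                    # character later and can find groups nested in a match
  (                 # group 1: one balanced {...} substring
    \{              # opening brace
    (?>             # atomic group: no backtracking into the chosen branch
        [^{}]       # a single non-brace character
      | (?1)        # or a recursive call to group 1, i.e. a nested {...}
    )*+             # repeated possessively
    \}              # matching closing brace
  )
)
~x
REGEX;

$str = "Hello {#name}! I'm a {%string|sentence|bit of {#random} text}";
preg_match_all($re, $str, $matches, PREG_OFFSET_CAPTURE);

// Each entry of $matches[1] is a [substring, offset] pair.
foreach ($matches[1] as [$text, $offset]) {
    echo $offset, ': ', $text, "\n";
}
```

The loop is just a compact way to list each captured substring next to its offset; the captured data is the same as with the compact pattern.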


The print_r call prints the captured {...} substrings together with their offsets in the input string:

Array
(
    [0] => Array
        (
            [0] => {#name}
            [1] => 6
        )

    [1] => Array
        (
            [0] => {%string|sentence|bit of {#random} text}
            [1] => 21
        )

    [2] => Array
        (
            [0] => {#random}
            [1] => 46
        )

)
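
The question's end goal is to replace each braced region with processed content, working upwards. Offsets of nested matches shift as soon as an inner region is replaced, so one alternative is to skip the offsets and repeatedly rewrite only the innermost groups until none remain. The sketch below takes that approach; `render_tag`, `expand_braces`, `$vars`, and the meanings given to `#` and `%` are hypothetical names and rules made up for illustration, not taken from the question.

```php
<?php
// Hypothetical handler: "#" looks up a variable, "%" picks the first option.
// These semantics are assumptions, not the asker's real rules.
function render_tag(string $body, array $vars): string
{
    $kind = substr($body, 0, 1);
    $rest = substr($body, 1);
    if ($kind === '#') {
        // Unknown names fall back to the bare name.
        return $vars[$rest] ?? $rest;
    }
    if ($kind === '%') {
        // Deterministic stand-in for "choose one of the options".
        return explode('|', $rest)[0];
    }
    return $body;
}

function expand_braces(string $input, array $vars): string
{
    // Matches only innermost groups: braces whose body contains no braces.
    $innermost = '/\{([^{}]*)\}/';
    do {
        // Keep the previous string if the regex engine reports an error (null).
        $input = preg_replace_callback(
            $innermost,
            fn (array $m): string => render_tag($m[1], $vars),
            $input,
            -1,
            $count
        ) ?? $input;
        // Caveat: if a replacement itself contains a {...} group, it is
        // expanded again on the next pass, so handlers must not loop.
    } while ($count > 0);
    return $input;
}

$vars = ['name' => 'World', 'random' => 'odd'];
echo expand_braces("Hello {#name}! I'm a {%string|sentence|bit of {#random} text}", $vars), "\n";
```

Each pass resolves one nesting level, so the number of passes equals the maximum nesting depth plus a final pass that finds nothing left to replace.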
Wiktor Stribiżew answered Nov 15 '22 06:11
