
Nested placeholders replacing in PHP

Tags: regex, php

I have strings with placeholders like "{variant 1|variant 2}", where "|" means "or", and I want to get every variant of the string with the placeholders resolved. For example, from the string "{a|b{c|d}}" I should get "a", "bc" and "bd". I tried the regular expression \{([^{}])\} with recursion (it matches the innermost level, {c|d} in my case), but the next step leaves me with two strings, {a|bc} and {a|bd}, which produce "a", "bc", "a", "bd". Maybe I need to build a graph or tree structure? I also want to ask about (?<postfix>[^{}|$]*): why is there a "$"? I removed it and it had no effect.

Guy Fawkes asked Jun 26 '26 00:06


1 Answer

Assuming that "|", "{" and "}" are reserved characters (not allowed in the content of your variants), the following is a regex approach to the problem. Note that writing a simple state-machine parser would be the better choice.

<?php // Using PHP5.3 syntax

// PCRE Recursive Pattern
// http://php.net/manual/en/regexp.reference.recursive.php

$string = "This test can be {very {cool|bad} in random order|or be just text} ddd {a|b{c|d}} bar {a|b{c{d|e|f}}} lala {b|c} baz";

if (preg_match_all('#\{((?>[^{}]+)|(?R))+\}#', $string, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        // $match[0] == "{a|b{c|d}}" | "{a|b{c{d|e|f}}}" | "{b|c}"
        // have some fun splitting them up
        // I'd suggest walking the characters and building a tree
        // a simpler (slower, uglier) approach:

        // remove {}
        $set = substr($match[0], 1, -1);
        while (strpos($set, '{') !== false) {
            // explode and replace nested {}
            // reserved characters: "{" and "}" and "|"
            // (?<=^|\{|\|) -- the match must start at the beginning of the string or right after "{" or "|";
            //  "?<=" is a positive lookbehind assertion - its content is not captured
            // (?<prefix>[^{}|]*) -- the prefix: the preceding literal string (anything but reserved characters, may be empty)
            // \{(?<inner>[^{}]+)\} -- the content of an innermost {} group, excluding the "{" and "}"
            // (?<postfix>[^{}|$]*) -- the postfix: the trailing literal string (anything but reserved characters, may be empty);
            //  inside a character class "$" is a literal dollar sign, not an end-of-string anchor
            // readable: <begin-delimiter><possible-prefix>{<nested-group>}<possible-postfix>
            $set = preg_replace_callback('#(?<=^|\{|\|)(?<prefix>[^{}|]*)\{(?<inner>[^{}]+)\}(?<postfix>[^{}|$]*)#', function($m) {
                $inner = explode('|', $m['inner']);
                // separator goes first: the legacy (array, separator) argument order was removed in PHP 8
                return $m['prefix'] . implode($m['postfix'] . '|' . $m['prefix'], $inner) . $m['postfix'];
            }, $set);
        }

        // $items = explode('|', $set);
        echo "$match[0] expands to {{$set}}\n";
    }
}

/*
    OUTPUT:
    {very {cool|bad} in random order|or be just text} expands to {very cool in random order|very bad in random order|or be just text}
    {a|b{c|d}} expands to {a|bc|bd}
    {a|b{c{d|e|f}}} expands to {a|bcd|bce|bcf}
    {b|c} expands to {b|c}
*/
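
The answer suggests walking the characters and building a tree rather than looping over a regex. Below is a minimal sketch of that idea as a small recursive-descent parser. The function names (expandTemplate, parseSequence, parseGroup) are hypothetical and not part of the original answer. It assumes "{", "}" and "|" are reserved, and it treats a top-level "|" or an unbalanced brace as an error.

```php
<?php
// Hypothetical helper names; a sketch, not the answer's own code.

// Expand every {a|b} group in $input (nested ones included)
// and return all resulting strings.
function expandTemplate($input)
{
    $pos = 0;
    $result = parseSequence($input, $pos);
    if ($pos < strlen($input)) {
        // Assumption: a stray "|" or "}" outside any group is invalid input.
        throw new InvalidArgumentException("Unexpected '" . $input[$pos] . "' at offset $pos");
    }
    return $result;
}

// Parse literals and groups until "|", "}" or the end of input.
// Returns every string this sequence can produce.
function parseSequence($s, &$pos)
{
    $results = array('');
    $len = strlen($s);
    while ($pos < $len && $s[$pos] !== '|' && $s[$pos] !== '}') {
        if ($s[$pos] === '{') {
            $options = parseGroup($s, $pos);
        } else {
            // Literal run: everything up to the next reserved character.
            $run = strcspn($s, '{|}', $pos);
            $options = array(substr($s, $pos, $run));
            $pos += $run;
        }
        // Cartesian product: append each option to each partial result.
        $next = array();
        foreach ($results as $head) {
            foreach ($options as $tail) {
                $next[] = $head . $tail;
            }
        }
        $results = $next;
    }
    return $results;
}

// Parse "{alt|alt|...}" starting at the opening brace.
// Returns the union of the expansions of all alternatives.
function parseGroup($s, &$pos)
{
    $len = strlen($s);
    $options = array();
    do {
        $pos++; // skip "{" on the first pass, "|" on later passes
        $options = array_merge($options, parseSequence($s, $pos));
    } while ($pos < $len && $s[$pos] === '|');

    if ($pos >= $len || $s[$pos] !== '}') {
        throw new InvalidArgumentException("Missing '}' before offset $pos");
    }
    $pos++; // skip "}"
    return $options;
}

print_r(expandTemplate('{a|b{c|d}}'));
```

For the question's input "{a|b{c|d}}" this returns array('a', 'bc', 'bd') with no duplicates, because each alternative is expanded exactly once. Unlike the regex loop, it also combines several groups inside one alternative: "x{a|b}y{c|d}" yields "xayc", "xayd", "xbyc" and "xbyd".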
rodneyrehm answered Jun 28 '26 13:06