
Matching (pairing) tokens (e.g., brackets or quotes)

In short, I need a function which attempts a rudimentary code fix by adding brackets/quotes where necessary, for parsing purposes. That is, the resulting code is not expected to be runnable.

Let's see a few examples:

[1] class Aaa { $var a = "hi";       =>  class Aaa { $var a = "hi"; }
[2] $var a = "hi"; }                 =>  { $var a = "hi"; }
[3] class { a = "hi; function b( }   =>  class { a = "hi; function b( }"}
[4] class { a = "hi"; function b( }  =>  class { a = "hi"; function b() {}}

PS: The 4th example above looks quite complicated, but in fact, it's quite easy. If the engine finds an ending bracket token which doesn't match the top of the stack, it should insert the opposite token before that one. As you can see, this works pretty well.
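The rule in the PS can be sketched in isolation as follows. This is a minimal, brackets-only illustration and not the function from this question: `insertMissingOpeners` is a hypothetical name, quotes are ignored entirely, and leftover openers are simply closed at the end in reverse order.

```php
<?php
// Hypothetical helper: illustrates only the rule from the PS.
// A closing bracket that does not match the top of the stack gets its
// opening counterpart inserted immediately before it. Quotes are ignored.
function insertMissingOpeners(string $code, array $pairs): string
{
    $openers = array_flip($pairs); // closing token => opening token
    $stack = [];
    $result = '';

    foreach (str_split($code) as $c) {
        if (isset($pairs[$c])) {
            // opening bracket: remember it
            $stack[] = $c;
        } elseif (isset($openers[$c])) {
            if (end($stack) === $openers[$c]) {
                // closing bracket matches the innermost opener
                array_pop($stack);
            } else {
                // mismatch (or empty stack): insert the opener right before it
                $result .= $openers[$c];
            }
        }
        $result .= $c;
    }

    // close whatever is still open, innermost first
    while (($top = array_pop($stack)) !== null) {
        $result .= $pairs[$top];
    }

    return $result;
}

echo insertMissingOpeners('function b( }', ['{' => '}', '(' => ')']), "\n";
// prints: function b( {})
```

Applied literally, the rule puts the opener directly before the stray closer, unlike example [2], where the opener lands at the start of the code.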


As a function signature, it looks like: balanceTokens($code, $bracket_tokens, $quote_tokens)

The function I wrote works using a stack. Well, it doesn't exactly work, but it does use a stack.

function balanceTokens($code, $bracket_tokens, $quote_tokens){
    $stack = array(); $last = null; $result = '';
    foreach(str_split($code) as $c){
        if($last==$c && in_array($c, $quote_tokens)){
            // handle closing string
            array_pop($stack);
        }elseif(!in_array($last, $quote_tokens)){
            // handle other tokens
            if(isset($bracket_tokens[$c])){
                // handle beginning bracket
                $stack[] = $c;
            }elseif(($p = array_search($c, $bracket_tokens)) != false){
                // handle ending bracket
                $l = array_pop($stack);   
                if($l != $p)$result .= $p;
            }elseif(isset($quote_tokens[$c])){
                // handle beginning quote
                $stack[] = $c;
                $last = $c;
            }// else other token...
        }
        $result .= $c;
    }
    // perform fixes
    foreach($stack as $token){
        // fix ending brackets
        if(isset($bracket_tokens[$token]))
            $result .= $bracket_tokens[$token];
        // fix beginning brackets
        if(in_array($token, $bracket_tokens))
            $result = $token . $result;
    }
    return $result;
}

The function is called like this:

$new_code = balanceTokens(
    $old_code,
    array(
        '<' => '>',
        '{' => '}',
        '(' => ')',
        '[' => ']',
    ),
    array(
        '"' => '"',
        "'" => "'",
    )
);

Yes, it's quite generic; there aren't any hard-coded tokens.

I haven't the slightest idea why it's not working...as a matter of fact, I don't even know if it should work. I admit I didn't put much thought into writing it. Maybe there are obvious issues which I'm not seeing.

Christian asked Dec 11 '25 20:12



1 Answer

An alternative implementation (which does more aggressive balancing):

function balanceTokens($code) {
    $tokens = [
        '{' => '}',
        '[' => ']',
        '(' => ')',
        '"' => '"',
        "'" => "'",
    ];
    $closeTokens = array_flip($tokens);
    $stringTokens = ['"' => true, "'" => true];

    $stack = [];
    for ($i = 0, $l = strlen($code); $i < $l; ++$i) {
        $c = $code[$i];

        // push opening tokens to the stack (a " or ' opens a string only if the same quote is not already on top of the stack)
        if (isset($tokens[$c]) && (!isset($stringTokens[$c]) || end($stack) != $c)) {
            $stack[] = $c;
        // closing tokens have to be matched up with the stack elements
        } elseif (isset($closeTokens[$c])) {
            $matched = false;

            while ($top = array_pop($stack)) {
                // stack has matching opening for current closing
                if ($top == $closeTokens[$c]) {
                    $matched = true;
                    break;
                }

                // stack has unmatched opening, insert closing at current pos
                $code = substr_replace($code, $tokens[$top], $i, 0);
                $i++;
                $l++;
            }

            // unmatched closing, insert opening at start
            if (!$matched) {
                $code = $closeTokens[$c] . $code;
                $i++;
                $l++;
            }
        }
    }

    // any elements still on the stack are unmatched opening, so insert closing
    while ($top = array_pop($stack)) {
        $code .= $tokens[$top];
    }

    return $code;
}
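The final loop is what keeps the appended closers properly nested: popping the stack yields the innermost opener first. Below is a minimal sketch of just that step. `closingSuffix` is a hypothetical helper name, and the stack contents are the ones left over for an input such as `foo { bar[foo="test`.

```php
<?php
// Hypothetical helper: builds the closers for everything still open,
// innermost opener first (LIFO order).
function closingSuffix(array $stack, array $tokens): string
{
    $suffix = '';
    // array_pop() returns null on an empty array; the strict check means
    // a falsy token such as "0" would not end the loop early
    while (($top = array_pop($stack)) !== null) {
        $suffix .= $tokens[$top];
    }
    return $suffix;
}

$tokens = ['{' => '}', '[' => ']', '(' => ')', '"' => '"', "'" => "'"];

// openers pushed in order: {  [  "
echo closingSuffix(['{', '[', '"'], $tokens), "\n"; // prints "]}
```

Walking the stack front to back with `foreach` would append `}]"` instead, closing the outermost block first.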

Some examples:

$tests = array(
    'class Aaa { public $a = "hi";',
    '$var = "hi"; }',
    'class { a = "hi; function b( }',
    'class { a = "hi"; function b( }',
    'foo { bar[foo="test',
    'bar { bar[foo="test] { bar: "rgba(0, 0, 0, 0.1}',
);

Passing those to the function gives:

class Aaa { public $a = "hi";}
{$var = "hi"; }
class { a = "hi; function b( )"}
class { a = "hi"; function b( )}
foo { bar[foo="test"]}
bar { bar[foo="test"] { bar: "rgba(0, 0, 0, 0.1)"}}
NikiC answered Dec 14 '25 09:12
