
RegEx Removing Methods from Code

Tags:

regex

I'm trying to use a regular expression to remove all the methods/functions from the following code while leaving the "global scope" alone. However, I can't get it to match the entire body of a method.

<?php
$mother = new Mother();
class Hello
{
    public function FunctionName($value="username",)
    {

    }
    public function ododeqwdo($value='')
    {
        # code...
    }
    public function ofdoeqdoq($value='')
    {
    if(isset($mother)) {
        echo $lol;
    }
    if(lol(9)) {
       echo 'lol';
    }
    }
}
function user()
{
    if(isset($mother)) {
        echo $lol;
    }
    if(lol(9)) {
       echo 'lol';
    }
}
    $mother->global();
function asodaosdo() {

}

The regular expression I currently have is: (?:(public|protected|private|static)\s+)?function\s+\w+\(.*?\)\s+{.*?} However, it won't correctly select a method that has braces inside its body, like function user().

Could someone point me in the right direction?
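
The behaviour described in the question can be reproduced in a few lines. This is a minimal sketch, not the asker's actual script: the sample function is shortened, the braces in the pattern are escaped, and the `s` flag is an assumption (without it `.` would not match newlines at all).

```php
<?php
// Hypothetical, shortened sample: one function with a nested block.
$code = <<<'PHP'
function user()
{
    if (lol(9)) {
        echo 'lol';
    }
}
PHP;

// The pattern from the question, with the braces escaped and the `s` flag
// added (an assumption) so that `.` also matches newlines.
$pattern = '/(?:(public|protected|private|static)\s+)?function\s+\w+\(.*?\)\s+\{.*?\}/s';

preg_match($pattern, $code, $m);

// The lazy `.*?\}` ends at the first `}` it finds, which is the one
// closing the `if` block, so the function's own closing brace is left out.
echo $m[0], "\n";
```

The lazy quantifier has no notion of nesting; it simply stops at the earliest closing brace, which is why only functions without inner blocks are matched completely.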

MarioRicalde asked Jan 23 '23 14:01


2 Answers

You can't do this properly with regex. You need to write a parser that can properly parse comments, string literals and nested brackets.

Regex cannot cope with these cases:

class Hello
{
  function foo()
  {
    echo '} <- that is not the closing bracket!';
    // and this: } bracket isn't the closing bracket either!
    /*
    } and that one isn't as well...
    */
  }
}
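
To make the limitation concrete, here is a small sketch using PCRE recursion, which PHP's `preg_*` functions support. The pattern and the two sample strings are illustrative assumptions, not part of the original question. Recursion handles nested braces, but a brace inside a string literal still breaks the match.

```php
<?php
// PCRE recursion: group 1 matches a `{...}` block whose body consists of
// non-brace characters or nested instances of group 1 itself.
$pattern = '/function\s+\w+\s*\([^)]*\)\s*(\{(?:[^{}]++|(?1))*\})/';

// Hypothetical one-line samples.
$nested = "function user() { if (lol(9)) { echo 'lol'; } }";
$tricky = "function foo() { echo '}'; }";

preg_match($pattern, $nested, $a);
preg_match($pattern, $tricky, $b);

var_dump($a[0] === $nested); // nested blocks are balanced correctly
var_dump($b[0] === $tricky); // the quoted brace ends the match too early
```

In the second case the match stops at the brace inside the quotes, because the pattern cannot tell code from string content. Teaching it about strings and comments amounts to writing a tokenizer by hand.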

EDIT

Here's a little demo of how to use the tokenizer function mentioned by XUE Can:

$source = <<<BLOCK
<?php

\$mother = new Mother("this function isNotAFunction(\$x=0) {} foo bar");

class Hello
{
    \$foo = 666;

    public function FunctionName(\$value="username",)
    {

    }
    private \$bar;
    private function ododeqwdo(\$value='')
    {
        # code...
    }
    protected function ofdoeqdoq    (\$value='')
    {
        if(isset(\$mother)) {
            echo \$lol . 'function() {';
        }
        if(lol(9)) {
           echo 'lol';
        }
    }
}

function user()
{
    if(isset(\$mother)) {
        echo \$lol;
    }
    /* comment inside */
    if(lol(9)) {
       echo 'lol';
    }
}
/* comment to preserve function noFunction(){} */
\$mother->global();

function asodaosdo() {

}

?>
BLOCK;

if (!defined('T_ML_COMMENT')) {
   define('T_ML_COMMENT', T_COMMENT);
} 
else {
   define('T_DOC_COMMENT', T_ML_COMMENT);
}

// Tokenize the source
$tokens = token_get_all($source);

// Some flags and counters
$tFunction = false;
$functionBracketBalance = 0;
$buffer = '';

// Iterate over all tokens
foreach ($tokens as $token) {
    // Single-character tokens.
    if(is_string($token)) {
        if(!$tFunction) {
            echo $token;
        }
        if($tFunction && $token == '{') {
            // Increase the bracket-counter (not the class-brackets: `$tFunction` must be true!)
            $functionBracketBalance++;
        }
        if($tFunction && $token == '}') {
            // Decrease the bracket-counter (not the class-brackets: `$tFunction` must be true!)
            $functionBracketBalance--;
            if($functionBracketBalance == 0) {
                // If it's the closing bracket of the function, reset `$tFunction`
                $tFunction = false;
            }
        }
    } 
    // Tokens consisting of (possibly) more than one character.
    else {
        list($id, $text) = $token;
        switch ($id) {
            case T_PUBLIC:
            case T_PROTECTED:
            case T_PRIVATE: 
                // Don't immediately echo 'public', 'protected' or 'private'
                // before we know if it's part of a variable or method.
                $buffer = "$text ";
                break; 
            case T_WHITESPACE:
                // Only display spaces if we're outside a function.
                if(!$tFunction) echo $text;
                break;
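            case T_FUNCTION:
                // If we encounter the keyword 'function', flip the `tFunction` flag to
                // true and reset the `buffer`
                $tFunction = true;
                $buffer = '';
                break;
            case T_CURLY_OPEN:
            case T_DOLLAR_OPEN_CURLY_BRACES:
                // "{$x}" and "${x}" open with one of these tokens but close with a
                // plain '}', so count the opener to keep the balance correct.
                if($tFunction) {
                    $functionBracketBalance++;
                } else {
                    echo "$buffer$text";
                    $buffer = '';
                }
                break;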
            default:
                // Echo all other tokens if we're not in a function and prepend a possible 
                // 'public', 'protected' or 'private' previously put in the `buffer`.
                if(!$tFunction) {
                    echo "$buffer$text";
                    $buffer = '';
                }
       }
   }
}

which will print:

<?php

$mother = new Mother("this function isNotAFunction($x=0) {} foo bar");

class Hello
{
    $foo = 666;


     private $bar;


}


/* comment to preserve function noFunction(){} */
$mother->global();



?>

which is the original source, only without functions.

Bart Kiers answered Jan 26 '23 00:01


I believe using PHP's built-in Tokenizer extension or Zend_CodeGenerator from Zend Framework is a safer way. These will also keep your code more readable.

If you parse source code with regular expressions, you have to maintain your own set of tokens, even though a built-in solution already exists.
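
As a minimal sketch of what the built-in tokenizer provides (the sample source string is a made-up assumption), `token_get_all()` already separates the `function` keyword from the same word inside a string literal:

```php
<?php
// Hypothetical sample: the word "function" appears twice, once inside a
// string literal and once as a real keyword.
$source = '<?php $s = "function fake() {}"; function real() {}';

$functionKeywords = 0;
foreach (token_get_all($source) as $token) {
    // Multi-character tokens are arrays of [id, text, line];
    // single characters such as `{` or `;` are plain strings.
    if (is_array($token) && $token[0] === T_FUNCTION) {
        $functionKeywords++;
    }
}

echo $functionKeywords, "\n"; // only the real declaration is counted
```

The string literal arrives as a single `T_CONSTANT_ENCAPSED_STRING` token, so nothing inside it can be mistaken for code.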

XUE Can answered Jan 25 '23 23:01