
How can I use regex to catch unquoted array indices in PHP code and quote them?

Tags: regex, php

PHP 7.2 upgraded the use of an undefined constant from a notice to a warning, with advice that a future version will throw a full-on error instead.

I am trying to identify a way to fix these via scripting, ideally via a regex that I can run to parse each PHP file on a site, find all offending bits of code, and fix them.

I've found multiple examples of how to fix one variant, but none for another, and it's that one that I'm looking for help with.

Here's an example file:

<?php

$array[foo] = "bar"; 
// this should become 
// $array['foo'] = "bar"

echo "hello, my name is $array[foo] and it's nice to meet you"; 
// would need to become 
// echo "hello, my name is " . $array['foo'] . " and it's nice to meet you";

?>

I've seen a lot of options to identify and change the first type, but none for the second, where the undefined constant is within a string. In that instance the parser would need to:

  1. Replace $array[foo] with $array['foo']
  2. Find the entire variable, end quotes beforehand, put a . either side, and then reopen quotes afterwards

Edit: ideally one regex would deal with both examples in the sample code in one pass, i.e. add the quotes around the key, and also add the closing quote, dots, and reopening quote if the match is inside a string.

David Bennett asked May 03 '19 13:05


1 Answer

$array[foo] = "bar"; 
// this should become 
// $array['foo'] = "bar"

Yes, this has always triggered a notice and has always been poor practice.

echo "hello, my name is $array[foo] and it's nice to meet you"; 
// would need to become 
// echo "hello, my name is " . $array['foo'] . " and it's nice to meet you";

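No, this style has never triggered a notice and does not now. Inside a double-quoted string, PHP's simple interpolation syntax treats an unquoted array key as a plain string, not as a constant. In fact, it's used as an example in the PHP documentation. PHP is never going to remove the ability to interpolate array variables in strings.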
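To see the difference for yourself, here is a minimal sketch. The sample value in $array is my own assumption, not something from the question. It compares the simple syntax with the curly-brace syntax, where normal expression rules apply and the key does need quotes.

```php
<?php
// Assumed sample data for illustration only.
$array = ['foo' => 'David'];

// Simple syntax: the unquoted key is read as the string 'foo', with no constant lookup.
$simple = "hello, my name is $array[foo]";

// Curly-brace syntax: the key is an ordinary expression, so it must be quoted.
$complex = "hello, my name is {$array['foo']}";

echo $simple, PHP_EOL;
echo $complex, PHP_EOL;
```

Both lines should print the same sentence, which is why the second example in the question does not need to be rewritten at all.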


Your first case is easy enough to catch with something like this:

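$str = '$array[foo] = "bar";';
// $1 is the variable name, $2 the bare lowercase key; single-quoted strings avoid double escaping.
echo preg_replace('/(\$[a-zA-Z_][a-zA-Z0-9_]*)\[([a-z][a-z0-9_]*)\]/', '$1[\'$2\']', $str);
// prints: $array['foo'] = "bar";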

But of course the replacement must only be applied outside of a string.
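A rough illustration of why that matters, assuming the same kind of pattern is run blindly over a line that contains an interpolated string (the $pattern and $line names are just for this sketch):

```php
<?php
// Same idea as the pattern above, applied without checking for string context.
$pattern = '/(\$[a-zA-Z_][a-zA-Z0-9_]*)\[([a-z][a-z0-9_]*)\]/';
$line = 'echo "hello, my name is $array[foo]";';

$result = preg_replace($pattern, '$1[\'$2\']', $line);

echo $result, PHP_EOL;
// prints: echo "hello, my name is $array['foo']";
```

The rewritten line is no longer valid PHP: a quoted key is a parse error in the simple interpolation syntax, so a blind replace turns working code into broken code.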

As with any complex grammar, regular expressions will never be as reliable as a grammar-specific parser. Since you're parsing PHP code, your most accurate solution will be to use PHP's own tokenizer, token_get_all().

$php = <<< 'PHP'
<?php
$array[foo] = "bar"; // this line should be the only one altered.
$array['bar'] = "baz";
echo "I'm using \"$array[foo]\" and \"$array[bar]\" in a sentence";
echo 'Now I\'m not using "$array[foo]" and "$array[bar]" in a sentence';
PHP;

$tokens = token_get_all($php);
$in_dq_string = false;
$last_token = null;
$output = "";

foreach ($tokens as $token) {
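    // T_STRING marks a bare identifier; use the constant, since numeric token IDs differ between PHP versions.
    if ($last_token === "[" && is_array($token) && $token[0] === T_STRING && !$in_dq_string) {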
        $output .= "'$token[1]'";
    } elseif (is_array($token)) {
        $output .= $token[1];
    } else {
        if ($token === "\"") {
            $in_dq_string = !$in_dq_string;
        }
        $output .= $token;
    }
    $last_token = $token;
}

echo $output;

Output:

<?php
$array['foo'] = "bar"; // this line should be the only one altered.
$array['bar'] = "baz";
echo "I'm using \"$array[foo]\" and \"$array[bar]\" in a sentence";
echo 'Now I\'m not using "$array[foo]" and "$array[bar]" in a sentence';

This code would need some edge cases accounted for, such as when you are intentionally using a constant as an array index.
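One possible way to handle the intentional-constant case is sketched below. The function name quote_bare_keys and its $knownConstants parameter are hypothetical, introduced only for this example. It assumes you can list the constants your codebase really uses as array keys, and it also skips anything that is already defined at the time the script runs. It still does not distinguish an array literal such as [foo] from an index, and it does not handle heredocs.

```php
<?php
// Hypothetical helper (not part of the code above): quotes bare array keys
// outside double-quoted strings, skipping identifiers treated as constants.
function quote_bare_keys(string $source, array $knownConstants = []): string
{
    $tokens = token_get_all($source);
    $inDqString = false;
    $output = '';

    foreach ($tokens as $i => $token) {
        if (!is_array($token)) {
            // Single-character tokens; a bare " opens or closes an interpolated string.
            if ($token === '"') {
                $inDqString = !$inDqString;
            }
            $output .= $token;
            continue;
        }

        [$id, $text] = $token;
        $prev = $tokens[$i - 1] ?? null;
        $next = $tokens[$i + 1] ?? null;

        // Only a lone identifier between [ and ] counts, so $a[foo()] is left alone.
        $isBareKey = $id === T_STRING && !$inDqString && $prev === '[' && $next === ']';

        // Leave true/false/null, caller-listed constants, and defined constants untouched.
        $quote = $isBareKey
            && !in_array(strtolower($text), ['true', 'false', 'null'], true)
            && !in_array($text, $knownConstants, true)
            && !defined($text);

        $output .= $quote ? "'" . $text . "'" : $text;
    }

    return $output;
}

$source = <<<'PHP'
<?php
$array[foo] = "bar";
$array[MY_CONST] = 1;
echo "value: $array[foo]";
PHP;

$fixed = quote_bare_keys($source, ['MY_CONST']);
echo $fixed, PHP_EOL;
```

With this input only the first assignment should be rewritten: MY_CONST is on the allow-list and the last line is inside a double-quoted string. For a real site you would still want to review the resulting diff by hand before committing it.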

miken32 answered Oct 11 '22 01:10