I have PHP code (an array definition) stored in a string like this:
$code=' array(
0 => "a",
"a" => $GlobalScopeVar,
"b" => array("nested"=>array(1,2,3)),
"c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
); ';
Is there a regular expression to extract this array? I mean, I want something like:
$array=array(
0 => '"a"',
'a' => '$GlobalScopeVar',
'b' => 'array("nested"=>array(1,2,3))',
'c' => 'function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }',
);
P.S.: I did some research trying to find a regular expression, but found nothing.
P.S. 2: Gods of Stack Overflow, let me put a bounty on this now and I will offer 400 :3
P.S. 3: This will be used in an internal app, where I need to extract an array from some PHP file so it can be 'processed' in parts. I try to explain it with this: codepad.org/td6LVVme
So here's the MEGA regex I came up with:
\s* # white spaces
########################## KEYS START ##########################
(?: # We\'ll use this to make keys optional
(?P<keys> # named group: keys
\d+ # match digits
| # or
"(?(?=\\\\")..|[^"])*" # match string between "", works even 4 escaped ones "hello \" world"
| # or
\'(?(?=\\\\\')..|[^\'])*\' # match string between \'\', same as above :p
| # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])* # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
) # close group: keys
########################## KEYS END ##########################
\s* # white spaces
=> # match =>
)? # make keys optional
\s* # white spaces
########################## VALUES START ##########################
(?P<values> # named group: values
\d+ # match digits
| # or
"(?(?=\\\\")..|[^"])*" # match string between "", works even 4 escaped ones "hello \" world"
| # or
\'(?(?=\\\\\')..|[^\'])*\' # match string between \'\', same as above :p
| # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])* # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
| # or
array\s*\((?:[^()]|(?R))*\) # match an array()
| # or
\[(?:[^[\]]|(?R))*\] # match an array, new PHP array syntax: [1, 3, 5] is the same as array(1,3,5)
| # or
(?:function\s+)?\w+\s* # match functions: helloWorld, function name
(?:\((?:[^()]|(?R))*\)) # match function parameters (wut), (), (array(1,2,4))
(?:(?:\s*use\s*\((?:[^()]|(?R))*\)\s*)? # match use(&$var), use($foo, $bar) (optionally)
\{(?:[^{}]|(?R))*\} # match { whatever}
)?;? # match ; (optionally)
) # close group: values
########################## VALUES END ##########################
\s* # white spaces
I've added some comments. Note that you need to use 3 modifiers:
x: lets me write comments inside the pattern
s: makes dots match newlines
i: makes the match case-insensitive
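To see what the three modifiers do in isolation, here is a minimal sketch. The pattern and the subject string are made up for illustration and are not part of the regex above.

```php
<?php
// Hypothetical mini-pattern, only meant to demonstrate the x, s and i modifiers.
$pattern = '~
    begin   # literal word, matched case-insensitively because of the i modifier
    .+      # any characters; with the s modifier the dot also matches newlines
    end     # closing word
~xsi';

$subject = "BEGIN\nsome text\nEnd";

// With all three modifiers the whole subject matches.
$matched = preg_match($pattern, $subject, $m);
var_dump($matched); // int(1)

// Without s, the dot stops at the first newline, so nothing matches.
$matchedWithoutS = preg_match('~begin.+end~i', $subject);
var_dump($matchedWithoutS); // int(0)
```

Keep the delimiter character (here ~) and single quotes out of the comments, since PHP looks for the end of the string and of the pattern before PCRE ever sees the comments.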
$code='array(0 => "a", 123 => 123, $_POST["hello"][\'world\'] => array("is", "actually", "An array !"), 1234, \'got problem ?\',
"a" => $GlobalScopeVar, $test_further => function test($noway){echo "this works too !!!";}, "yellow" => "blue",
"b" => array("nested"=>array(1,2,3), "nested"=>array(1,2,3),"nested"=>array(1,2,3)), "c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
"bug", "fixed", "mwahahahaa" => "Yeaaaah", "this is" => "a test"
);'; // Sample data
$code = preg_replace('#(^\s*array\s*\(\s*)|(\s*\)\s*;?\s*$)#s', '', $code); // Just to get rid of array( at the beginning, and ); at the end
preg_match_all('~
\s* # white spaces
########################## KEYS START ##########################
(?: # We\'ll use this to make keys optional
(?P<keys> # named group: keys
\d+ # match digits
| # or
"(?(?=\\\\")..|[^"])*" # match string between "", works even 4 escaped ones "hello \" world"
| # or
\'(?(?=\\\\\')..|[^\'])*\' # match string between \'\', same as above :p
| # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])* # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
) # close group: keys
########################## KEYS END ##########################
\s* # white spaces
=> # match =>
)? # make keys optional
\s* # white spaces
########################## VALUES START ##########################
(?P<values> # named group: values
\d+ # match digits
| # or
"(?(?=\\\\")..|[^"])*" # match string between "", works even 4 escaped ones "hello \" world"
| # or
\'(?(?=\\\\\')..|[^\'])*\' # match string between \'\', same as above :p
| # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])* # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
| # or
array\s*\((?:[^()]|(?R))*\) # match an array()
| # or
\[(?:[^[\]]|(?R))*\] # match an array, new PHP array syntax: [1, 3, 5] is the same as array(1,3,5)
| # or
(?:function\s+)?\w+\s* # match functions: helloWorld, function name
(?:\((?:[^()]|(?R))*\)) # match function parameters (wut), (), (array(1,2,4))
(?:(?:\s*use\s*\((?:[^()]|(?R))*\)\s*)? # match use(&$var), use($foo, $bar) (optionally)
\{(?:[^{}]|(?R))*\} # match { whatever}
)?;? # match ; (optionally)
) # close group: values
########################## VALUES END ##########################
\s* # white spaces
~xsi', $code, $m); // Matching :p
print_r($m['keys']); // Print keys
print_r($m['values']); // Print values
// Since some keys may be empty in case you didn't specify them in the array, let's fill them up !
foreach($m['keys'] as $index => &$key){
if($key === ''){
$key = 'made_up_index_'.$index;
}
}
$results = array_combine($m['keys'], $m['values']);
print_r($results); // printing results
Array
(
[0] => 0
[1] => 123
[2] => $_POST["hello"]['world']
[3] =>
[4] =>
[5] => "a"
[6] => $test_further
[7] => "yellow"
[8] => "b"
[9] => "c"
[10] =>
[11] =>
[12] => "mwahahahaa"
[13] => "this is"
)
Array
(
[0] => "a"
[1] => 123
[2] => array("is", "actually", "An array !")
[3] => 1234
[4] => 'got problem ?'
[5] => $GlobalScopeVar
[6] => function test($noway){echo "this works too !!!";}
[7] => "blue"
[8] => array("nested"=>array(1,2,3), "nested"=>array(1,2,3),"nested"=>array(1,2,3))
[9] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
[10] => "bug"
[11] => "fixed"
[12] => "Yeaaaah"
[13] => "a test"
)
Array
(
[0] => "a"
[123] => 123
[$_POST["hello"]['world']] => array("is", "actually", "An array !")
[made_up_index_3] => 1234
[made_up_index_4] => 'got problem ?'
["a"] => $GlobalScopeVar
[$test_further] => function test($noway){echo "this works too !!!";}
["yellow"] => "blue"
["b"] => array("nested"=>array(1,2,3), "nested"=>array(1,2,3),"nested"=>array(1,2,3))
["c"] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
[made_up_index_10] => "bug"
[made_up_index_11] => "fixed"
["mwahahahaa"] => "Yeaaaah"
["this is"] => "a test"
)
$code='array("aaa", "sdsd" => "dsdsd");'; // fail
$code='array(\'aaa\', \'sdsd\' => "dsdsd");'; // fail
$code='array("aaa", \'sdsd\' => "dsdsd");'; // succeed
// Which means, if a value with no keys is followed
// by key => value and they are using the same quotation
// then it will fail (first value gets merged with the key)
Credit goes to Bart Kiers for his recursive pattern to match nested brackets.
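The recursive building block for nested brackets can be tried on its own. This is a small sketch with a made-up subject string; (?R) re-enters the whole pattern, so every opening parenthesis has to be closed by its own partner.

```php
<?php
// Match a balanced pair of parentheses, including any nested pairs.
$pattern = '~\((?:[^()]|(?R))*\)~';

// Hypothetical input, chosen only to show nesting.
$subject = 'foo(1, bar(2, baz(3)), 4) + qux(5)';

preg_match_all($pattern, $subject, $m);
print_r($m[0]);
// Array
// (
//     [0] => (1, bar(2, baz(3)), 4)
//     [1] => (5)
// )
```

Note that this simple form does not know about string literals, so a parenthesis inside a quoted string would still be counted.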
You should probably go with a parser, since regexes are fragile. @bwoebi has done a great job in his answer.
Even though you asked for a regex, this also works with pure PHP. token_get_all is the key function here. For a regex, check out @HamZa's answer.
The advantage here is that it is more dynamic than a regex. A regex has a static pattern, while with token_get_all you can decide what to do after every single token. It even escapes single quotes and backslashes where necessary, which a regex wouldn't do.
Also, even a commented regex is hard to follow; it is much easier to understand what plain PHP code does when you read it.
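As a quick illustration of what token_get_all hands back, here is a minimal sketch on a tiny made-up snippet. Multi-character tokens come as arrays of [id, text, line], while single characters such as ( or , come as plain strings, which is why code working on tokens usually checks is_array() first.

```php
<?php
// Hypothetical one-line input; the "<?php " prefix is required by the tokenizer.
$tokens = token_get_all('<?php array("a" => $x);');

$lines = array();
foreach ($tokens as $token) {
    if (is_array($token)) {
        // Array token: [token id, source text, line number]
        $lines[] = token_name($token[0]) . ' ' . $token[1];
    } else {
        // Single-character token such as "(", ")", "," or ";"
        $lines[] = 'char ' . $token;
    }
}

echo implode("\n", $lines), "\n";
```

In this output the key "a" shows up as T_CONSTANT_ENCAPSED_STRING and => as T_DOUBLE_ARROW, so the boundaries between key, arrow and value can be found without guessing at quoting rules.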
$code = ' array(
0 => "a",
"a" => $GlobalScopeVar,
"b" => array("nested"=>array(1,2,3)),
"c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
"string_literal",
12345
); ';
$token = token_get_all("<?php ".$code);
$newcode = "";
$i = 0;
while (++$i < count($token)) { // enter into array; then start.
if (is_array($token[$i]))
$newcode .= $token[$i][1];
else
$newcode .= $token[$i];
if ($token[$i] == "(") {
$ending = ")";
break;
}
if ($token[$i] == "[") {
$ending = "]";
break;
}
}
// init variables
$escape = 0;
$wait_for_non_whitespace = 0;
$parenthesis_count = 0;
$entry = "";
// main loop
while (++$i < count($token)) {
// don't match commas in func($a, $b)
if ($token[$i] == "(" || $token[$i] == "{") // ( -> normal parenthesis; { -> closures
$parenthesis_count++;
if ($token[$i] == ")" || $token[$i] == "}")
$parenthesis_count--;
// begin new string after T_DOUBLE_ARROW
if (!$escape && $wait_for_non_whitespace && (!is_array($token[$i]) || $token[$i][0] != T_WHITESPACE)) {
$escape = 1;
$wait_for_non_whitespace = 0;
$entry .= "'";
}
// here is a T_DOUBLE_ARROW, there will be a string after this
if (is_array($token[$i]) && $token[$i][0] == T_DOUBLE_ARROW && !$escape) {
$wait_for_non_whitespace = 1;
}
// entry ended: comma reached
if (!$parenthesis_count && $token[$i] == "," || ($parenthesis_count == -1 && $token[$i] == ")" && $ending == ")") || ($ending == "]" && $token[$i] == "]")) {
// go back to the first non-whitespace
$whitespaces = "";
if ($parenthesis_count == -1 || ($ending == "]" && $token[$i] == "]")) {
$cut_at = strlen($entry);
while ($cut_at && ord($entry[--$cut_at]) <= 0x20); // 0x20 == " "
$whitespaces = substr($entry, $cut_at + 1, strlen($entry));
$entry = substr($entry, 0, $cut_at + 1);
}
// $escape == true means: there was somewhere a T_DOUBLE_ARROW
if ($escape) {
$escape = 0;
$newcode .= $entry."'";
} else {
$newcode .= "'".addcslashes($entry, "'\\")."'";
}
$newcode .= $whitespaces.($parenthesis_count?")":(($ending == "]" && $token[$i] == "]")?"]":","));
// reset
$entry = "";
} else {
// add actual token to $entry
if (is_array($token[$i])) {
$addChar = $token[$i][1];
} else {
$addChar = $token[$i];
}
if ($entry == "" && $token[$i][0] == T_WHITESPACE) {
$newcode .= $addChar;
} else {
$entry .= $escape?str_replace(array("'", "\\"), array("\\'", "\\\\"), $addChar):$addChar;
}
}
}
//append remaining chars like whitespaces or ;
$newcode .= $entry;
print $newcode;
Demo at: http://3v4l.org/qe4Q1
Should output:
array(
0 => '"a"',
"a" => '$GlobalScopeVar',
"b" => 'array("nested"=>array(1,2,3))',
"c" => 'function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }',
'"string_literal"',
'12345'
)
To get the array's data, you can run print_r(eval("return $newcode;"));
which prints the entries of the array:
Array
(
[0] => "a"
[a] => $GlobalScopeVar
[b] => array("nested"=>array(1,2,3))
[c] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
[1] => "string_literal"
[2] => 12345
)
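The eval step can be checked on a small scale. The sketch below uses a shortened, hand-written stand-in for $newcode (an assumption, not the exact string produced above) to show that every value comes back as a plain string.

```php
<?php
// Hypothetical, shortened stand-in for the $newcode string built by the loop.
$newcode = 'array(0 => \'"a"\', "a" => \'$GlobalScopeVar\', \'12345\')';

// Every value is a single-quoted string literal, so evaluating the
// expression only builds an array of strings; no variable is read.
$entries = eval("return $newcode;");

var_dump($entries[0] === '"a"');               // bool(true)
var_dump($entries['a'] === '$GlobalScopeVar'); // bool(true)
var_dump($entries[1] === '12345');             // bool(true)
```

Since eval runs whatever it is given, this is only reasonable when the source file is trusted, as in the internal app described in the question.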