
PHP - valid variable names

Tags:

php

In the PHP manual on variables, we can read:

Variable names follow the same rules as other labels in PHP. A valid variable name starts with a letter or underscore, followed by any number of letters, numbers, or underscores. As a regular expression, it would be expressed thus: '[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'
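
As a quick sketch of what that quoted pattern accepts, the snippet below anchors the manual's regex and tests a few candidate names. The helper name `isLexicalName` is hypothetical, chosen for illustration; the only assumption is that the manual's pattern is applied to the whole name, byte by byte (no `/u` flag).

```php
<?php
// The label pattern quoted from the manual, anchored with \A and \z so the
// whole name must match. Without the /u flag, PCRE treats \x7f-\xff as raw
// bytes, which is how PHP allows UTF-8 letters in names.
const LABEL_PATTERN = '/\A[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*\z/';

// Hypothetical helper: does $name fit the $-sigil (lexical) variable syntax?
function isLexicalName(string $name): bool
{
    return preg_match(LABEL_PATTERN, $name) === 1;
}

foreach (['foo', '_tmp', 'a1', '0-a', 'foo.bar', ''] as $name) {
    printf("%-10s %s\n", var_export($name, true), isLexicalName($name) ? 'valid' : 'invalid');
}
```

By this test, `0-a` and `foo.bar` are rejected, which matches the parse error shown next.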

So obviously when we try to run:

$0-a = 5;
echo $0-a;

we get a parse error. This is expected.

However, while experimenting, I found that variables can actually contain any characters (or at least start with a digit and contain hyphens) when using this syntax:

${'0-a'} = 5;
echo ${'0-a'};

it works without any problems.

Also using variable variables like this:

$variable = '0-a';
$$variable = 5;
echo $$variable;

works without any problem.

So the question is: is the sentence I quoted from the manual wrong, is what I showed not a real variable, or is this documented somewhere else in the PHP manual?

I've verified this, and it seems to work in both PHP 5.6 and 7.1.

A second question: is it safe to use such constructions? Based on the manual, it seems they shouldn't be possible at all.

asked Feb 24 '17 18:02 by Marcin Nabiałek





1 Answer

You can literally choose any name for a variable. "i" and "foo" are obvious choices, but "", "\n", and "foo.bar" are also valid. The reason? The PHP symbol table is just a dictionary: a string key of zero or more bytes maps to a structured value (called a zval). Interestingly, there are two ways to access this symbol table: lexical variables and dynamic variables.
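
One way to see that the symbol table really is a string-keyed dictionary is to ask PHP for it. The sketch below, assuming a hypothetical function `demoSymbolTable`, creates a few unusual names inside a function and returns `get_defined_vars()`, which exposes the local symbol table as an ordinary array.

```php
<?php
// Inside a function, get_defined_vars() returns the local symbol table
// as an array whose keys are the variable names.
function demoSymbolTable(): array
{
    $i = 1;             // ordinary lexical name
    ${'foo.bar'} = 2;   // key containing a dot
    ${"\n"} = 3;        // key that is just a newline
    ${'0-a'} = 4;       // key that starts with a digit
    return get_defined_vars();
}

var_dump(array_keys(demoSymbolTable()));
```

All four names show up side by side as array keys; the engine does not distinguish the "valid" one from the others once they are in the table.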

Lexical variables are what you read about in the "variables" documentation. Lexical variables define the symbol table key during compilation (i.e., while the engine is lexing and parsing the code). To keep the lexer simple, lexical variables start with a $ sigil and must match the regex [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*. Keeping it simple this way means the parser doesn't have to figure out, for example, whether $foo.bar is a variable keyed by "foo.bar" or a variable "foo" concatenated with a constant bar.
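
A small sketch of that ambiguity: with a constant named `bar` defined (an assumption made purely for the demonstration), `$foo.bar` is read as concatenation, while a variable actually keyed by "foo.bar" is only reachable through the braced syntax.

```php
<?php
// Hypothetical names for the demo: a constant `bar` and a variable `$foo`.
const bar = 'B';
$foo = 'A';

// The lexer ends the variable name at '.', so this is $foo . bar.
$result = $foo.bar;
echo $result, "\n"; // AB

// A separate variable whose symbol table key is literally "foo.bar".
${'foo.bar'} = 'C';
echo ${'foo.bar'}, "\n"; // C
```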

Dynamic variables are where it gets interesting. They let you access those more uncommon variable names. PHP calls these variable variables. (I'm not fond of that name, as its logical opposite is "constant variable", which is confusing. But I'll call them variable variables from here on.) The basic usage goes like:

$a = 'b';
$b = 'SURPRISE!';
var_dump($$a, ${$a}); // both emit a surprise

Variable variables are parsed differently than lexical variables. Rather than fixing the symbol table key at lexing time, the key is evaluated at run time. The logic goes like this: the PHP lexer sees the variable variable syntax (either $$a or, more generally, ${expression}), the PHP parser defers evaluation of the expression until run time, and then at run time the engine uses the result of the expression to key into the symbol table. It's a little more work than lexical variables, but far more powerful.
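
Because the key is computed at run time, any expression can go inside the braces. This sketch builds names in a loop and from a function call; the names `row1`..`row3` and `X` are just illustrative.

```php
<?php
// The expression inside ${...} is evaluated at run time to produce the key.
for ($n = 1; $n <= 3; $n++) {
    ${'row' . $n} = $n * 10;   // creates $row1, $row2 and $row3
}
echo $row1 + $row2 + $row3, "\n"; // 60

// Any expression works, including a function call.
${strtoupper('x')} = 'upper';
echo $X, "\n"; // upper
```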

Inside ${} you can have an expression that evaluates to any byte sequence: the empty string, a null byte, all of it. Anything goes. That is handy, for example, in heredocs. It's also handy for exposing externally supplied names as PHP variables. For example, JSON allows any character in a key name, and you might want to access those keys as plain variables (rather than array elements):

$decoded = json_decode('{ "foo.bar" : 1 }');
foreach ($decoded as $key => $value) {
    ${$key} = $value;
}
var_dump(${'foo.bar'});

Using variable variables in this way is similar to using an array as a "symbol table", like $array['foo.bar'], but the variable variable approach is perfectly acceptable and slightly faster.
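
For comparison, a sketch of the array-based alternative: decoding the same JSON with `json_decode`'s associative flag set to `true` keeps "foo.bar" as an ordinary array key, with no variable variables involved.

```php
<?php
// Decode into an associative array instead of creating variables.
$symbols = json_decode('{ "foo.bar" : 1 }', true);
var_dump($symbols['foo.bar']); // int(1)
```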


Addendum

By "slightly faster" we are talking about differences so far to the right of the decimal point that they're practically indistinguishable. In my tests, the difference doesn't exceed 1 second until 10^8 symbol accesses. Average time per operation, in seconds:

Set array key: 0.000000119529
Set var-var:   0.000000101196
Increment array key: 0.000000159856
Increment var-var:   0.000000136778
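
As a quick arithmetic check of the 1-second claim, multiplying each per-operation gap from the figures above by 10^8 operations gives:

```latex
(1.19529 - 1.01196)\times 10^{-7}\,\mathrm{s} \times 10^{8} \approx 1.83\,\mathrm{s} \quad \text{(set)}
(1.59856 - 1.36778)\times 10^{-7}\,\mathrm{s} \times 10^{8} \approx 2.31\,\mathrm{s} \quad \text{(increment)}
```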

The loss of clarity and convention is likely not worth it.

<?php
$N = 100000000; // 10^8 iterations per test

// Start each timer at -start, so adding the end time yields the elapsed time.
$elapsed = -microtime(true);
$syms = [];
for ($i = 0; $i < $N; $i++) { $syms['foo.bar'] = 1; }
printf("Set array key: %.12f\n", ($elapsed + microtime(true)) / $N);

$elapsed = -microtime(true);
for ($i = 0; $i < $N; $i++) { ${'foo.bar'} = 1; }
printf("Set var-var:   %.12f\n", ($elapsed + microtime(true)) / $N);

// Increment tests: initialise the slot first, then time only the increments.
$elapsed = -microtime(true);
$syms['foo.bar'] = 1;
for ($i = 0; $i < $N; $i++) { $syms['foo.bar']++; }
printf("Increment array key: %.12f\n", ($elapsed + microtime(true)) / $N);

$elapsed = -microtime(true);
${'foo.bar'} = 1;
for ($i = 0; $i < $N; $i++) { ${'foo.bar'}++; }
printf("Increment var-var:   %.12f\n", ($elapsed + microtime(true)) / $N);
answered Oct 21 '22 11:10 by bishop
