Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best way to test for a variable's existence in PHP; isset() is clearly broken

If the variable you are checking would be in the global scope you could do:

array_key_exists('v', $GLOBALS) 

Attempting to give an overview of the various discussions and answers:

There is no single answer to the question which can replace all the ways isset can be used. Some use cases are addressed by other functions, while others do not stand up to scrutiny, or have dubious value beyond code golf. Far from being "broken" or "inconsistent", other use cases demonstrate why isset's reaction to null is the logical behaviour.

Real use cases (with solutions)

1. Array keys

Arrays can be treated like collections of variables, with unset and isset treating them as though they were. However, since they can be iterated, counted, etc, a missing value is not the same as one whose value is null.

The answer in this case, is to use array_key_exists() instead of isset().

Since this is takes the array to check as a function argument, PHP will still raise "notices" if the array itself doesn't exist. In some cases, it can validly be argued that each dimension should have been initialised first, so the notice is doing its job. For other cases, a "recursive" array_key_exists function, which checked each dimension of the array in turn, would avoid this, but would basically be the same as @array_key_exists. It is also somewhat tangential to the handling of null values.

2. Object properties

In the traditional theory of "Object-Oriented Programming", encapsulation and polymorphism are key properties of objects; in a class-based OOP implementation like PHP's, the encapsulated properties are declared as part of the class definition, and given access levels (public, protected, or private).

However, PHP also allows you to dynamically add properties to an object, like you would keys to an array, and some people use class-less objects (technically, instances of the built in stdClass, which has no methods or private functionality) in a similar way to associative arrays. This leads to situations where a function may want to know if a particular property has been added to the object given to it.

As with array keys, a solution for checking object properties is included in the language, called, reasonably enough, property_exists.

Non-justifiable use cases, with discussion

3. register_globals, and other pollution of the global namespace

The register_globals feature added variables to the global scope whose names were determined by aspects of the HTTP request (GET and POST parameters, and cookies). This can lead to buggy and insecure code, which is why it has been disabled by default since PHP 4.2, released Aug 2000 and removed completely in PHP 5.4, released Mar 2012. However, it's possible that some systems are still running with this feature enabled or emulated. It's also possible to "pollute" the global namespace in other ways, using the global keyword, or $GLOBALS array.

Firstly, register_globals itself is unlikely to unexpectedly produce a null variable, since the GET, POST, and cookie values will always be strings (with '' still returning true from isset), and variables in the session should be entirely under the programmer's control.

Secondly, pollution of a variable with the value null is only an issue if this over-writes some previous initialization. "Over-writing" an uninitialized variable with null would only be problematic if code somewhere else was distinguishing between the two states, so on its own this possibility is an argument against making such a distinction.

4. get_defined_vars and compact

A few rarely-used functions in PHP, such as get_defined_vars and compact, allow you to treat variable names as though they were keys in an array. For global variables, the super-global array $GLOBALS allows similar access, and is more common. These methods of access will behave differently if a variable is not defined in the relevant scope.

Once you've decided to treat a set of variables as an array using one of these mechanisms, you can do all the same operations on it as on any normal array. Consequently, see 1.

Functionality that existed only to predict how these functions are about to behave (e.g. "will there be a key 'foo' in the array returned by get_defined_vars?") is superfluous, since you can simply run the function and find out with no ill effects.

4a. Variable variables ($$foo)

While not quite the same as functions which turn a set of variables into an associative array, most cases using "variable variables" ("assign to a variable named based on this other variable") can and should be changed to use an associative array instead.

A variable name, fundamentally, is the label given to a value by the programmer; if you're determining it at run-time, it's not really a label but a key in some key-value store. More practically, by not using an array, you are losing the ability to count, iterate, etc; it can also become impossible to have a variable "outside" the key-value store, since it might be over-written by $$foo.

Once changed to use an associative array, the code will be amenable to solution 1. Indirect object property access (e.g. $foo->$property_name) can be addressed with solution 2.

5. isset is so much easier to type than array_key_exists

I'm not sure this is really relevant, but yes, PHP's function names can be pretty long-winded and inconsistent sometimes. Apparently, pre-historic versions of PHP used a function name's length as a hash key, so Rasmus deliberately made up function names like htmlspecialchars so they would have an unusual number of characters...

Still, at least we're not writing Java, eh? ;)

6. Uninitialized variables have a type

The manual page on variable basics includes this statement:

Uninitialized variables have a default value of their type depending on the context in which they are used

I'm not sure whether there is some notion in the Zend Engine of "uninitialized but known type" or whether this is reading too much into the statement.

What is clear is that it makes no practical difference to their behaviour, since the behaviours described on that page for uninitialized variables are identical to the behaviour of a variable whose value is null. To pick one example, both $a and $b in this code will end up as the integer 42:

unset($a);
$a += 42;

$b = null;
$b += 42;

(The first will raise a notice about an undeclared variable, in an attempt to make you write better code, but it won't make any difference to how the code actually runs.)

99. Detecting if a function has run

(Keeping this one last, as it's much longer than the others. Maybe I'll edit it down later...)

Consider the following code:

$test_value = 'hello';
foreach ( $list_of_things as $thing ) {
    if ( some_test($thing, $test_value) ) {
        $result = some_function($thing);
    }
}
if ( isset($result) ) {
    echo 'The test passed at least once!';
}

If some_function can return null, there's a possibility that the echo won't be reached even though some_test returned true. The programmer's intention was to detect when $result had never been set, but PHP does not allow them to do so.

However, there are other problems with this approach, which become clear if you add an outer loop:

foreach ( $list_of_tests as $test_value ) {
    // something's missing here...
    foreach ( $list_of_things as $thing ) {
        if ( some_test($thing, $test_value) ) {
            $result = some_function($thing);
        }
    }
    if ( isset($result) ) {
        echo 'The test passed at least once!';
    }
}

Because $result is never initialized explicitly, it will take on a value when the very first test passes, making it impossible to tell whether subsequent tests passed or not. This is actually an extremely common bug when variables aren't initialised properly.

To fix this, we need to do something on the line where I've commented that something's missing. The most obvious solution is to set $result to a "terminal value" that some_function can never return; if this is null, then the rest of the code will work fine. If there is no natural candidate for a terminal value because some_function has an extremely unpredictable return type (which is probably a bad sign in itself), then an additional boolean value, e.g. $found, could be used instead.

Thought experiment one: the very_null constant

PHP could theoretically provide a special constant - as well as null - for use as a terminal value here; presumably, it would be illegal to return this from a function, or it would be coerced to null, and the same would probably apply to passing it in as a function argument. That would make this very specific case slightly simpler, but as soon as you decided to re-factor the code - for instance, to put the inner loop into a separate function - it would become useless. If the constant could be passed between functions, you could not guarantee that some_function would not return it, so it would no longer be useful as a universal terminal value.

The argument for detecting uninitialised variables in this case boils down to the argument for that special constant: if you replace the comment with unset($result), and treat that differently from $result = null, you are introducing a "value" for $result that cannot be passed around, and can only be detected by specific built-in functions.

Thought experiment two: assignment counter

Another way of thinking about what the last if is asking is "has anything made an assignment to $result?" Rather than considering it to be a special value of $result, you could maybe think of this as "metadata" about the variable, a bit like Perl's "variable tainting". So rather than isset you might call it has_been_assigned_to, and rather than unset, reset_assignment_state.

But if so, why stop at a boolean? What if you want to know how many times the test passed; you could simply extend your metadata to an integer and have get_assignment_count and reset_assignment_count...

Obviously, adding such a feature would have a trade-off in complexity and performance of the language, so it would need to be carefully weighed against its expected usefulness. As with a very_null constant, it would be useful only in very narrow circumstances, and would be similarly resistant to re-factoring.

The hopefully-obvious question is why the PHP runtime engine should assume in advance that you want to keep track of such things, rather than leaving you to do it explicitly, using normal code.


Sometimes I get a little lost trying to figure out which comparison operation to use in a given situation. isset() only applies to uninitialized or explicitly null values. Passing/assigning null is a great way to ensure a logical comparison works as expected.

Still, it's a little difficult to think about so here's a simple matrix comparing how different values will be evaluated by different operations:

|           | ===null | is_null | isset | empty | if/else | ternary | count>0 |
| -----     | -----   | -----   | ----- | ----- | -----   | -----   | -----   |
| $a;       | true    | true    |       | true  |         |         |         |
| null      | true    | true    |       | true  |         |         |         |
| []        |         |         | true  | true  |         |         |         |
| 0         |         |         | true  | true  |         |         | true    |
| ""        |         |         | true  | true  |         |         | true    |
| 1         |         |         | true  |       | true    | true    | true    |
| -1        |         |         | true  |       | true    | true    | true    |
| " "       |         |         | true  |       | true    | true    | true    |
| "str"     |         |         | true  |       | true    | true    | true    |
| [0,1]     |         |         | true  |       | true    | true    | true    |
| new Class |         |         | true  |       | true    | true    | true    |

To fit the table I compressed the labels a bit:

  • $a; refers to a declared but unassigned variable
  • everything else in the first column refers to an assigned value, like:
    • $a = null;
    • $a = [];
    • $a = 0;
  • the columns refer to comparison operations, like:
    • $a === null
    • isset($a)
    • empty($a)
    • $a ? true : false

All results are boolean, true is printed and false is omitted.

You can run the tests yourself, check this gist:
https://gist.github.com/mfdj/8165967


You can use the compact language construct to test for the existence of a null variable. Variables that do not exist will not turn up in the result, while null values will show.

$x = null;
$y = 'y';

$r = compact('x', 'y', 'z');
print_r($r);

// Output:
// Array ( 
//  [x] => 
//  [y] => y 
// ) 

In the case of your example:

if (compact('v')) {
   // True if $v exists, even when null. 
   // False on var $v; without assignment and when $v does not exist.
}

Of course for variables in global scope you can also use array_key_exists().

B.t.w. personally I would avoid situations like the plague where there is a semantic difference between a variable not existing and the variable having a null value. PHP and most other languages just does not think there is.


Explaining NULL, logically thinking

I guess the obvious answer to all of this is... Don't initialise your variables as NULL, initalise them as something relevant to what they are intended to become.

Treat NULL properly

NULL should be treated as "non-existant value", which is the meaning of NULL. The variable can't be classed as existing to PHP because it hasn't been told what type of entity it is trying to be. It may aswell not exist, so PHP just says "Fine, it doesn't because there's no point to it anyway and NULL is my way of saying this".

An argument

Let's argue now. "But NULL is like saying 0 or FALSE or ''.

Wrong, 0-FALSE-'' are all still classed as empty values, but they ARE specified as some type of value or pre-determined answer to a question. FALSE is the answer to yes or no,'' is the answer to the title someone submitted, and 0 is the answer to quantity or time etc. They ARE set as some type of answer/result which makes them valid as being set.

NULL is just no answer what so ever, it doesn't tell us yes or no and it doesn't tell us the time and it doesn't tell us a blank string got submitted. That's the basic logic in understanding NULL.

Summary

It's not about creating wacky functions to get around the problem, it's just changing the way your brain looks at NULL. If it's NULL, assume it's not set as anything. If you are pre-defining variables then pre-define them as 0, FALSE or "" depending on the type of use you intend for them.

Feel free to quote this. It's off the top of my logical head :)