Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PHP: Copy On Write and Assign By Reference perform different on PHP5 and PHP7

We have a piece of simple code:

1    <?php
2    $i = 2;
3    $j = &$i;
4    echo (++$i) + (++$i);

On PHP5, it outputs 8, because:

$i is a reference, when we increase $i by ++i, it will change the zval rather than make a copy, so line 4 will be 4 + 4 = 8. This is Assign By Reference.

If we comment line 3, it will output 7, every time we change the value by increasing it, PHP will make a copy, line 4 will be 3 + 4 = 7. This is Copy On Write.

But in PHP7, it always outputs 7.

I've checked the changes in PHP7: http://php.net/manual/en/migration70.incompatible.php, but I did not get any clue.

Any help will be great, thanks in advance.

update1

Here is the result of the code on PHP5 / PHP7: https://3v4l.org/USTHR

update2

The opcode:

[huqiu@101 tmp]$ php -d vld.active=1 -d vld.execute=0 -f incr-ref-add.php
Finding entry points
Branch analysis from position: 0
Jump found. Position 1 = -2
filename:       /home/huqiu/tmp/incr-ref-add.php
function name:  (null)
number of ops:  7
compiled vars:  !0 = $i, !1 = $j
line     #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   2     0  E >   ASSIGN                                                   !0, 2
   3     1        ASSIGN_REF                                               !1, !0
   4     2        PRE_INC                                          $2      !0
         3        PRE_INC                                          $3      !0
         4        ADD                                              ~4      $2, $3
         5        ECHO                                                     ~4
   5     6      > RETURN                                                   1

branch: #  0; line:     2-    5; sop:     0; eop:     6; out1:  -2
path #1: 0,
like image 435
srain Avatar asked Mar 31 '16 16:03

srain


2 Answers

Disclaimer: I'm not a PHP Internals expert (yet?) so this is all from my understanding, and not guaranteed to be 100% correct or complete. :)

So, firstly, the PHP 7 behaviour - which, I note, is also followed by HHVM - appears to be correct, and PHP 5 has a bug here. There should be no extra assign by reference behaviour here, because regardless of execution order, the result of the two calls to ++$i should never be the same.

The opcodes look fine; crucially, we have two temp variables $2 and $3, to hold the two increment results. But somehow, PHP 5 is acting as though we'd written this:

$i = 2;
$i++; $temp1 =& $i;
$i++; $temp2 =& $i;
echo $temp1 + $temp2; 

Rather than this:

$i = 2;
$i++; $temp1 = $i;
$i++; $temp2 = $i;
echo $temp1 + $temp2; 

Edit: It was pointed out on the PHP Internals mailing list that using multiple operations that modify a variable within a single statement is generally considered "undefined behaviour", and ++ is used as an example of this in C/C++.

As such, it's reasonable for PHP 5 to return the value it does for implementation / optimisation reasons, even if it is logically inconsistent with a sane serialization into multiple statements.

The (relatively new) PHP language specification contains similar language and examples:

Unless stated explicitly in this specification, the order in which the operands in an expression are evaluated relative to each other is unspecified. [...] (For example,[...] in the full expression $j = $i + $i++, whether the value of $i is the old or new $i, is unspecified.)

Arguably, this is a weaker claim than "undefined behaviour", since it implies they are evaluated in some particular order, but we're into nit-picking now.

phpdbg investigation (PHP 5)

I was curious, and want to learn more about the internals, so did some playing around using phpdbg.

No references

Running the code with $j = $i in place of $j =& $i, we start with 2 variables sharing an address, with a refcount of 2 (but no is_ref flag):

Address         Refs    Type            Variable
0x7f3272a83be8  2       (integer)       $i
0x7f3272a83be8  2       (integer)       $j

But as soon as you pre-increment, the zvals are separated, and only one temp var is sharing with $i, giving a refcount of 2:

Address         Refs    Type            Variable
0x7f189f9ecfc8  2       (integer)       $i
0x7f189f859be8  1       (integer)       $j

With reference assignment

When the variables have been bound together, they share an address, with a refcount of 2, and a by-ref marker:

Address         Refs    Type            Variable
0x7f9e04ee7fd0  2       (integer)       &$i
0x7f9e04ee7fd0  2       (integer)       &$j

After the pre-increments (but before the addition), the same address has a refcount of 4, showing the 2 temp vars erroneously bound by reference:

Address         Refs    Type            Variable
0x7f9e04ee7fd0  4       (integer)       &$i
0x7f9e04ee7fd0  4       (integer)       &$j

The source of the issue

Digging into the source on http://lxr.php.net, we can find the implementation of the ZEND_PRE_INC opcode:

  • PHP 5.6
  • PHP 7.0

PHP 5

The crucial line is this:

 SEPARATE_ZVAL_IF_NOT_REF(var_ptr);

So we create a new zval for the result value only if it is not currently a reference. Further down, we have this:

if (RETURN_VALUE_USED(opline)) {
    PZVAL_LOCK(*var_ptr);
    EX_T(opline->result.var).var.ptr = *var_ptr;
}

So if the return value of the decrement is actually used, we need to "lock" the zval, which following a whole series of macros basically means "increment its refcount", before assigning it as the result.

If we created a new zval earlier, that's fine - our refcount is now 2, 1 for the actual variable, plus 1 for the operation result. But if we decided not to, because we needed to hold a reference, we're just incrementing the existing reference count, and pointing at a zval which may be about to be changed again.

PHP 7

So what's different in PHP 7? Several things!

Firstly, the phpdbg output is rather boring, because integers are no longer reference counted in PHP 7; instead, a reference assignment creates an extra pointer, which itself has a refcount of 1, to the same address in memory, which is the actual integer. The phpdbg output looks like this:

Address            Refs    Type      Variable
0x7f175ca660e8     1       integer   &$i
int (2)
0x7f175ca660e8     1       integer   &$j
int (2)

Secondly, there is a special code path in the source for integers:

if (EXPECTED(Z_TYPE_P(var_ptr) == IS_LONG)) {
    fast_long_increment_function(var_ptr);
    if (UNEXPECTED(RETURN_VALUE_USED(opline))) {
        ZVAL_COPY_VALUE(EX_VAR(opline->result.var), var_ptr);
    }
    ZEND_VM_NEXT_OPCODE();
}

So if the variable is an integer (IS_LONG) and not a reference to an integer (IS_REFERENCE) then we can just increment it in place. If we then need the return value, we can copy its value into the result (ZVAL_COPY_VALUE).

If it's a reference, we won't hit that code, but rather than keeping references bound together, we have these two lines:

ZVAL_DEREF(var_ptr);
SEPARATE_ZVAL_NOREF(var_ptr);

The first line says "if it's a reference, follow it to its target"; this takes us from our "reference to an integer" to the integer itself. The second - I think - says "if it's something refcounted, and has more than one reference, create a copy of it"; in our case, this will do nothing, because the integer doesn't care about refcounts.

So now we have an integer we can decrement, that will affect all by-reference associations, but not by-value ones for refcounted types. Finally, if we want the return value of the increment, we again copy it, rather than just assigning it; and this time with a slightly different macro which will increase the refcount of our new zval if necessary:

ZVAL_COPY(EX_VAR(opline->result.var), var_ptr);
like image 181
IMSoP Avatar answered Oct 19 '22 14:10

IMSoP


I'd say the way it works in PHP7 is the right way. It's bad to implicitly change the way operators work depending whether operand is referenced anywhere or not.

This is the best thing about PHP7 being completely rewritten: no clumsy/bug-driven-development v4/v5 code will work.

like image 21
Evgeniy Chekan Avatar answered Oct 19 '22 12:10

Evgeniy Chekan