
String concatenation while incrementing

This is my code:

$a = 5;
$b = &$a;
echo ++$a.$b++;

Shouldn't it print 66?

Why does it print 76?

user2500553 asked Jun 19 '13 09:06




1 Answer

Alright. This is actually pretty straightforward behavior, and it has to do with how references work in PHP. It is not a bug, just unexpected behavior.

PHP internally uses copy-on-write, which means that the internal variables are only copied when you write to them (so $a = $b; doesn't copy memory until you actually change one of them). With references, it never actually copies. That's important for later.
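A rough way to watch copy-on-write happen is to compare memory usage after a plain assignment and again after the first write. This is only a sketch: the array size and variable names are arbitrary, and the exact byte counts depend on the PHP version and build, so only the relative sizes mean anything.

```php
<?php
// Build a reasonably large array so that a real copy is easy to spot.
$original = range(1, 100000);

$before = memory_get_usage();
$copy = $original;              // plain assignment: both names share one array
$afterAssign = memory_get_usage();

$copy[] = 0;                    // first write: the engine separates the two
$afterWrite = memory_get_usage();

$assignCost = $afterAssign - $before;
$writeCost  = $afterWrite - $afterAssign;

echo "assignment cost: {$assignCost} bytes\n";
echo "first write cost: {$writeCost} bytes\n";
```

The assignment should cost next to nothing, while the first write pays for duplicating the whole array.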

Let's look at those opcodes:

line     # *  op                           fetch          ext  return  operands
---------------------------------------------------------------------------------
   2     0  >   ASSIGN                                                   !0, 5
   3     1      ASSIGN_REF                                               !1, !0
   4     2      PRE_INC                                          $2      !0
         3      POST_INC                                         ~3      !1
         4      CONCAT                                           ~4      $2, ~3
         5      ECHO                                                     ~4
         6    > RETURN                                                   1

The first two should be pretty easy to understand.

  • ASSIGN - Basically, we're assigning the value of 5 into the compiled variable named !0.
  • ASSIGN_REF - We're creating a reference from !0 to !1 (the direction doesn't matter).
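To see that the direction of the reference doesn't matter, here is a small sketch (the $seenViaA and $seenViaB names are just for illustration). After $b = &$a, a write through either name is visible through the other:

```php
<?php
$a = 5;
$b = &$a;          // $a and $b are now two names for the same value

$a = 10;           // write through the first name...
$seenViaB = $b;    // ...and read it back through the second

$b = 20;           // write through the second name...
$seenViaA = $a;    // ...and read it back through the first

echo $seenViaB, " ", $seenViaA, "\n"; // prints "10 20"
```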

So far, that's straightforward. Now comes the interesting bit:

  • PRE_INC - This is the opcode that actually increments the variable. Of note is that it returns its result into a temporary variable named $2.

So let's look at the PHP 5 engine source code behind PRE_INC when called with a variable:

static int ZEND_FASTCALL  ZEND_PRE_INC_SPEC_VAR_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    USE_OPLINE
    zend_free_op free_op1;
    zval **var_ptr;

    SAVE_OPLINE();
    var_ptr = _get_zval_ptr_ptr_var(opline->op1.var, execute_data, &free_op1 TSRMLS_CC);

    if (IS_VAR == IS_VAR && UNEXPECTED(var_ptr == NULL)) {
        zend_error_noreturn(E_ERROR, "Cannot increment/decrement overloaded objects nor string offsets");
    }
    if (IS_VAR == IS_VAR && UNEXPECTED(*var_ptr == &EG(error_zval))) {
        if (RETURN_VALUE_USED(opline)) {
            PZVAL_LOCK(&EG(uninitialized_zval));
            AI_SET_PTR(&EX_T(opline->result.var), &EG(uninitialized_zval));
        }
        if (free_op1.var) {zval_ptr_dtor(&free_op1.var);};
        CHECK_EXCEPTION();
        ZEND_VM_NEXT_OPCODE();
    }

    SEPARATE_ZVAL_IF_NOT_REF(var_ptr);

    if (UNEXPECTED(Z_TYPE_PP(var_ptr) == IS_OBJECT)
       && Z_OBJ_HANDLER_PP(var_ptr, get)
       && Z_OBJ_HANDLER_PP(var_ptr, set)) {
        /* proxy object */
        zval *val = Z_OBJ_HANDLER_PP(var_ptr, get)(*var_ptr TSRMLS_CC);
        Z_ADDREF_P(val);
        fast_increment_function(val);
        Z_OBJ_HANDLER_PP(var_ptr, set)(var_ptr, val TSRMLS_CC);
        zval_ptr_dtor(&val);
    } else {
        fast_increment_function(*var_ptr);
    }

    if (RETURN_VALUE_USED(opline)) {
        PZVAL_LOCK(*var_ptr);
        AI_SET_PTR(&EX_T(opline->result.var), *var_ptr);
    }

    if (free_op1.var) {zval_ptr_dtor(&free_op1.var);};
    CHECK_EXCEPTION();
    ZEND_VM_NEXT_OPCODE();
}

Now I don't expect you to understand what that's doing right away (this is deep engine voodoo), but let's walk through it.

The first two if statements check to see if the variable is "safe" to increment (the first checks to see if it's an overloaded object or string offset, the second checks to see if the variable is the engine's internal error placeholder, EG(error_zval)).

Next is the really interesting bit for us. Since we're modifying the value, it needs to perform copy-on-write. So it calls:

SEPARATE_ZVAL_IF_NOT_REF(var_ptr);

Now, remember, we already set the variable to be a reference above. So the variable is not separated... Which means everything we do to it here will happen to $b as well...

Next, the variable is incremented (fast_increment_function()).

Finally, it sets the result to the variable itself: AI_SET_PTR stores a pointer to the same zval rather than a copy of its value. This is copy-on-write again. It's not returning the value of the operation, but the actual variable. So what PRE_INC returns is still a reference to $a and $b.

  • POST_INC - This behaves similarly to PRE_INC, except for one VERY important fact.

Let's check out the source code again:

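static int ZEND_FASTCALL  ZEND_POST_INC_SPEC_VAR_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
{
    /* ... declarations and safety checks omitted ... */

    /* copy the current value into the temporary result (~3) */
    retval = &EX_T(opline->result.var).tmp_var;
    ZVAL_COPY_VALUE(retval, *var_ptr);
    zendi_zval_copy_ctor(*retval);

    /* then increment the original variable in place */
    SEPARATE_ZVAL_IF_NOT_REF(var_ptr);
    fast_increment_function(*var_ptr);

    /* ... cleanup and dispatch omitted ... */
}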

This time I cut away all of the non-interesting stuff. So let's look at what it's doing.

First, it gets the return temporary variable (~3 in our code above).

Then it copies the value from its argument (!1 or $b) into the result (and hence the reference is broken).

Then it increments the argument.

Now remember, the argument !1 is the variable $b, which has a reference to !0 ($a) and $2, which if you remember was the result from PRE_INC.

So there you have it. It returns 76 because the reference is maintained in PRE_INC's result.
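If the engine internals are hard to follow, the same sequence can be modeled in userland. The sketch below is not engine code: Box, preInc() and postInc() are hypothetical names that only mimic the behavior described above, where PRE_INC hands back the shared variable and POST_INC hands back a copy of the old value.

```php
<?php
// Hypothetical model: a "zval" is just a box holding a value.
final class Box
{
    public $value;

    public function __construct($value)
    {
        $this->value = $value;
    }
}

// Models PRE_INC as described above: increment, then return the same box.
function preInc(Box $box)
{
    $box->value++;
    return $box;
}

// Models POST_INC: snapshot the old value into a new box, then increment.
function postInc(Box $box)
{
    $old = new Box($box->value);
    $box->value++;
    return $old;
}

$shared = new Box(5);        // stands in for the value shared by $a and $b
$left   = preInc($shared);   // like $2: still the shared box, now holding 6
$right  = postInc($shared);  // like ~3: a copy holding 6; the shared box becomes 7

$result = $left->value . $right->value;
echo $result, "\n";          // prints "76"
```

Because $left is the shared box rather than a snapshot, the later increment shows up on the left-hand side of the concatenation.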

We can prove this by forcing a copy, by assigning the pre-inc to a temporary variable first (through normal assignment, which will break the reference):

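$a = 5;
$b = &$a;
$c = ++$a;   // plain assignment copies the value 6 into $c
$d = $b++;   // $d gets the old value 6; $a and $b become 7
echo $c.$d;  // 66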

Which prints 66, as you expected.

And we can reproduce the other behavior (your bug) by introducing a function to maintain the reference:

function &pre_inc(&$a) {
    return ++$a;
}

$a = 5;
$b = &$a;
$c = &pre_inc($a);
$d = $b++;
echo $c.$d;

Which works as you're seeing it (76).

Note: the only reason for the separate function here is that PHP's parser doesn't like $c = &++$a;. So we need to add a level of indirection through the function call to do it...

The reason I don't consider this a bug is that it's how references are supposed to work. Pre-incrementing a referenced variable will return that variable. Even a non-referenced variable should return that variable. It may not be what you expect here, but it works quite well in almost every other case...

The Underlying Point

If you're using references, you're doing it wrong about 99% of the time. So don't use references unless you absolutely need them. PHP is a lot smarter than you may think at memory optimizations. And your use of references really hinders how it can work. So while you think you may be writing smart code, you're really going to be writing less efficient and less friendly code the vast majority of the time...
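As a sketch of the reference-free style, a function can take its argument by value and return the modified result. The withItem() helper and the $cart data are made up for illustration. Thanks to copy-on-write, passing the array by value doesn't duplicate it unless the function actually writes to it:

```php
<?php
// Hypothetical helper: takes the array by value and returns the new array.
function withItem(array $items, $item)
{
    $items[] = $item;   // the write separates the local copy from the caller's
    return $items;
}

$cart = array('apple', 'pear');
$updated = withItem($cart, 'plum');

echo implode(',', $cart), "\n";     // prints "apple,pear"
echo implode(',', $updated), "\n";  // prints "apple,pear,plum"
```

The caller's array is never changed behind its back, which is the kind of surprise the reference version invites.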

And if you want to know more about references and how variables work in PHP, check out one of my YouTube videos on the subject...

ircmaxell answered Oct 19 '22 00:10