
PHP array memory consumption

Tags: arrays, php

Please explain how this works. Why does appending an array built from a variable, rather than from a literal, increase memory consumption roughly tenfold?

PHP 7.1.17

First example:

<?php
ini_set('memory_limit', '1G');
$array = [];
$row = 0;
while ($row < 2000000) {
    $array[] = [1];

    if ($row % 100000 === 0) {
        echo (memory_get_usage(true) / 1000000) . PHP_EOL;
    }
    $row++;
}

Total memory usage ~70MB

Second example:

<?php
ini_set('memory_limit', '1G');
$array = [];
$a = 1;

$row = 0;
while ($row < 2000000) {
    $array[] = [$a];

    if ($row % 100000 === 0) {
        echo (memory_get_usage(true) / 1000000) . PHP_EOL;
    }
    $row++;
}

Total memory usage ~785MB

Also, there is no difference in memory consumption if the resulting array is one-dimensional.

kindratmakc asked Oct 26 '25 18:10


1 Answer

The key thing here is that [1], although it's a complex value, is a constant - the compiler can trivially know that it's the same every time it's used.

Since PHP uses a "copy on write" system when multiple variables have the same value, the compiler can actually construct the "zval" structure for the array before the code is run, and just increment its reference counter each time a new variable or array value points to it. (If any of them are modified later, they will be "separated" into a new zval before modification, so at that point an extra copy will be made anyway.)
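As a rough illustration of the copy-on-write behaviour described above, the following sketch compares the memory cost of assigning a large array with the cost of the first write to the copy. The variable names and the array size are assumptions chosen for the demo, and the exact byte counts will vary between PHP versions, so only the relative sizes are meaningful.

```php
<?php
// Illustrative sketch only: names and sizes are assumptions, not from the question.
$original = range(1, 100000);       // one large array

$start = memory_get_usage();
$copy = $original;                  // no data copied yet: both variables share one array
$afterAssign = memory_get_usage();

$copy[] = 0;                        // first write: the array is separated (duplicated)
$afterWrite = memory_get_usage();

$assignCost = $afterAssign - $start;
$writeCost = $afterWrite - $afterAssign;

echo 'assignment:  ', $assignCost, ' bytes', PHP_EOL;
echo 'first write: ', $writeCost, ' bytes', PHP_EOL;
```

The assignment should cost next to nothing, while the first write pays for a full duplicate of the array; the original stays untouched.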

So (using 42 to stand out more), this:

$bar = [];
$bar[] = [42];

Compiles to this (VLD output generated with https://3v4l.org):

compiled vars:  !0 = $bar
line     #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   3     0  E >   ASSIGN                                                   !0, <array>
   4     1        ASSIGN_DIM                                               !0
         2        OP_DATA                                                  <array>
         3      > RETURN                                                   1

Note that the 42 doesn't even show up in the VLD output; it's implicit in the second <array>. So the only memory usage is for the outer array to store a long list of pointers, which all happen to point to the same zval.

When using a variable like [$a], on the other hand, there is no guarantee that the values will all be the same. It's possible to analyse the code and deduce that they will be, so OpCache might apply some optimisations, but on its own:

$a = 42;
$foo = [];
$foo[] = [$a];

Compiles to:

compiled vars:  !0 = $a, !1 = $foo
line     #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   3     0  E >   ASSIGN                                                   !0, 42
   4     1        ASSIGN                                                   !1, <array>
   5     2        INIT_ARRAY                                       ~5      !0
         3        ASSIGN_DIM                                               !1
         4        OP_DATA                                                  ~5
         5      > RETURN                                                   1

Note the extra INIT_ARRAY opcode - that's a new zval being created with the value of [$a]. This is where all your extra memory goes - every iteration will create a new array that happens to have the same contents.


It's relevant to point out here that if $a was itself a complex value - an array or object - it would not be copied on each iteration, as it would have its own reference counter. You'd still be creating a new array each time around the loop, but those arrays would all contain a copy-on-write pointer to $a, not a copy of it. This doesn't happen for integers (in PHP 7) because it's actually cheaper to store the integer directly than to store a pointer to somewhere else that stores the integer.

One more variation worth looking at, because it may be an optimisation you can make by hand:

$a = 42;
$b = [$a];
$foo = [];
$foo[] = $b;

VLD output:

compiled vars:  !0 = $a, !1 = $b, !2 = $foo
line     #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   3     0  E >   ASSIGN                                                   !0, 42
   4     1        INIT_ARRAY                                       ~4      !0
         2        ASSIGN                                                   !1, ~4
   5     3        ASSIGN                                                   !2, <array>
   6     4        ASSIGN_DIM                                               !2
         5        OP_DATA                                                  !1
   7     6      > RETURN                                                   1

Here, we have an INIT_ARRAY opcode when we create $b, but not when we add it to $foo. The ASSIGN_DIM will see that it's safe to reuse the $b zval each time, and increment its reference counter. I haven't tested, but I believe this will take you back to the same memory usage as the constant [1] case.
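One way to check that belief is a rough measurement along these lines. This is a sketch, not a benchmark: the helper names (buildFreshRows, buildSharedRows, measureBytes) and the row count are made up for illustration, and the value is passed in as a parameter so it cannot be treated as a compile-time constant. Absolute numbers depend on the PHP version; the interesting part is the ratio between the two results.

```php
<?php
// Hypothetical helpers for illustration; names and row count are assumptions.

// Builds a new inner array on every iteration (the [$a] case).
function buildFreshRows(int $value, int $count): array
{
    $rows = [];
    for ($i = 0; $i < $count; $i++) {
        $rows[] = [$value];
    }
    return $rows;
}

// Builds the inner array once and appends that same array each time.
function buildSharedRows(int $value, int $count): array
{
    $row = [$value];
    $rows = [];
    for ($i = 0; $i < $count; $i++) {
        $rows[] = $row;
    }
    return $rows;
}

// Returns the growth in allocated bytes while the built rows are still alive.
function measureBytes(callable $builder, int $value, int $count): array
{
    $before = memory_get_usage();
    $rows = $builder($value, $count);
    $bytes = memory_get_usage() - $before;
    return ['bytes' => $bytes, 'rows' => count($rows)];
}

$fresh = measureBytes('buildFreshRows', 42, 100000);
$shared = measureBytes('buildSharedRows', 42, 100000);

echo 'fresh rows:  ', $fresh['bytes'], ' bytes', PHP_EOL;
echo 'shared rows: ', $shared['bytes'], ' bytes', PHP_EOL;
```

If the reasoning above holds, the shared variant should need little more than the outer array itself, while the fresh variant also pays for one small array per row.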


A final way to verify that copy-on-write is in use here is to use debug_zval_dump, which shows the reference count of a value. The exact numbers are always a bit off, because passing the variable to the function itself creates one or more references, but you can get a good idea from the relative values:

Constant array:

$foo = [];
for($i=0; $i<100; $i++) {
    $foo[] = [42];
}
debug_zval_dump($foo[0]);

Shows refcount of 102, as value is shared across 100 copies.

Identical but not constant array:

$a = 42;
$foo = [];
for($i=0; $i<100; $i++) {
    $foo[] = [$a];
}
debug_zval_dump($foo[0]);

Shows refcount of 2, as each value has its own zval.

Array constructed once and reused explicitly:

$a = 42;
$b = [$a];
$foo = [];
for($i=0; $i<100; $i++) {
    $foo[] = $b;
}
debug_zval_dump($foo[0]);

Shows refcount of 102, as value is shared across 100 copies.

Complex value inside (also try $a = new stdClass etc):

$a = [1,2,3,4,5];
$foo = [];
for($i=0; $i<100; $i++) {
    $foo[] = [$a];
}
debug_zval_dump($foo[0]);

Shows refcount of 2, but the inner array has a refcount of 102: there's a separate array for every outer item, but they all contain pointers to the zval created as $a.

IMSoP answered Oct 29 '25 08:10