
PHP SimpleXML large file no extra memory usage

Every article about SimpleXML performance and memory usage mentions that all parsed content is stored in memory, so processing large files leads to large memory usage. But I recently found that processing a large file with SimpleXML does not cause large memory usage; in fact, it appears to cause almost no additional memory usage at all. Here is my test script:

<?php
error_reporting(E_ALL);
ini_set("display_errors", 1);
print "OS: " . php_uname() . "\n";
print "PHP version: " . phpversion() . "\n";

print round(memory_get_usage() / 1024 / 1024, 2) . " Mb\n";
$large_xml = '<?xml version="1.0" encoding="UTF-8"?><catalog><products>';
for ($i = 0; $i < 500000; $i++) {
    $large_xml .= "<product><id>{$i}</id><name>Product Name {$i}</name><description>Some Description {$i}</description><price>{$i}</price></product>\n";
}
$large_xml .= "</products></catalog>";
print round(memory_get_usage() / 1024 / 1024, 2) . " Mb\n";
$products_sxml = simplexml_load_string($large_xml);
print round(memory_get_usage() / 1024 / 1024, 2) . " Mb\n";
?>

I tested this script on a Linux server running PHP 5.3.8, and the output was:

OS: Linux 2.6.32-5-amd64 #1 SMP Mon Feb 25 00:26:11 UTC 2013 x86_64
PHP version: 5.3.8
0.6 Mb
65.98 Mb
65.98 Mb

So my question is: has anyone else noticed this, and what could explain it? I could not find an explanation anywhere on the web, or even a confirmation of this behaviour.

Aigars asked Mar 10 '26 16:03


1 Answer

The memory management functionality of PHP is quite sophisticated, and accurately measuring the impact of a particular piece of high-level code is quite difficult. There was quite a good (very technical) talk on this by Julien Pauli at the PHP UK Conference, a video of which is available online.

There are a few possible reasons why memory_get_usage might be lying to you:

  • Firstly, memory_get_usage takes an optional parameter, $real_usage, which distinguishes between the amount of memory allocated and the amount in use. The memory manager allocates memory a block at a time, so it will often have claimed more from the OS than is actually in use. As more is needed, the already-claimed memory is used up, meaning no more needs to be allocated. Testing suggests this is not the explanation here: memory_get_usage(true) does not change during parsing either.
  • More generally, there are different ways of allocating memory in the underlying C code that runs PHP. Since most of the work of SimpleXML is done not in the Zend Engine, but in a third-party library called libxml2, the memory allocation will be done there, not in the PHP-specific allocation routines which would be used when, say, appending to a PHP string.
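
To make the first point concrete, here is a minimal sketch comparing the two figures around an allocation that does go through PHP's own memory manager (a plain PHP string). The variable names are only illustrative, and the exact numbers will vary by PHP version and platform:

```php
<?php
// Sketch: compare "in use" and "allocated from the OS" figures
// before and after creating a 10 MiB PHP string.
$before_used = memory_get_usage(false); // memory currently in use by PHP
$before_real = memory_get_usage(true);  // memory the manager has claimed from the OS

// A PHP string is allocated by the Zend memory manager,
// so both counters are expected to see it.
$s = str_repeat('x', 10 * 1024 * 1024);

$after_used = memory_get_usage(false);
$after_real = memory_get_usage(true);

printf("used: %.2f MB -> %.2f MB\n", $before_used / 1048576, $after_used / 1048576);
printf("real: %.2f MB -> %.2f MB\n", $before_real / 1048576, $after_real / 1048576);
```

Memory allocated inside a third-party library such as libxml2 would not necessarily show up in either figure, which is the situation the second point describes.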

I took the following function from Julien Pauli's slides, which looks at the Linux kernel's view of the running PHP process and finds the line which represents the "Resident Set Size" - the amount of physical memory which has actually been allocated, rather than the amount the process has asked to be reserved:

function heap() {
    return shell_exec(sprintf('grep "VmRSS:" /proc/%s/status', getmypid()));
}

Adding a call to this (as well as to memory_get_usage(true)) to your sample code, I got the following output. It shows a significant allocation of "heap" memory when the XML is parsed: VmRSS jumps from about 73 MB to about 744 MB, while neither memory_get_usage figure moves:

OS: Linux pink-marmalade 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 16:19:23 UTC 2013 x86_64
PHP version: 5.3.10-1ubuntu3.8
memory_get_usage(): 0.61 Mb
memory_get_usage(true): 0.75 Mb
Heap: VmRSS:        6956 kB

memory_get_usage(): 65.99 Mb
memory_get_usage(true): 66.25 Mb
Heap: VmRSS:       74348 kB

memory_get_usage(): 65.99 Mb
memory_get_usage(true): 66.25 Mb
Heap: VmRSS:      761836 kB
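
If shell_exec is disabled, the same measurement can be sketched by reading /proc directly. The helpers rss_kb() and report() below are hypothetical names of my own, not from the slides, and this assumes a Linux system with /proc mounted and the SimpleXML extension loaded; the document size is reduced so it runs quickly:

```php
<?php
// Hypothetical helper: resident set size of the current process in kB,
// or null where /proc is unavailable (e.g. non-Linux systems).
function rss_kb() {
    $status = @file_get_contents('/proc/self/status');
    if ($status === false) {
        return null;
    }
    // The line looks like "VmRSS:     6956 kB".
    if (preg_match('/^VmRSS:\s+(\d+)\s+kB$/m', $status, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: print PHP's two figures next to the kernel's view.
function report($label) {
    $rss = rss_kb();
    printf(
        "%s: used=%.2f MB, real=%.2f MB, VmRSS=%s\n",
        $label,
        memory_get_usage() / 1048576,
        memory_get_usage(true) / 1048576,
        $rss === null ? 'n/a' : $rss . ' kB'
    );
}

report('start');
$xml = '<catalog>' . str_repeat('<product><id>1</id></product>', 100000) . '</catalog>';
report('after building string');
$sxml = simplexml_load_string($xml);
report('after parsing');
```

Comparing the last two lines of output should show whether the parse step grows VmRSS without a matching change in the memory_get_usage figures on your own setup.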

IMSoP answered Mar 13 '26 12:03


