 

Why doesn't PHP use its internal smart string for strings?

PHP has an internal data structure called smart string (smart_str?), which stores both the length and the buffer size. That is, more memory than the length of the string is allocated to improve concatenation performance. Why isn't this data structure used for the actual PHP strings? Wouldn't that lead to fewer memory allocations and better performance?

Olle Härstedt asked Sep 17 '15 at 21:09



1 Answer

Normal PHP strings (as of PHP 7) are represented by the zend_string type, which holds both the length of the string and its character data array. A zend_string is usually allocated to fit its character data exactly (alignment aside): it leaves no room to append additional characters.

The smart_str structure holds a pointer to a zend_string together with an allocation size. Here the zend_string is not allocated exactly; instead it is deliberately over-allocated, so that additional characters can be appended without expensive reallocations.

The reallocation policy for smart_str is as follows: first, it is allocated with a total size of 256 bytes, so the usable capacity is 256 bytes minus the zend_string header and allocator overhead. If this capacity is exceeded, it is reallocated to 4096 bytes (again minus overhead). After that, the size grows in increments of 4096 bytes.

Now imagine replacing all strings with smart_str. Even a single-character string would then have a minimum allocation size of 256 bytes. Given that most strings in use are small, this overhead is unacceptable.

So essentially, this is a classic performance/memory tradeoff. We use a memory-compact representation by default and switch to a faster but less memory-efficient representation in the cases that benefit most from it, i.e. where large strings are built from small parts.

NikiC answered Nov 12 '22 at 19:11
