PHP has an internal data structure called smart string (smart_str?), which stores both the length and the buffer size. That is, more memory than the length of the string is allocated, to improve concatenation performance. Why isn't this data structure used for the actual PHP strings? Wouldn't that lead to fewer memory allocations and better performance?
Normal PHP strings (as of PHP 7) are represented by the zend_string type, which includes both the length of the string and its character data array. zend_strings are usually allocated to fit the character data precisely (alignment notwithstanding): they leave no room to append additional characters.
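To make "allocated to fit precisely" concrete, here is a minimal sketch of an exactly sized string. The type my_string and its functions are hypothetical and simplified: the real zend_string also carries a refcount header and a cached hash, and the real allocator rounds sizes for alignment. The point is that length and characters share one allocation with no spare room, so every append has to realloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for zend_string: length and characters
 * live in one allocation; val is a C99 flexible array member. */
typedef struct {
    size_t len;
    char val[];
} my_string;

/* Bytes needed for `len` characters plus the trailing NUL (no alignment
 * rounding, unlike the real allocator). */
static size_t my_string_size(size_t len) {
    return offsetof(my_string, val) + len + 1;
}

/* Allocate exactly enough room for `len` characters: no spare capacity. */
static my_string *my_string_init(const char *src, size_t len) {
    my_string *s = malloc(my_string_size(len));
    if (s == NULL) return NULL;
    s->len = len;
    memcpy(s->val, src, len);
    s->val[len] = '\0';
    return s;
}

/* Appending to an exactly sized string always requires a realloc.
 * On failure NULL is returned and the original `s` is still valid. */
static my_string *my_string_append(my_string *s, const char *src, size_t n) {
    my_string *t = realloc(s, my_string_size(s->len + n));
    if (t == NULL) return NULL;
    memcpy(t->val + t->len, src, n);
    t->len += n;
    t->val[t->len] = '\0';
    return t;
}
```

Building a string from 1000 small parts this way costs 1000 reallocations, which is the problem smart_str addresses.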
The smart_str structure includes a pointer to a zend_string and an allocation size. This time, the zend_string is not allocated precisely. Instead, the allocation is deliberately made too large, so that additional characters can be appended without expensive reallocations.
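The idea of keeping a separate allocation size can be sketched as follows. This is a hypothetical, self-contained model, not PHP's code: sketch_str uses a plain char buffer instead of a zend_string pointer, and it grows by doubling from 16 characters, which is an assumption made for illustration (PHP's actual policy, 256 bytes and then 4096-byte steps, is different). It counts reallocations so the effect of spare capacity is visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical smart_str-like builder: the buffer may be larger than the
 * string it holds, tracked by `cap` separately from `len`. */
typedef struct {
    char *buf;       /* character data, NUL-terminated once non-empty */
    size_t len;      /* characters in use */
    size_t cap;      /* characters that fit before a realloc is needed */
    size_t reallocs; /* number of (re)allocations performed */
} sketch_str;

/* Ensure room for `extra` more characters. Doubling is this sketch's
 * assumption, not PHP's policy. Returns 0 on allocation failure. */
static int sketch_reserve(sketch_str *s, size_t extra) {
    size_t need = s->len + extra;
    if (need <= s->cap) return 1;        /* fast path: no allocation */
    size_t cap = s->cap ? s->cap : 16;
    while (cap < need) cap *= 2;
    char *p = realloc(s->buf, cap + 1);  /* +1 for the NUL */
    if (p == NULL) return 0;
    s->buf = p;
    s->cap = cap;
    s->reallocs++;
    return 1;
}

/* Append n characters, reallocating only when capacity runs out. */
static int sketch_append(sketch_str *s, const char *src, size_t n) {
    if (!sketch_reserve(s, n)) return 0;
    memcpy(s->buf + s->len, src, n);
    s->len += n;
    s->buf[s->len] = '\0';
    return 1;
}
```

Starting from sketch_str s = {0}, appending "ab" 1000 times reaches 2000 characters with only 8 allocations (capacities 16, 32, ..., 2048), compared with 1000 for an exactly sized string.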
The reallocation policy for smart_str is as follows: first, it is allocated with a total size of 256 bytes (minus the zend_string header and allocator overhead). If this size is exceeded, it is reallocated to 4096 bytes (minus overhead). After that, the size increases in increments of 4096 bytes.
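The policy above can be written as a small capacity function. This is a sketch under stated assumptions: the constant names are hypothetical, and STR_OVERHEAD = 32 is an assumed stand-in for the zend_string header plus allocator bookkeeping, whose real value depends on the build and platform. The first chunk is 256 bytes total; anything larger is rounded up so the total allocation is a whole number of 4096-byte pages.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes from the policy described above; STR_OVERHEAD is an assumption. */
#define STR_START_SIZE 256u
#define STR_PAGE       4096u
#define STR_OVERHEAD   32u

/* Round x up to the next multiple of STR_PAGE. */
static size_t round_to_page(size_t x) {
    return (x + STR_PAGE - 1) / STR_PAGE * STR_PAGE;
}

/* Usable capacity (in characters) chosen when a string must hold `needed`
 * characters: the 256-byte first chunk if it suffices, otherwise the
 * smallest whole number of pages, minus the overhead. */
static size_t new_capacity(size_t needed) {
    if (needed <= STR_START_SIZE - STR_OVERHEAD)
        return STR_START_SIZE - STR_OVERHEAD;
    return round_to_page(needed + STR_OVERHEAD) - STR_OVERHEAD;
}
```

With these assumed numbers, a 1-character string gets 224 usable characters, a 225-character string jumps to 4064, and a 4065-character string to 8160.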
Now imagine that we replaced all strings with smart_strs. Even a single-character string would then have a minimum allocation size of 256 bytes. Given that most strings in use are small, this is an unacceptable overhead.
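A quick back-of-the-envelope comparison shows the size of that overhead. The sketch assumes a 24-byte string header and 8-byte alignment, typical of 64-bit PHP 7 builds but stated here as assumptions; it models only the 256-byte minimum and ignores page rounding for larger strings.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: 24-byte header (refcount/type info, hash, length),
 * sizes rounded to 8 bytes. */
#define STR_HEADER 24u
#define STR_ALIGN  8u
#define SMART_MIN  256u

/* Bytes for an exactly sized string: header + chars + NUL, aligned. */
static size_t exact_bytes(size_t len) {
    size_t raw = STR_HEADER + len + 1;
    return (raw + STR_ALIGN - 1) / STR_ALIGN * STR_ALIGN;
}

/* Lower bound if every string started out as a smart_str: at least the
 * 256-byte first chunk. Page rounding for larger strings is not modeled. */
static size_t smart_bytes_min(size_t len) {
    size_t e = exact_bytes(len);
    return e > SMART_MIN ? e : SMART_MIN;
}
```

Under these assumptions a 1-character string takes 32 bytes exactly sized versus 256 as a smart_str (8 times more), and a million 10-character strings would take about 40 MB instead of about 256 MB.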
So essentially, this is a classic performance/memory tradeoff. We use a memory-compact representation by default and switch to a faster but less memory-efficient representation in the cases that benefit most from it, i.e. where large strings are constructed from small parts.