I'm writing a simple website parser in PHP 5.2.10.
When using the default internal encoding (which is ISO-8859-1), I always get an error at the same function call:
$start = mb_strpos($index, '<a name=gr1>');
Fatal error: Allowed memory size of 50331648 bytes exhausted (tried to allocate 11924760 bytes)
The length of the string $index in this case was 2981190 bytes, exactly one quarter of what PHP tried to allocate (4 × 2981190 = 11924760).
Now, if I use
mb_internal_encoding('UTF-8')
the error disappears. Does that mean that PHP uses more memory for single-byte strings than for multibyte ones? How is that possible? Any ideas?
Update: Memory usage doesn't seem to depend on the encoding: the average memory_get_usage() is almost the same with UTF-8 and ISO-8859-1. I think the problem might be in mb_strpos. In fact, the string $index is encoded in Windows-1251 (Cyrillic), so it contains byte sequences that are not valid UTF-8. This may cause mb_strpos to try to convert the string, or to use additional memory for some other purpose. I will try to find the answer in the sources of mb_strpos.
Sorry if you've already thought of these potential issues.
The multibyte string functions check UTF-8 input for errors and, if there are invalid characters, return an empty string or false (as in the case of mb_strpos()): http://www.serverphorums.com/read.php?7,552099
Are you checking the result you're getting with the === operator to ensure that you're not receiving false instead of 0?
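To illustrate why the comparison operator matters, here is a minimal sketch. The sample value of $index is made up for the example (the needle sits at offset 0), and it assumes the mbstring extension is loaded:

```php
<?php
// Hypothetical haystack: the needle is at the very beginning, so the
// correct return value is the integer 0, not false.
$index = '<a name=gr1>first group</a>';
$start = mb_strpos($index, '<a name=gr1>');

if ($start === false) {
    // Reached only when mb_strpos() really returned false.
    echo "needle not found (or the call failed)\n";
} else {
    echo "needle found at offset $start\n";
}

// A loose comparison cannot tell the two results apart:
var_dump(0 == false);   // bool(true)
var_dump(0 === false);  // bool(false)
```

With == a legitimate match at position 0 and a failed call look identical, which is why the strict check is the safer habit here.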
The mb_strpos() function uses mbfl_strpos(), which makes copies of the strings (needle, haystack) when it has to perform conversions (leading to increases in memory, as you observed):
https://github.com/php/php-src/blob/master/ext/mbstring/libmbfl/mbfl/mbfilter.c#L811
So, I'm wondering if the default internal encoding (ISO-8859-1) let everything through, so the memory limit was hit, whereas the UTF-8 encoding short-circuited on the illegal characters and returned false (which, if you were testing with ==, would make it appear that the function merely didn't find a match).
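One way to test this theory is to ask mbstring whether the data is valid UTF-8 at all, and then search with the real encoding named explicitly. This is only a sketch: the sample bytes below are an invented stand-in for the downloaded page, and the exact return value for invalid input may differ between PHP versions, so the snippet avoids relying on it.

```php
<?php
// Hypothetical sample: the word "Привет" as Windows-1251 bytes,
// followed by a space and the anchor being searched for.
$index  = "\xCF\xF0\xE8\xE2\xE5\xF2 <a name=gr1>";
$needle = '<a name=gr1>';

// These bytes do not form valid UTF-8 sequences.
$validUtf8 = mb_check_encoding($index, 'UTF-8');
var_dump($validUtf8);

// Name the actual encoding in the fourth argument instead of relying
// on mb_internal_encoding(), and compare the result strictly.
$start = mb_strpos($index, $needle, 0, 'Windows-1251');

if ($start === false) {
    echo "needle not found\n";
} else {
    echo "needle found at character offset $start\n";
}
```

If mb_check_encoding() reports false for the real page, the UTF-8 run was never searching valid data in the first place.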
Worth a shot :)