As far as I understand, the following line of code should split a string at newlines (\r, \n and \r\n):
preg_split("%\R%", $str);
Why is it that
var_dump(preg_split("%\R%", "Å"));
outputs
array(2) {
[0]=>
string(1) "▒"
[1]=>
string(0) ""
}
but
var_dump(preg_split("%(\r|\n|\r\n)%", "Å"));
works as expected and does not split the character? I know that I should use the "u" modifier (PCRE_UTF8) because the character is encoded in UTF-8, but why does preg_split think that Å (0xC3 0x85) could contain a newline?
As you mention, Å is encoded in UTF-8 as the two bytes 0xC3 0x85.
According to the PCRE documentation, without the u modifier \R is equivalent to this atomic group:
(?>\r\n|\n|\r|\f|\x0b|\x85)
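Note that \x85 appears both in this group and as the second byte of Å.
Without the u modifier, PCRE treats the subject as a sequence of single bytes, so \R matches the \x85 byte on its own. preg_split therefore splits there, leaving the lone byte \xC3 as the first element and an empty string as the second. With the u modifier, the two bytes are decoded as the single character U+00C5, which \R does not match.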
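The behaviour described above can be checked with a short sketch. It assumes a PHP build whose PCRE library uses the default (Unicode) meaning of \R, as in the question. The variable names are only illustrative, and bin2hex is used so that the raw bytes are visible regardless of terminal encoding. The last call uses the PCRE option (*BSR_ANYCRLF), which restricts \R to \r, \n and \r\n.

```php
<?php
// "\xC3\x85" is the UTF-8 encoding of "Å" (U+00C5), written as explicit bytes
// so the example does not depend on the encoding of this source file.
$str = "\xC3\x85";

// Byte mode: the subject is two single-byte characters, and \R matches \x85.
$bytes = preg_split('%\R%', $str);

// UTF-8 mode: the two bytes are decoded as one character, which \R does not match.
$utf8 = preg_split('%\R%u', $str);

// Byte mode, but with \R limited to CR, LF and CRLF.
$anycrlf = preg_split('%(*BSR_ANYCRLF)\R%', $str);

print_r(array_map('bin2hex', $bytes));   // hex of each piece: "c3" and ""
print_r(array_map('bin2hex', $utf8));    // a single piece: "c385"
print_r(array_map('bin2hex', $anycrlf)); // a single piece: "c385"
```

If the input is known to be UTF-8, the u modifier is the more natural fix. (*BSR_ANYCRLF) is an alternative when the subject has to be treated as raw bytes.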