 

PHP preg_split by new line with \R

Tags: regex, php, pcre

As far as I understand, the following line of code should split a string at newlines (\r, \n and \r\n).

preg_split("%\R%", $str);

Why is it that

var_dump(preg_split("%\R%", "Å"));

outputs

array(2) {
  [0]=>
  string(1) "▒"
  [1]=>
  string(0) ""
}

but

var_dump(preg_split("%(\r|\n|\r\n)%", "Å"));

works as expected and does not split the character? I know that I should use the "u" modifier (PCRE_UTF8) because the character is UTF-8 encoded, but why does preg_split think that Å (0xC3 0x85) could contain a newline?

asked Nov 27 '25 at 23:11 by Steve



1 Answer

You have already noted that Å is encoded in UTF-8 as the two bytes 0xC3 0x85.
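A minimal way to confirm those two bytes is to dump the string in hex. This sketch assumes PHP 7.0 or later, where the "\u{00C5}" escape produces the UTF-8 encoding of U+00C5 (Å) regardless of how the source file itself is saved:

```php
<?php
// "\u{00C5}" is the escape for U+00C5 (Å); PHP emits its UTF-8 bytes,
// so the result does not depend on the encoding of this source file.
$str = "\u{00C5}";

var_dump(strlen($str));  // int(2): strlen() counts bytes, not characters
var_dump(bin2hex($str)); // string(4) "c385"
```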

According to the PCRE documentation, without the u modifier \R is equivalent to this atomic group:

(?>\r\n|\n|\r|\f|\x0b|\x85)
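A small check of what \R accepts in byte mode versus UTF-8 mode. This is a sketch assuming PHP 7.0+ and a PCRE build with the default \R behaviour (any Unicode newline, which is what the question's output shows):

```php
<?php
// Without /u, PCRE works on raw bytes, so \R matches the single byte 0x85 (NEL).
var_dump(preg_match('/\R/', "\x85"));      // int(1)

// The same byte is the second half of the UTF-8 encoding of Å (0xC3 0x85).
var_dump(preg_match('/\R/', "\u{00C5}"));  // int(1): \R matched the 0x85 byte

// With /u, the subject is decoded as UTF-8; U+00C5 is one code point, not a line break.
var_dump(preg_match('/\R/u', "\u{00C5}")); // int(0)

// In UTF-8 mode, NEL only matches when encoded as valid UTF-8 (0xC2 0x85).
var_dump(preg_match('/\R/u', "\u{0085}")); // int(1)
```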

Note that \x85 (NEL, "next line") appears in that group, and it is also the second byte of Å.

Hence, splitting on \R without the u modifier yields one extra element in the output array: \R matches the byte \x85, leaving just \xC3 in the first element and an empty string in the second.
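The sketch below contrasts the byte-mode split with two ways to avoid it: adding the u modifier, or listing only CR and LF explicitly. In the explicit alternation, \r\n is placed first; the question's pattern (\r|\n|\r\n) tries \r before \r\n, so it splits a CRLF pair twice and produces an extra empty element. Assumes PHP 7.0+ for the "\u{...}" escape:

```php
<?php
$str = "\u{00C5}"; // "Å", encoded in UTF-8 as the bytes 0xC3 0x85

// Byte mode (no /u): \R also matches 0x85, so Å is cut in half into "\xC3" and "".
var_dump(array_map('bin2hex', preg_split('/\R/', $str)) === ['c3', '']); // bool(true)

// UTF-8 mode (/u): the subject is read as code points, so nothing is split.
var_dump(preg_split('/\R/u', $str) === [$str]); // bool(true)

// Explicit CR/LF alternation with \r\n first: a CRLF pair counts as one break.
var_dump(preg_split('/\r\n|\r|\n/', "a\r\nb\nc\rd$str") === ['a', 'b', 'c', "d$str"]); // bool(true)

// With \r listed before \r\n, the \r and the \n are matched separately.
var_dump(preg_split('/(\r|\n|\r\n)/', "a\r\nb") === ['a', '', 'b']); // bool(true)
```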

answered Nov 29 '25 at 12:11 by anubhava



