PHP preg_split by new line with \R

Question

As far as I understand, the following line of code should split a string at new lines (\n, \r\n and \r).

preg_split("%\R%", $str);

Why is it that

var_dump(preg_split("%\R%", "Å"));

outputs

array(2) {
  [0]=>
  string(1) "▒"
  [1]=>
  string(0) ""
}

but

var_dump(preg_split("%(\r\n|\n|\r)%", "Å"));

works as expected and does not split the character? I know that I should use the "u" modifier (PCRE_UTF8) because the character is in UTF-8, but why does preg_split think that Å (0xC3 0x85) could contain a new line?

anubhava · Accepted Answer

As you mentioned, Å is encoded in UTF-8 as the two bytes 0xC3 0x85.

According to the PCRE documentation, without the u modifier \R is equivalent to this atomic group:

(?>\r\n|\n|\r|\f|\x0b|\x85)

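Note the presence of \x85 (NEL, "next line") in that group: it is the same byte as the second byte of Å.

Hence, without the u modifier, the subject is matched byte by byte, and \R matches the lone \x85. The split therefore produces one extra element: the first element is the single byte \xC3 (not valid UTF-8 on its own, which is why it prints as ▒) and the second is the empty string that follows the match.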
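
A minimal sketch of the behaviour described above, assuming a PHP build whose PCRE uses the default (Unicode) definition of \R. The variable names are only illustrative, and bin2hex is used so the raw bytes are visible regardless of the terminal encoding.

```php
<?php
// "Å" in UTF-8 is the two bytes 0xC3 0x85.
$str = "\xC3\x85";

// Without the u modifier the subject is treated as single bytes,
// so \R matches the lone 0x85 byte and splits the character in two.
$bytes = preg_split('%\R%', $str);
echo implode(',', array_map('bin2hex', $bytes)), "\n"; // c3,

// With the u modifier the subject is decoded as UTF-8: Å is the single
// code point U+00C5, which is not a line break, so nothing is split.
$chars = preg_split('%\R%u', $str);
echo implode(',', array_map('bin2hex', $chars)), "\n"; // c385

// Real line breaks are still split in UTF-8 mode.
$lines = preg_split('%\R%u', "Å\r\nB\nC");
echo implode(',', $lines), "\n"; // Å,B,C
```

If the input is known to be UTF-8, adding u is the smallest change. If the encoding is unknown, the explicit alternation from the question avoids \x85 entirely.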

Tags: regex, php, pcre

Steve
