The manual page remains silent about this, although a user comment posted below it states that the characters with ASCII codes 0x09, 0x0A, 0x0C, 0x0D, and 0x20 (that is, TAB, LF, FF, CR, and SPACE) are recognized as "whitespace", but no source is given.
Since PCRE is meant to be Perl compatible, this may not be quite as simple, as explained in this Perl documentation. In fact, the result might be influenced by the locale, and then it starts to get hairy.
The context is that I'm trying to replace a preg_match call that is meant to check for whitespace-only strings.
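To make the context concrete, here is a minimal sketch of the kind of check being discussed, next to a regex-free candidate. The function names and the pattern are hypothetical, and the character list in the second function is an assumption taken from the user comment mentioned above, not a confirmed definition of \s.

```php
<?php
// Hypothetical original check: the whole string consists of \s characters.
function isWhitespaceOnlyRegex(string $s): bool
{
    return preg_match('/\A\s*\z/', $s) === 1;
}

// Regex-free candidate. ASSUMPTION: "whitespace" means exactly
// TAB, LF, FF, CR and SPACE, as listed in the user comment.
function isWhitespaceOnlyPlain(string $s): bool
{
    // strspn() returns the length of the leading run made up of the listed bytes.
    return strspn($s, "\t\n\f\r ") === strlen($s);
}

var_dump(isWhitespaceOnlyPlain(" \t\r\n")); // bool(true)
var_dump(isWhitespaceOnlyPlain(" x "));     // bool(false)
```

Whether the two functions agree on every input depends on exactly which characters \s matches, which is the question here.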
According to the PHP docs:
The space characters are HT (9), LF (10), VT (11), FF (12), CR (13), and space (32). Notice that this list includes the VT character (code 11). This makes "space" different to \s, which does not include VT (for Perl compatibility).
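One way to check this empirically on a given installation is to probe every 7-bit ASCII code. This is only a sketch; the function name asciiWhitespaceCodes is made up for illustration. Whether code 11 (VT) appears in the result may depend on the PCRE version bundled with the PHP build, so no expected output is shown.

```php
<?php
// Collect every 7-bit ASCII code whose single-byte string is matched by \s
// when no pattern modifiers are used.
function asciiWhitespaceCodes(): array
{
    $codes = [];
    for ($code = 0; $code < 128; $code++) {
        // \A and \z anchor the match to the whole one-byte subject.
        if (preg_match('/\A\s\z/', chr($code)) === 1) {
            $codes[] = $code;
        }
    }
    return $codes;
}

echo implode(', ', asciiWhitespaceCodes()), PHP_EOL;
```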
From the PCRE documentation:
In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII characters, even in a UTF mode. However, this can be changed by setting the PCRE_UCP option.
According to this StackOverflow answer, the PCRE_UCP option is set along with PCRE_UTF8 when the u modifier is used.
So if you don't use the u modifier, then \s will only ever match the ASCII whitespace characters. If you do, then \s also matches Unicode whitespace, and it will indeed be more complex.
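The difference can be seen with a non-ASCII space character. The sketch below uses U+2003 (EM SPACE) as an arbitrary test character and assumes that the script has not changed the locale with setlocale(). Under that assumption the first call should report no match and the second a match.

```php
<?php
// U+2003 EM SPACE, written as raw UTF-8 bytes so the literal works on any PHP version.
$emSpace = "\xE2\x80\x83";

// Without the u modifier the subject is treated as three separate bytes,
// none of which is an ASCII whitespace character.
$withoutU = preg_match('/\A\s+\z/', $emSpace);

// With the u modifier the subject is decoded as UTF-8 and \s uses Unicode properties.
$withU = preg_match('/\A\s+\z/u', $emSpace);

var_dump($withoutU, $withU);
```

For a whitespace-only check, this means the u modifier changes which strings pass, so it is worth deciding up front whether Unicode spaces should count.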