
Which characters exactly match \s in PHP's PCREs?

Tags: php, pcre

The manual page is silent on this. A user comment below it states that the characters with ASCII codes 0x09, 0x0A, 0x0C, 0x0D, and 0x20 (TAB, LF, FF, CR, and SPACE) are recognized as "whitespace", but it cites no source.

If PCRE really is Perl-compatible, the answer may not be that simple, as explained in this Perl documentation. In fact, the set of matching characters might be influenced by the locale, and then it starts to get hairy.

The context is that I'm trying to replace a preg_match call that is meant to check for whitespace-only strings.
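One way to check this empirically is to probe every single-byte value against \s on the installation in question. The following is only a sketch: the function name bytesMatchingBackslashS is made up for illustration, and the printed list depends on the PCRE version and the locale the script runs under, so it shows what one particular build does rather than what is guaranteed.

```php
<?php
// Hypothetical helper (not part of PHP): lists the single-byte values
// that \s matches on the PHP/PCRE build and locale this script runs under.
function bytesMatchingBackslashS(): array
{
    $matches = [];
    for ($code = 0; $code <= 255; $code++) {
        // No "u" modifier, so the subject is treated as raw bytes.
        // "D" keeps $ from matching before a trailing newline.
        if (preg_match('/^\s$/D', chr($code)) === 1) {
            $matches[] = sprintf('0x%02X', $code);
        }
    }
    return $matches;
}

// Print the matching byte values as a comma-separated list.
echo implode(', ', bytesMatchingBackslashS()), PHP_EOL;
```

Running this after a setlocale() call would show whether the locale changes the result on a given system.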

Hanno Fietz asked Oct 03 '22 09:10


2 Answers

According to the PHP docs:

The space characters are HT (9), LF (10), VT (11), FF (12), CR (13), and space (32). Notice that this list includes the VT character (code 11). This makes "space" different to \s, which does not include VT (for Perl compatibility).
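For the whitespace-only check mentioned in the question, a regex-free alternative is to spell the characters out, which sidesteps the \s versus [[:space:]] difference entirely. This is a minimal sketch, assuming only the ASCII characters listed in the quote matter; the function name isAsciiWhitespaceOnly and its $includeVt parameter are hypothetical, not an existing PHP API.

```php
<?php
// Hypothetical helper: true when $s consists only of the ASCII whitespace
// characters listed in the quoted documentation. VT (0x0B) is opt-in,
// mirroring the \s vs. [[:space:]] distinction described above.
function isAsciiWhitespaceOnly(string $s, bool $includeVt = false): bool
{
    $chars = " \t\n\f\r";          // SPACE, HT, LF, FF, CR
    if ($includeVt) {
        $chars .= "\x0B";          // VT
    }
    // strspn() returns the length of the leading run made up of $chars;
    // the string is whitespace-only when that run covers all of it.
    // Note: an empty string counts as whitespace-only, like /^\s*$/.
    return strspn($s, $chars) === strlen($s);
}

var_dump(isAsciiWhitespaceOnly(" \t\r\n"));
var_dump(isAsciiWhitespaceOnly(" a "));

// Compare how the running build treats VT in each class; the \s result
// may differ between PCRE versions, so no output is assumed here.
var_dump(preg_match('/^\s$/', "\x0B"));
var_dump(preg_match('/^[[:space:]]$/', "\x0B"));
```

Because the character list is explicit, the result does not depend on the PCRE version or the locale.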

Mark Baker answered Oct 07 '22 17:10


From the PCRE documentation:

In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII characters, even in a UTF mode. However, this can be changed by setting the PCRE_UCP option.

According to this Stack Overflow answer, the u modifier sets the PCRE_UCP option along with PCRE_UTF8.

So if you don't use the u modifier, then \s will only ever match the ASCII whitespace characters. If you do, \s follows Unicode properties and can match non-ASCII whitespace as well, so the set is indeed more complex.
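The effect of the u modifier can be seen with a non-ASCII space such as U+00A0 (NO-BREAK SPACE). This is a small sketch, assuming PHP 7.0 or later (for the "\u{...}" string escape) and a PCRE build with Unicode property support; the choice of U+00A0 as the test character is mine, not from the quoted documentation.

```php
<?php
// U+00A0 NO-BREAK SPACE, encoded as UTF-8 (two bytes: 0xC2 0xA0).
$nbsp = "\u{00A0}";

// Without "u" the subject is two raw bytes, so a single \s anchored
// at both ends cannot cover the whole string.
var_dump(preg_match('/^\s$/', $nbsp));

// With "u" the subject is decoded as UTF-8 and, with PCRE_UCP set,
// \s is evaluated against Unicode properties.
var_dump(preg_match('/^\s$/u', $nbsp));
```

If whitespace-only input may contain such characters, the choice of modifier changes the outcome of the check.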

lonesomeday answered Oct 07 '22 19:10