Which characters exactly match \s in PHP's PCREs?

Question

The manual page is silent on this. A user comment below it states that the characters with ASCII codes 0x09, 0x0A, 0x0C, 0x0D, and 0x20 (TAB, LF, FF, CR, and SPACE) are recognized as "whitespace", but it cites no source.

If PCRE really is Perl-compatible, it may not be quite that simple, as explained in this Perl documentation. The set might be influenced by the locale, and then it starts to get hairy.

The context is that I'm trying to replace a preg_match call that is meant to check for whitespace-only strings.

Mark Baker · Accepted Answer

According to the PHP docs:

The space characters are HT (9), LF (10), VT (11), FF (12), CR (13), and space (32). Notice that this list includes the VT character (code 11). This makes "space" different to \s, which does not include VT (for Perl compatibility).
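Rather than relying on the quoted list alone, the set can be checked empirically on a given installation. The following is a minimal sketch, not an authoritative reference: the function name `asciiCodesMatchingBackslashS` is made up for illustration, and the result depends on the PCRE library your PHP build links against. In particular, later PCRE releases added VT to `\s`, so code 11 may or may not appear in the output.

```php
<?php
// Return the ASCII codes (0-127) of the single bytes that \s matches
// when the pattern is compiled without the u modifier.
// (Hypothetical helper name, used only for this illustration.)
function asciiCodesMatchingBackslashS(): array
{
    $codes = [];
    for ($code = 0; $code < 128; $code++) {
        // The D modifier stops $ from matching before a trailing newline,
        // so the pattern matches only a subject of exactly one character.
        if (preg_match('/^\s$/D', chr($code)) === 1) {
            $codes[] = $code;
        }
    }
    return $codes;
}

// Print the codes as a comma-separated list; whether 11 (VT) is present
// tells you which behaviour your PCRE version has.
echo implode(', ', asciiCodesMatchingBackslashS()), PHP_EOL;
```

The loop stops at 127 on purpose: bytes in the range 128-255 are where locale-specific matching could change the answer.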

lonesomeday · Answer

From the PCRE documentation:

In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII characters, even in a UTF mode. However, this can be changed by setting the PCRE_UCP option.

According to this Stack Overflow answer, the PCRE_UCP option is set along with PCRE_UTF8 when the u modifier is used.

So if you don't use the u modifier, then \s will only ever match the ASCII whitespace characters. If you do, then it will indeed be more complex.
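The difference the u modifier makes can be seen with a non-ASCII space. This is a small sketch assuming a PHP build where u also enables PCRE_UCP, as described above; the variable names are arbitrary. It tests U+00A0 (NO-BREAK SPACE), which is outside the ASCII list but is a Unicode space character.

```php
<?php
// U+00A0 NO-BREAK SPACE, encoded in UTF-8 as the two bytes C2 A0.
$nbsp = "\xC2\xA0";

// Without u, the subject is treated as two separate bytes, so a pattern
// that demands exactly one whitespace character cannot match it.
$withoutU = preg_match('/^\s$/D', $nbsp);

// With u, the subject is a single code point, and (assuming PCRE_UCP is
// enabled by u) \s is decided by Unicode properties rather than ASCII.
$withU = preg_match('/^\s$/Du', $nbsp);

var_dump($withoutU, $withU);
```

For a whitespace-only check, this means the same pattern can accept or reject the same string depending on whether u is present, so the modifier should be chosen deliberately.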

Tags: php, pcre

Hanno Fietz
