I know that there are many types of space (em space, en space, thin space, non-breaking space, etc.), but all the ones I mentioned have HTML entities (at least, PHP's htmlentities() returns something like &nbsp; for them).
But, what about those spaces that have no HTML entities?
Example: [example URL not valid anymore]
Look at the nickname of this account. It has many " " (spaces) at the front, which are visible to us (this doesn't happen with &nbsp;).
I have already tried filtering with regular expressions using \x escapes, and with str_replace() using the space as the argument, with no luck at all!
Do you have any suggestion on how to filter ALL types of whitespace?
\s, by default, will not match whitespace characters with values above 127. To get at those, you can instead make good use of other UTF-8-aware sequences.
(Standard disclaimer: I'm skimming the PCRE source code to compile the lists below, I may miss a character or type something incorrectly. Please forgive me.)
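\p{Zs} matches: U+0020 Space, U+00A0 No-break space, U+1680 Ogham space mark, U+180E Mongolian vowel separator, U+2000 En quad through U+200A Hair space, U+202F Narrow no-break space, U+205F Medium mathematical space and U+3000 Ideographic space.
\h (Horizontal whitespace) matches the same as \p{Zs} above, plus U+0009 Horizontal tab.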
Similarly, for matching vertical whitespace there are a few options.
\p{Zl} matches U+2028 Line separator.
\p{Zp} matches U+2029 Paragraph separator.
\v (Vertical whitespace) matches \p{Zl}, \p{Zp} and the following: U+000A Line feed, U+000B Vertical tab, U+000C Form feed, U+000D Carriage return and U+0085 Next line.
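One way to check which class matches what is to count the matches on a sample string. The sketch below is only an illustration: the sample string and the variable names are made up for this example, and the "\u{...}" escapes assume PHP 7.0 or later.

```php
<?php
// Hypothetical sample: space, tab, no-break space, em space,
// then LF, vertical tab, NEL, line separator, paragraph separator.
$sample = " \t\u{00A0}\u{2003}\n\u{000B}\u{0085}\u{2028}\u{2029}";

$counts = [];
foreach (['\p{Zs}', '\h', '\p{Zl}', '\p{Zp}', '\v'] as $class) {
    // The u modifier makes PCRE treat the subject as UTF-8 characters, not bytes.
    $counts[$class] = preg_match_all('/' . $class . '/u', $sample);
}

print_r($counts);
// \p{Zs} => 3, \h => 4 (the three above plus the tab),
// \p{Zl} => 1, \p{Zp} => 1, \v => 5
```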
Going back to the beginning, in UTF-8 mode (i.e. using the u pattern modifier) \s will match any character that \p{Z} matches (which is anything that \p{Zs}, \p{Zl} and \p{Zp} will match), plus U+0009 Horizontal tab, U+000A Line feed, U+000C Form feed and U+000D Carriage return.
To cut a long story short (I bet you read all of the above, didn't you?) you might want to use \s but make sure to be in UTF-8 mode, like /\s/u. Putting that to some practical use, to filter out those matching whitespace characters from a string you would do something like:
$new_string = preg_replace('/\s/u', '', $old_string);
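To see the difference the u modifier makes, here is a small sketch. The nickname is a made-up stand-in for the one in the question, and the "\u{...}" escapes assume PHP 7.0 or later.

```php
<?php
// Hypothetical nickname padded with a no-break space, an em space
// and an ideographic space, none of which is character code 32.
$old_string = "\u{00A0}\u{2003}\u{3000}nick name";

// Without the u modifier the pattern is applied byte by byte,
// so the multi-byte spaces are not recognised as whitespace.
$bytes_mode = preg_replace('/\s/', '', $old_string);

// With the u modifier \s also matches the Unicode separators.
$utf8_mode = preg_replace('/\s/u', '', $old_string);

var_dump($bytes_mode === 'nickname'); // bool(false)
var_dump($utf8_mode === 'nickname');  // bool(true)
```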
Finally, if you really, really care about the vertical whitespaces which aren't included in \s (VT and NEL) then you can use the character class [\s\v] to match all 26 of the whitespace characters listed above.
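As a rough illustration of the combined class, the following sketch strips every whitespace character, vertical ones included. The sample string is invented for the example, and the "\u{...}" escapes assume PHP 7.0 or later.

```php
<?php
// Hypothetical input containing NEL, a vertical tab, a line separator
// and an ordinary space between the letters.
$old_string = "a\u{0085}b\u{000B}c\u{2028}d e";

// [\s\v] in UTF-8 mode covers both the \s set and the vertical whitespace set.
$new_string = preg_replace('/[\s\v]/u', '', $old_string);

var_dump($new_string); // string(5) "abcde"
```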
They are all plain spaces (returning character code 32) that can be caught with regular expressions or trim().
Try this:
preg_replace("/\s{2,}/", " ", $text);
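If the input may also contain non-ASCII spaces, the same collapsing idea can be made Unicode-aware. This is only a sketch under that assumption: normalize_whitespace() is a hypothetical helper name, not a built-in, and the code assumes PHP 7.0 or later.

```php
<?php
// Hypothetical helper: collapse every run of whitespace into one ASCII
// space, then trim the ends.
function normalize_whitespace(string $text): string
{
    // The u modifier makes \s and \v match Unicode whitespace as well.
    $collapsed = preg_replace('/[\s\v]+/u', ' ', $text);

    // preg_replace() returns null on failure, for example when the
    // subject is not valid UTF-8; in that case return the input unchanged.
    if ($collapsed === null) {
        return $text;
    }

    return trim($collapsed);
}

$clean = normalize_whitespace("\u{00A0}\u{00A0}\u{3000}John \u{2003} Doe\t\n");
var_dump($clean); // string(8) "John Doe"
```

Replacing with a single space instead of an empty string keeps words apart, which is usually what you want for names.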