If I want to discover the hexadecimal equivalent of a space in PHP, I can play with bin2hex:
php > var_dump(bin2hex(" "));
string(2) "20"
I can also get the space character back from "20":
php > var_dump(hex2bin("20"));
string(1) " "
But Unicode also has other characters that look just like a space, such as the no-break space:
php > var_dump(hex2bin('c2a0'));
string(2) " "
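As a sanity check, here is a small sketch showing that c2a0 is simply the UTF-8 encoding of U+00A0. It assumes PHP 7 or later for the "\u{...}" escape and the mbstring extension for mb_strlen:

```php
<?php
// U+00A0 (NO-BREAK SPACE) encoded as UTF-8 is the two bytes C2 A0.
$nbsp = "\u{00A0}";   // Unicode escape, only recognized inside double quotes

var_dump(bin2hex($nbsp));              // string(4) "c2a0"
var_dump($nbsp === hex2bin('c2a0'));   // bool(true)
var_dump(strlen($nbsp));               // int(2): two bytes
var_dump(mb_strlen($nbsp, 'UTF-8'));   // int(1): one character
```

That is why var_dump reports string(2) for something that looks like a single space.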
So I can receive a string (for example, from an HTTP request) in which I cannot spot the no-break space by eye. For now I replace it like this:
$string = preg_replace('~\x{00a0}~siu', ' ', $string);
Is there a better way to find and replace all "space like" characters in PHP?
DOS 255 (decimal) is the no-break space, the same character as &nbsp; in HTML.
There are six important white-space characters: the word space, the nonbreaking space, the tab, the hard line break, the carriage return, and the hard page break. Each white-space character has a distinct function.
You can make use of the Unicode category \p{Zs} (Zs: space separator):
$string = preg_replace('~\p{Zs}~u', ' ', $string);
The \p{Zs} Unicode category class will match these space-like symbols:
Character Name
U+0020 SPACE
U+00A0 NO-BREAK SPACE
U+1680 OGHAM SPACE MARK
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+202F NARROW NO-BREAK SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE
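To see the replacement at work, here is a minimal sketch that wraps the same preg_replace call in a function and checks the result. The name normalizeSpaces is a hypothetical helper, not something PHP provides, and the "\u{...}" escapes assume PHP 7 or later:

```php
<?php
/**
 * Replace every Unicode space separator (category Zs) with a plain U+0020 space.
 * normalizeSpaces() is a hypothetical helper name, not part of PHP.
 */
function normalizeSpaces(string $text): string
{
    // The u modifier makes PCRE treat both pattern and subject as UTF-8.
    $result = preg_replace('~\p{Zs}~u', ' ', $text);

    // preg_replace() returns null on failure, e.g. when $text is not valid UTF-8.
    if ($result === null) {
        throw new InvalidArgumentException(
            'preg_replace failed, PCRE error code ' . preg_last_error()
        );
    }

    return $result;
}

// No-break space, em space and ideographic space between the letters.
$input = "a\u{00A0}b\u{2003}c\u{3000}d";
$clean = normalizeSpaces($input);

var_dump($clean === 'a b c d');   // bool(true)
var_dump(bin2hex($clean));        // string(14) "61206220632064"
```

Note that \p{Zs} only covers the space separators listed above. Tabs and line breaks are control characters, and U+200B ZERO WIDTH SPACE belongs to category Cf, so none of them are touched. If tabs should be normalized as well, PCRE's \h (horizontal whitespace) may be worth trying instead, depending on what the input is expected to contain.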