In PHP, what's the most elegant way to get the complete list (an array of strings) of all the Unicode whitespace characters, encoded in UTF-8? I need it to generate test data.

Simplest way to get a complete list of all the UTF-8 whitespace characters in PHP

3 Answers

This email (archived here) contains a list of all Unicode whitespace characters encoded in UTF-8, UTF-16, and HTML.

In the archived message, look for the 'utf8_whitespace_table' function:

function utf8_whitespace_table() {
    // Unicode character name => UTF-8 byte sequence
    static $whitespace = array(
        "SPACE" => "\x20",
        "NO-BREAK SPACE" => "\xc2\xa0",
        "OGHAM SPACE MARK" => "\xe1\x9a\x80",
        "EN QUAD" => "\xe2\x80\x80",
        "EM QUAD" => "\xe2\x80\x81",
        "EN SPACE" => "\xe2\x80\x82",
        "EM SPACE" => "\xe2\x80\x83",
        "THREE-PER-EM SPACE" => "\xe2\x80\x84",
        "FOUR-PER-EM SPACE" => "\xe2\x80\x85",
        "SIX-PER-EM SPACE" => "\xe2\x80\x86",
        "FIGURE SPACE" => "\xe2\x80\x87",
        "PUNCTUATION SPACE" => "\xe2\x80\x88",
        "THIN SPACE" => "\xe2\x80\x89",
        "HAIR SPACE" => "\xe2\x80\x8a",
        "ZERO WIDTH SPACE" => "\xe2\x80\x8b",
        "NARROW NO-BREAK SPACE" => "\xe2\x80\xaf",
        "MEDIUM MATHEMATICAL SPACE" => "\xe2\x81\x9f",
        "IDEOGRAPHIC SPACE" => "\xe3\x80\x80",
    );
    return $whitespace;
}
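
For generating test data, the same byte strings can also be derived from the code points instead of being typed by hand. This is a minimal sketch and not part of the archived function: codepoint_to_utf8() is a hypothetical helper that applies the standard UTF-8 bit layout (the mbstring extension's mb_chr(), available since PHP 7.2, does the same job), and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper (not from the archived message): encodes one Unicode
// code point as UTF-8 using the standard 1- to 4-byte bit layout.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);                          // 0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))             // 110xxxxx
            . chr(0x80 | ($cp & 0x3F));           // 10xxxxxx
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))            // 1110xxxx
            . chr(0x80 | (($cp >> 6) & 0x3F))     // 10xxxxxx
            . chr(0x80 | ($cp & 0x3F));           // 10xxxxxx
    }
    return chr(0xF0 | ($cp >> 18))                // 11110xxx
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// The 18 code points behind the table: U+0020, U+00A0, U+1680,
// U+2000..U+200B, U+202F, U+205F, U+3000.
$codepoints = array_merge(
    [0x0020, 0x00A0, 0x1680],
    range(0x2000, 0x200B),
    [0x202F, 0x205F, 0x3000]
);

// Same byte strings as the table, in the same order.
$whitespace = array_map('codepoint_to_utf8', $codepoints);

// Example test data: each character sandwiched between two words.
$testStrings = array_map(
    fn(string $ws): string => "foo{$ws}bar",
    $whitespace
);
```

Generating from code points makes it easy to extend the list with the extra characters mentioned in the other answers without hand-encoding each byte sequence.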

answered Oct 16 '22 04:10

devio

Years later, this question is still a top Google result when searching for Unicode whitespace characters. devio's answer is great, but incomplete. As of this writing (October 2017), Wikipedia has a list of whitespace characters here: https://en.wikipedia.org/wiki/Whitespace_character

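That list specifies 25 code points, whereas the currently accepted answer lists 18. Combining the two with the related zero-width and formatting characters that Wikipedia lists separately gives the following 31 code points (only 25 of them have the Unicode White_Space property; U+180E, U+200B, U+200C, U+200D, U+2060 and U+FEFF do not):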

U+0009  character tabulation
U+000A  line feed
U+000B  line tabulation
U+000C  form feed
U+000D  carriage return
U+0020  space
U+0085  next line
U+00A0  no-break space
U+1680  ogham space mark
U+180E  mongolian vowel separator
U+2000  en quad
U+2001  em quad
U+2002  en space
U+2003  em space
U+2004  three-per-em space
U+2005  four-per-em space
U+2006  six-per-em space
U+2007  figure space
U+2008  punctuation space
U+2009  thin space
U+200A  hair space
U+200B  zero width space
U+200C  zero width non-joiner
U+200D  zero width joiner
U+2028  line separator
U+2029  paragraph separator
U+202F  narrow no-break space
U+205F  medium mathematical space
U+2060  word joiner
U+3000  ideographic space
U+FEFF  zero width non-breaking space
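
Since PHP 7.0, double-quoted strings accept the "\u{...}" escape, which emits the UTF-8 bytes of a code point, so the whole list can be written without hand-encoded bytes. A minimal sketch, assuming PHP 7.0 or later; the variable names are illustrative, and the split into two arrays follows Wikipedia's separation of the 25 White_Space characters from the six related ones.

```php
<?php
// The 25 code points with the Unicode White_Space property.
$unicodeWhitespace = [
    "\u{0009}", // character tabulation
    "\u{000A}", // line feed
    "\u{000B}", // line tabulation
    "\u{000C}", // form feed
    "\u{000D}", // carriage return
    "\u{0020}", // space
    "\u{0085}", // next line
    "\u{00A0}", // no-break space
    "\u{1680}", // ogham space mark
    "\u{2000}", // en quad
    "\u{2001}", // em quad
    "\u{2002}", // en space
    "\u{2003}", // em space
    "\u{2004}", // three-per-em space
    "\u{2005}", // four-per-em space
    "\u{2006}", // six-per-em space
    "\u{2007}", // figure space
    "\u{2008}", // punctuation space
    "\u{2009}", // thin space
    "\u{200A}", // hair space
    "\u{2028}", // line separator
    "\u{2029}", // paragraph separator
    "\u{202F}", // narrow no-break space
    "\u{205F}", // medium mathematical space
    "\u{3000}", // ideographic space
];

// Related zero-width / formatting characters (no White_Space property).
$relatedCharacters = [
    "\u{180E}", // mongolian vowel separator
    "\u{200B}", // zero width space
    "\u{200C}", // zero width non-joiner
    "\u{200D}", // zero width joiner
    "\u{2060}", // word joiner
    "\u{FEFF}", // zero width non-breaking space
];

// All 31 code points from the list above, as UTF-8 strings.
$all = array_merge($unicodeWhitespace, $relatedCharacters);
```

Keeping the two groups apart lets test code decide whether "whitespace" should include the invisible formatting characters or only the White_Space set.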

answered Oct 16 '22 06:10

cegfault

http://en.wikipedia.org/wiki/Space_%28punctuation%29#Spaces_in_Unicode

Unfortunately, it doesn't give the UTF-8 encoding, but the characters themselves appear on the page, so you could copy and paste them into your editor (if it saves in UTF-8). Alternatively, http://www.fileformat.info/info/unicode/char/180E/index.htm gives the UTF-8 bytes (replace "180E" with the hexadecimal code point you are looking up).

This also gives a couple of extra characters that @devio's excellent answer misses.
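
Instead of looking the bytes up on a web page, PHP can list the UTF-8 bytes of a character itself. A minimal sketch, assuming PHP 7.4 or later (for the "\u{...}" escape and the arrow function); utf8_hex() is a hypothetical helper name, and the 0x-prefixed output format is just a readable choice.

```php
<?php
// Hypothetical helper: lists the UTF-8 bytes of a string as 0x-prefixed hex.
function utf8_hex(string $char): string
{
    return implode(' ', array_map(
        fn(string $byte): string => '0x' . strtoupper(bin2hex($byte)),
        str_split($char)   // str_split() splits by byte, not by character
    ));
}

echo utf8_hex("\u{180E}"), PHP_EOL; // prints 0xE1 0xA0 0x8E
```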

answered Oct 16 '22 05:10

prewett

Tags: php, whitespace, utf-8, space

Ivan Krechetov
