
What's the complete range for Chinese characters in Unicode?

Tags:

unicode

cjk

U+4E00..U+9FFF is part of the complete set, but it does not cover all of it.

asked Sep 02 '09 06:09 by omg




1 Answer

Maybe you can find a complete list through the CJK Unicode FAQ (which covers "Chinese, Japanese, and Korean" characters).

The "East Asian Script" document does mention:

Blocks Containing Han Ideographs

Han ideographic characters are found in five main blocks of the Unicode Standard, as shown in Table 12-2

Table 12-2. Blocks Containing Han Ideographs

Block                                    Range        Comment
CJK Unified Ideographs                   4E00-9FFF    Common
CJK Unified Ideographs Extension A       3400-4DBF    Rare
CJK Unified Ideographs Extension B       20000-2A6DF  Rare, historic
CJK Unified Ideographs Extension C       2A700-2B73F  Rare, historic
CJK Unified Ideographs Extension D       2B740-2B81F  Uncommon, some in current use
CJK Unified Ideographs Extension E       2B820-2CEAF  Rare, historic
CJK Compatibility Ideographs             F900-FAFF    Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement  2F800-2FA1F  Unifiable variants

Note: the block ranges can evolve over time; the latest list is in CJK Unified Ideographs.
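
As a rough illustration of how the ranges in Table 12-2 could be used, here is a minimal Python sketch that checks which of those blocks a character falls in. The names HAN_BLOCKS, han_block and contains_han are made up for this example, and the ranges are only the ones quoted in the table, so blocks added in later Unicode versions (such as Extension F) are not covered. A block range can also include unassigned code points, so a match only means "inside the block", not "an assigned character".

```python
# Ranges copied from Table 12-2 as quoted in this answer (assumption: newer
# Unicode versions define additional blocks that are NOT listed here).
HAN_BLOCKS = [
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C"),
    (0x2B740, 0x2B81F, "CJK Unified Ideographs Extension D"),
    (0x2B820, 0x2CEAF, "CJK Unified Ideographs Extension E"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
    (0x2F800, 0x2FA1F, "CJK Compatibility Ideographs Supplement"),
]


def han_block(ch):
    """Return the name of the table block containing ch, or None.

    Hypothetical helper: ch must be a single character (one code point).
    """
    cp = ord(ch)
    for start, end, name in HAN_BLOCKS:
        if start <= cp <= end:
            return name
    return None


def contains_han(text):
    """Return True if any character of text lies in one of the table blocks."""
    return any(han_block(ch) is not None for ch in text)


if __name__ == "__main__":
    for sample in ["\u4e2d", "\u3400", "\U00020000", "A"]:
        print(f"U+{ord(sample):04X}", han_block(sample))
```

Testing the code point against explicit ranges keeps the check independent of the Unicode version bundled with the interpreter; the trade-off is that the list has to be updated by hand when new extension blocks are published.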

See also Wikipedia:

  • CJK Unified Ideographs Extension A
  • CJK Unified Ideographs Extension B
  • CJK Unified Ideographs Extension C
  • CJK Unified Ideographs Extension D
  • CJK Unified Ideographs Extension E
  • CJK Unified Ideographs Extension F (Unicode 10)
answered Oct 19 '22 03:10 by VonC