Are all Kanji characters in UTF-8 3 bytes long?

2 Answers

Yes. The common Kanji lie in U+4E00 to U+9FAF, and UTF-8 encodes every code point from U+0800 to U+FFFF in 3 bytes.
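
As a rough check of those boundaries, the following Python sketch encodes a few code points and reports how many bytes each takes. The helper name utf8_len and the choice of sample code points are illustrative assumptions, not part of the original answer.

```python
def utf8_len(code_point: int) -> int:
    """Return the number of bytes UTF-8 uses for one code point."""
    return len(chr(code_point).encode("utf-8"))

# Code points on either side of the 3-byte range, plus the Kanji range ends
# quoted in the answer above.
for cp in (0x07FF, 0x0800, 0x4E00, 0x9FAF, 0xFFFF, 0x10000):
    print(f"U+{cp:04X}: {utf8_len(cp)} byte(s)")
```

The values just outside the range (U+07FF and U+10000) are included only to show where the byte count changes.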

answered Sep 19 '22 07:09

gawi

The commonly used Hanzi/Kanji characters are in the "CJK Unified Ideographs" block between U+4E00 and U+9FFF, and take 3 bytes in UTF-8. (The Japanese Hiragana and Katakana characters also take 3 bytes.)

However, there are also some very rarely-used characters in the "CJK Unified Ideographs Extension B" and "CJK Compatibility Ideographs Supplement" blocks, which take 4 bytes in UTF-8.

Also be aware that Chinese text often contains ASCII characters, such as the digits 0-9, which take only 1 byte each in UTF-8.
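
To make the cases above concrete, here is a small Python sketch that measures the UTF-8 length of one character from each group and of a short mixed string. The sample characters, the mixed string, and the helper name utf8_len are assumptions chosen for illustration.

```python
def utf8_len(text: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))

# One sample character per group mentioned above.
samples = {
    "\u6f22": "CJK Unified Ideographs (U+6F22)",
    "\u3042": "Hiragana (U+3042)",
    "\u30ab": "Katakana (U+30AB)",
    "\U00020000": "CJK Unified Ideographs Extension B (U+20000)",
    "7": "ASCII digit (U+0037)",
}

for ch, label in samples.items():
    print(f"{label}: {utf8_len(ch)} byte(s)")

# Two ideographs (U+6F22, U+5B57) followed by three ASCII digits:
# 2 * 3 bytes + 3 * 1 byte.
mixed = "\u6f22\u5b57123"
print(f"mixed string: {utf8_len(mixed)} bytes for {len(mixed)} characters")
```

So the byte length of a string cannot be derived by multiplying its character count by 3; it has to be measured on the encoded bytes.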

answered Sep 21 '22 07:09

dan04
