The size of a char is 2 bytes (according to MSDN):
int size = sizeof(char); // 2
A test:
using System.Text;
char[] c = new char[1] { 'a' };
int count = Encoding.UTF8.GetByteCount(c); // 1 ?
Why is the value 1?
(Of course, if c holds a non-ASCII character such as 'ש', it does show 2, as it should.)
Isn't 'a' a .NET char as well?
It's because 'a' only takes one byte to encode in UTF-8.
Encoding.UTF8.GetByteCount(c) tells you how many bytes it takes to encode the given array of characters in UTF-8. See the documentation for Encoding.GetByteCount for more details. That's entirely separate from how wide the char type is internally in .NET.
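A minimal sketch contrasting the two measurements, assuming a .NET 5 or later console project using C# top-level statements; the variable names are illustrative only. sizeof(char) is the fixed in-memory width of one UTF-16 code unit, while GetByteCount reports the length of a particular encoding of the characters:

```csharp
using System;
using System.Text;

char[] ascii = { 'a' };
char[] hebrew = { '\u05E9' }; // 'ש', HEBREW LETTER SHIN

// In-memory width: every .NET char is one 16-bit UTF-16 code unit.
int inMemoryBytes = sizeof(char) * ascii.Length;

// Encoded widths: depend on the chosen encoding and on the character itself.
int utf8Ascii = Encoding.UTF8.GetByteCount(ascii);
int utf8Hebrew = Encoding.UTF8.GetByteCount(hebrew);
int utf16Ascii = Encoding.Unicode.GetByteCount(ascii); // Encoding.Unicode is UTF-16 (little-endian)

Console.WriteLine($"sizeof(char) * length: {inMemoryBytes}"); // 2
Console.WriteLine($"UTF-8 bytes for 'a': {utf8Ascii}");       // 1
Console.WriteLine($"UTF-8 bytes for U+05E9: {utf8Hebrew}");   // 2
Console.WriteLine($"UTF-16 bytes for 'a': {utf16Ascii}");     // 2
```

The same 'a' costs 2 bytes in memory and in UTF-16, but only 1 byte once encoded as UTF-8.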
Each character with a code point below 128 (i.e. U+0000 to U+007F) takes a single byte to encode in UTF-8.
Other characters take 2, 3 or even 4 bytes in UTF-8: 2 bytes for U+0080 to U+07FF, 3 bytes for U+0800 to U+FFFF, and 4 bytes for U+10000 to U+10FFFF. (Values above U+1FFFFF would take 5 or 6 bytes under the original UTF-8 scheme, but they're not part of Unicode, whose code space stops at U+10FFFF.)
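The byte-length rule above can be sketched as a small function and checked against Encoding.UTF8. This assumes a .NET 5 or later console project with top-level statements; Utf8Length is a hypothetical helper written for illustration, not a framework API, and the sample code points are arbitrary:

```csharp
using System;
using System.Text;

// Hypothetical helper: bytes UTF-8 uses for one Unicode scalar value.
static int Utf8Length(int codePoint)
{
    if (codePoint < 0 || codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF))
        throw new ArgumentOutOfRangeException(nameof(codePoint), "not a Unicode scalar value");
    if (codePoint <= 0x7F) return 1;   // U+0000..U+007F
    if (codePoint <= 0x7FF) return 2;  // U+0080..U+07FF
    if (codePoint <= 0xFFFF) return 3; // U+0800..U+FFFF (surrogates excluded above)
    return 4;                          // U+10000..U+10FFFF
}

int[] samples = { 0x61, 0x5E9, 0x20AC, 0x1F600 }; // 'a', 'ש', '€', an emoji

foreach (int cp in samples)
{
    // char.ConvertFromUtf32 yields a one- or two-char string for the code point.
    string s = char.ConvertFromUtf32(cp);
    int actual = Encoding.UTF8.GetByteCount(s);
    Console.WriteLine($"U+{cp:X4}: rule says {Utf8Length(cp)}, Encoding.UTF8 says {actual}");
}
```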
Note that the only characters which take 4 bytes to encode in UTF-8 can't be represented in a single char anyway. A char is a UTF-16 code unit, and any Unicode code point above U+FFFF requires two UTF-16 code units, forming a surrogate pair, to represent it.
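A minimal sketch of the surrogate-pair point, assuming a .NET 5 or later console project with top-level statements; U+1F600 is just an arbitrary example of a code point above U+FFFF:

```csharp
using System;
using System.Text;

// U+1F600 lies above U+FFFF, so UTF-16 needs two chars (a surrogate pair) for it.
string emoji = char.ConvertFromUtf32(0x1F600);
char high = emoji[0]; // high (leading) surrogate
char low = emoji[1];  // low (trailing) surrogate

int roundTrip = char.ConvertToUtf32(high, low);   // recombine the pair into one code point
int utf8Bytes = Encoding.UTF8.GetByteCount(emoji); // the whole code point in UTF-8

Console.WriteLine(emoji.Length);                  // 2
Console.WriteLine(char.IsSurrogatePair(high, low)); // True
Console.WriteLine(roundTrip.ToString("X"));       // 1F600
Console.WriteLine(utf8Bytes);                     // 4
```

So the 4-byte UTF-8 case always corresponds to two chars in the .NET string, never one.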