Some Unicode code points need more than two bytes. Can Visual Studio handle these characters, and how?
http://www.unicode.org released the code point below for CJK, so one character can now take more than two bytes.
The statement below failed for me on Visual Studio 2012:
char ch = '\u2A6D6';
I have not tried it on Visual Studio 2013 or Visual Studio 2015 yet.
This code-point doesn't fit into a char since char only has 16 bits and thus only supports code-points up to 65535. Characters outside the basic multilingual plane (BMP) can be encoded as two UTF-16 code-units in a string using surrogate pairs.
char.ConvertFromUtf32(0x2A6D6) returns a string with two chars, "\uD869\uDED6"
Code points U+10000 to U+10FFFF
Code points from the other planes (called Supplementary Planes) are encoded in UTF-16 by pairs of 16-bit code units called surrogate pairs, by the following scheme:
- 0x010000 is subtracted from the code point, leaving a 20 bit number in the range 0..0x0FFFFF.
- The top ten bits (a number in the range 0..0x03FF) are added to 0xD800 to give the first code unit or lead surrogate, which will be in the range 0xD800..0xDBFF. (Previous versions of the Unicode Standard referred to these as high surrogates.)
- The low ten bits (also in the range 0..0x03FF) are added to 0xDC00 to give the second code unit or trail surrogate, which will be in the range 0xDC00..0xDFFF. (Previous versions of the Unicode Standard referred to these as low surrogates.)
from Wikipedia - UTF-16
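The scheme quoted above can be sketched in a few lines of C#. This is only an illustration: the class name SurrogateDemo and the method ToSurrogatePair are made up for this example, and in real code the built-in char.ConvertFromUtf32 does the same job.

```csharp
using System;

public static class SurrogateDemo
{
    // Hypothetical helper (not part of .NET): encodes a supplementary
    // code point (U+10000..U+10FFFF) as a two-char UTF-16 string.
    public static string ToSurrogatePair(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException("codePoint");

        int offset = codePoint - 0x10000;                // 20-bit number
        char lead = (char)(0xD800 + (offset >> 10));     // top ten bits
        char trail = (char)(0xDC00 + (offset & 0x3FF));  // low ten bits
        return new string(new[] { lead, trail });
    }
}
```

For the code point in the question, SurrogateDemo.ToSurrogatePair(0x2A6D6) gives "\uD869\uDED6", the same two chars that char.ConvertFromUtf32(0x2A6D6) returns.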
Visual Studio should be able to handle them fine. Your code, however, is not legal in C#. As mentioned by @CodesInChaos, chars in .NET are UTF-16 code units, not Unicode code points. The \uxxxx escape sequence takes exactly 4 hex digits (2 bytes), so '\u2A6D6' is read as U+2A6D followed by the digit 6, which is two characters in one char literal. For code points above 0xFFFF you would generally use the \Uxxxxxxxx escape, but note that it is translated into two surrogate UTF-16 code units (i.e. two .NET chars), so the result can't be assigned to the char data type. If you need to use char, you would have to use the surrogates as suggested by @CodesInChaos; otherwise you would generally do the following:
string s = "\U0002A6D6";
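One consequence worth seeing in code is that such a string has Length 2 even though it holds a single character. Below is a rough sketch of counting code points instead of chars; CodePointDemo and CountCodePoints are hypothetical names for this example, while char.IsSurrogatePair and char.ConvertToUtf32 are the real .NET methods.

```csharp
using System;

public static class CodePointDemo
{
    // Hypothetical helper: counts Unicode code points in a string,
    // treating each valid surrogate pair as a single code point.
    public static int CountCodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            // When a valid pair starts at index i, skip its trail surrogate.
            if (char.IsSurrogatePair(s, i))
                i++;
            count++;
        }
        return count;
    }
}
```

With s = "\U0002A6D6", s.Length is 2, CodePointDemo.CountCodePoints(s) is 1, and char.ConvertToUtf32(s, 0) gives back 0x2A6D6.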
Side note: I wouldn't call the expansion past 2 bytes recent; it happened almost 20 years ago.