I have read a question about UTF-8, UTF-16 and UCS-2, and almost all answers state that UCS-2 is obsolete and that C# uses UTF-16.
However, all my attempts to create the 4-byte character U+1D11E in C# failed, so I suspect that C# actually uses only the UCS-2 subset of UTF-16.
Here are my attempts:
string s = "\u1D11E"; // gives the 2 character string "ᴑE", because \u1D11 is ᴑ
string s = (char) 0x1D11E; // won't compile because of an overflow
string s = Encoding.Unicode.GetString(new byte[] {0xD8, 0x34, 0xDD, 0x1E}); // gives 㓘ờ
Are C# strings really UTF-16 or are they actually UCS-2? If they are UTF-16, how would I get the violin clef into my C# string?
UTF-16 is based on 16-bit code units, so each character is encoded as either 16 bits (2 bytes) or 32 bits (4 bytes). With supplementary characters (code points above U+FFFF, encoded as surrogate pairs), UTF-16 can represent more than one million characters; without them, only 65,536 can be represented.
UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode (in fact this number of code points is dictated by the design of UTF-16). The encoding is variable-length, as code points are encoded with one or two 16-bit code units.
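To make the variable-length part concrete, here is a small C# sketch (the variable names are just illustrative) that splits a code point above U+FFFF into its two 16-bit code units, following the standard UTF-16 surrogate algorithm:

using System;

int codePoint = 0x1D11E;                  // MUSICAL SYMBOL G CLEF
int v = codePoint - 0x10000;              // 0xD11E
char high = (char)(0xD800 + (v >> 10));   // 0xD834, high (lead) surrogate
char low  = (char)(0xDC00 + (v & 0x3FF)); // 0xDD1E, low (trail) surrogate
string clef = new string(new[] { high, low });
Console.WriteLine(clef.Length);           // 2 — two UTF-16 code units, one code point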
In UTF-8, each character is encoded as 1 to 4 bytes. The first 128 Unicode code points are encoded as 1 byte; these code points are the same as those in ASCII (CCSID 367).
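As a quick check of those UTF-8 byte lengths, here is a sketch using the framework's Encoding.UTF8 (the sample characters are arbitrary):

using System;
using System.Text;

Console.WriteLine(Encoding.UTF8.GetByteCount("A"));          // 1 byte  (U+0041, ASCII range)
Console.WriteLine(Encoding.UTF8.GetByteCount("é"));          // 2 bytes (U+00E9)
Console.WriteLine(Encoding.UTF8.GetByteCount("€"));          // 3 bytes (U+20AC)
Console.WriteLine(Encoding.UTF8.GetByteCount("\U0001D11E")); // 4 bytes (U+1D11E)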
Use the capital \U escape (it takes eight hex digits) instead:
string s = "\U0001D11E";
And you overlooked that Encoding.Unicode is little-endian UTF-16, so the bytes of each 16-bit code unit have to be given low byte first:
string t = Encoding.Unicode.GetString(new byte[] { 0x34, 0xD8, 0x1E, 0xDD });
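Alternatively, keep the byte order from the question and use Encoding.BigEndianUnicode; a sketch showing that both decode to the same string:

using System;
using System.Text;

string fromLE = Encoding.Unicode.GetString(new byte[] { 0x34, 0xD8, 0x1E, 0xDD });
string fromBE = Encoding.BigEndianUnicode.GetString(new byte[] { 0xD8, 0x34, 0xDD, 0x1E });
Console.WriteLine(fromLE == fromBE);        // True
Console.WriteLine(fromLE == "\U0001D11E");  // True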