Defining 4-byte UTF-16 character in a string

I have read a question about UTF-8, UTF-16 and UCS-2, and almost all answers state that UCS-2 is obsolete and that C# uses UTF-16.

However, all my attempts to create the 4-byte character U+1D11E in C# failed, so I actually think C# uses the UCS-2 subset of UTF-16 only.

Here are my attempts:

string s = "\u1D11E"; // gives the 2 character string "ᴑE", because \u1D11 is ᴑ
string s = (char) 0x1D11E; // won't compile because of an overflow
string s = Encoding.Unicode.GetString(new byte[] {0xD8, 0x34, 0xDD, 0x1E}); // gives 㓘ờ

Are C# strings really UTF-16 or are they actually UCS-2? If they are UTF-16, how would I get the violin clef into my C# string?

Thomas Weller asked Jan 01 '14 23:01

1 Answer

Use capital U instead:

  string s = "\U0001D11E";

And you overlooked that Encoding.Unicode is little-endian UTF-16, so the low byte of each 16-bit code unit comes first:

  string t = Encoding.Unicode.GetString(new byte[] { 0x34, 0xD8, 0x1E, 0xDD });
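
To check that the resulting string really holds a UTF-16 surrogate pair rather than UCS-2 data, a minimal sketch along these lines may help. The class name ClefDemo and its helper methods are hypothetical names chosen for illustration; the calls themselves (char.ConvertFromUtf32, char.ConvertToUtf32, char.IsSurrogatePair, Encoding.BigEndianUnicode) are standard .NET APIs.

```csharp
using System;
using System.Text;

// Hypothetical demo class: builds U+1D11E (MUSICAL SYMBOL G CLEF) in several ways.
public static class ClefDemo
{
    public const int ClefCodePoint = 0x1D11E;

    // 8-digit \U escape, as suggested in the answer.
    public static string FromEscape() => "\U0001D11E";

    // Explicit surrogate pair: high 0xD834, low 0xDD1E.
    public static string FromSurrogates() => "\uD834\uDD1E";

    // Let the framework compute the surrogate pair from the code point.
    public static string FromCodePoint() => char.ConvertFromUtf32(ClefCodePoint);

    // The byte order from the question works with the big-endian encoding.
    public static string FromBigEndianBytes() =>
        Encoding.BigEndianUnicode.GetString(new byte[] { 0xD8, 0x34, 0xDD, 0x1E });

    public static void Main()
    {
        string s = FromEscape();

        // One code point, but two UTF-16 code units (chars).
        Console.WriteLine(s.Length);                                 // 2
        Console.WriteLine(char.IsSurrogatePair(s, 0));               // True
        Console.WriteLine(char.ConvertToUtf32(s, 0).ToString("X"));  // 1D11E

        // All four constructions yield the same string.
        Console.WriteLine(s == FromSurrogates());                    // True
        Console.WriteLine(s == FromCodePoint());                     // True
        Console.WriteLine(s == FromBigEndianBytes());                // True
    }
}
```

Note that s.Length counts 16-bit code units, not characters, which is why a single supplementary character reports a length of 2.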

Hans Passant answered Sep 28 '22 13:09