Is it possible in C# to use UTF-32 characters not in Plane 0 as a char?
string s = ""; // valid
char c = ''; // generates a compiler error ("Too many characters in character literal")
And in s, it is stored as two chars (a surrogate pair), not one.
Edit: I mean, is there a character and string type with full Unicode support, with one UTF-32 (or UTF-8) encoded character per element? For example, I might want a for loop over the UTF-32 characters in a string, some of which may lie outside Plane 0.
The string class represents a block of UTF-16 encoded text, and each char in a string represents one UTF-16 code unit.
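The following small sketch shows what this means for a character outside Plane 0. The choice of U+1F600 is arbitrary, and the class name Utf16Demo and the helper FromSurrogatePair are hypothetical names used only for illustration. The string holds two chars, a high surrogate followed by a low surrogate, and char.ConvertToUtf32 combines them back into a single UTF-32 value:

```csharp
using System;

public static class Utf16Demo
{
    // Hypothetical helper: returns the UTF-32 code point stored in a
    // string that consists of exactly one surrogate pair.
    public static int FromSurrogatePair(string s)
    {
        if (s.Length != 2 || !char.IsSurrogatePair(s[0], s[1]))
            throw new ArgumentException("Expected exactly one surrogate pair.", nameof(s));
        return char.ConvertToUtf32(s[0], s[1]);
    }

    public static void Main()
    {
        string s = "\U0001F600"; // U+1F600 lies outside Plane 0

        Console.WriteLine(s.Length);                   // 2 (two UTF-16 code units)
        Console.WriteLine(char.IsHighSurrogate(s[0])); // True
        Console.WriteLine(char.IsLowSurrogate(s[1]));  // True
        Console.WriteLine(FromSurrogatePair(s).ToString("X")); // 1F600

        // char c = '\U0001F600'; // does not compile: a char holds a single UTF-16 code unit
    }
}
```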
Although there is no BCL type that represents a single Unicode code point (.NET Core 3.0 later added System.Text.Rune for this purpose), Unicode characters beyond Plane 0 are supported through method overloads that take a string and an index instead of a single char. For example, the static GetUnicodeCategory(char) method on the System.Globalization.CharUnicodeInfo class has a corresponding GetUnicodeCategory(string, int) overload. That overload recognizes either a simple character or a surrogate pair starting at the specified index.
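The loop the question asks for can be built from the same string-and-index overloads. Below is a minimal sketch that assumes well-formed UTF-16 input. char.ConvertToUtf32(string, int) throws on an unpaired surrogate. The names CodePointDemo and CodePoints are hypothetical, and U+1D400 (MATHEMATICAL BOLD CAPITAL A) is just an example character outside Plane 0. The sketch also contrasts the two GetUnicodeCategory overloads:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CodePointDemo
{
    // Hypothetical helper: walks a string one code point (UTF-32 value) at a
    // time, consuming both halves of each surrogate pair as one step.
    public static List<int> CodePoints(string s)
    {
        var result = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            // Combines the surrogate pair at i, or returns the BMP char as-is;
            // throws ArgumentException on an unpaired surrogate.
            result.Add(char.ConvertToUtf32(s, i));
            if (char.IsSurrogatePair(s, i))
                i++; // skip the low surrogate we just consumed
        }
        return result;
    }

    public static void Main()
    {
        string s = "A\U0001D400"; // 'A' followed by U+1D400 (outside Plane 0)

        foreach (int cp in CodePoints(s))
            Console.WriteLine($"U+{cp:X4}"); // U+0041, then U+1D400

        // The char overload only sees the high surrogate at index 1...
        Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(s[1])); // Surrogate
        // ...while the (string, int) overload reads the whole pair.
        Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(s, 1)); // UppercaseLetter
    }
}
```

The loop advances by two chars for a surrogate pair and by one otherwise, so each iteration of the outer loop corresponds to exactly one Unicode code point.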
To iterate through the text elements in a string, you can use the methods on the System.Globalization.StringInfo class. Here, a "text element" corresponds to a single character as displayed on screen. This means that simple characters ("a"), combining character sequences ("a\u0304\u0308" = "ā̈"), and surrogate pairs ("\uD950\uDF21", which encodes the single code point U+64321) are each treated as one text element.
Specifically, the static GetTextElementEnumerator method lets you enumerate over each text element in a string (its MSDN documentation page includes a code example).
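A minimal sketch of that enumeration, using the three example strings from the paragraph above, is shown below. The names TextElementDemo and TextElements are hypothetical. The six chars of the input come back as three text elements. For more complex sequences, such as emoji with modifiers, results can differ between .NET Framework and .NET 5 or later, because .NET 5 switched StringInfo to Unicode extended grapheme clusters.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TextElementDemo
{
    // Hypothetical helper: collects each text element returned by
    // StringInfo.GetTextElementEnumerator into a list.
    public static List<string> TextElements(string s)
    {
        var elements = new List<string>();
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
            elements.Add(e.GetTextElement());
        return elements;
    }

    public static void Main()
    {
        // A simple character, a combining sequence, and a surrogate pair.
        string s = "a" + "a\u0304\u0308" + "\uD950\uDF21";

        Console.WriteLine(s.Length);                               // 6 chars
        Console.WriteLine(new StringInfo(s).LengthInTextElements); // 3 text elements

        foreach (string element in TextElements(s))
            Console.WriteLine($"{element.Length} char(s)");        // 1, 3, 2
    }
}
```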