string and 4-byte Unicode characters

Question

I have a question about strings and chars in C#. I found that a string in C# is a Unicode string, and a char takes 2 bytes, so every char is one UTF-16 code unit. That's great, but I also read on Wikipedia that some characters take 4 bytes in UTF-16.

I'm writing a program that lets you draw characters for alphanumeric displays. The program also has a tester, where you can type a string and it draws it so you can see how it looks.

So how should I work with strings where the user types a character that takes 4 bytes, i.e. 2 chars? I need to go through the string character by character, find each character in the list, and draw it onto the panel.

Esailija · Accepted Answer

You could do:

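for( int i = 0; i < str.Length; ++i ) {
    // Throws ArgumentException if str[i] is an unpaired surrogate.
    int codePoint = Char.ConvertToUtf32( str, i );
    // Look up and draw the glyph for codePoint here.
    if( codePoint > 0xffff ) {
        i++; // Code points above U+FFFF occupy two chars; skip the low surrogate.
    }
}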

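Inside the loop, codePoint holds the full code point as a 32-bit integer, including characters outside the Basic Multilingual Plane that occupy two chars (a surrogate pair) in the string.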
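As a rough, self-contained sketch of how this could fit the glyph-lookup use case from the question: the helper below yields one code point per character, and the glyph table is keyed by int rather than char. The names CodePointDemo, CodePoints, and glyphs are hypothetical and not from the original post, and substituting U+FFFD for an unpaired surrogate is just one possible policy (Char.ConvertToUtf32 would throw instead).

```csharp
using System;
using System.Collections.Generic;

public static class CodePointDemo
{
    // Yields each Unicode code point in the string as a 32-bit integer.
    // Assumption: an unpaired surrogate is reported as U+FFFD instead of throwing.
    public static IEnumerable<int> CodePoints(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                yield return char.ConvertToUtf32(text, i);
                i++; // skip the low surrogate of the pair
            }
            else if (char.IsSurrogate(text[i]))
            {
                yield return 0xFFFD; // lone surrogate: replacement character
            }
            else
            {
                yield return text[i]; // BMP character, one char
            }
        }
    }

    public static void Main()
    {
        // Hypothetical glyph table keyed by code point, not by char.
        var glyphs = new Dictionary<int, string>
        {
            [0x41] = "glyph for A",
            [0x1D11E] = "glyph for MUSICAL SYMBOL G CLEF",
        };

        string input = "A\U0001D11E"; // two characters, three chars
        foreach (int cp in CodePoints(input))
        {
            string glyph = glyphs.TryGetValue(cp, out var g) ? g : "(missing)";
            Console.WriteLine($"U+{cp:X4}: {glyph}");
        }
    }
}
```

Keying the table by int means a character above U+FFFF is looked up the same way as any other, so the drawing code never has to deal with surrogate halves.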

Tags: string, c#, unicode, astral-plane

Arxeiss
