Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

string and 4-byte Unicode characters

I have one question about strings and chars in C#. I found that a string in C# is a Unicode string, and a char takes 2 bytes. So every char is in UTF-16 encoding. That's great, but I also read on Wikipedia that there are some characters that in UTF-16 take 4 bytes.

I'm doing a program that lets you draw characters for alphanumerical displays. In program there is also a tester, where you can write some string, and it draws it for you to see how it looks.

So how I should work with strings, where the user writes a character which takes 4 bytes, i.e. 2 chars. Because I need to go char by char through the string, find this char in the list, and draw it into the panel.

like image 304
Arxeiss Avatar asked Oct 06 '22 13:10

Arxeiss


1 Answers

You you could do:

for( int i = 0; i < str.Length; ++i ) {
    int codePoint = Char.ConvertToUTF32( str, i );
    if( codePoint > 0xffff ) {
        i++;
    }
}

Then the codePoint represents any possible code point as a 32 bit integer.

like image 107
Esailija Avatar answered Oct 10 '22 02:10

Esailija