 

UTF-8 scrambling during C++ file loading

I know loading Unicode is a somewhat laboured point, but I can't see how to apply the solutions given to others to my particular problem.

I have a Win7/C++/DirectX9 GUI library which can render text to the screen. I've never had a problem before since it has only been used with Western European languages. Now I have to use it with Hungarian, and it is giving me a headache! My particular problem is with loading the special characters found in that language.

Take this example, FELNŐTTEKNEK, meaning ADULT.

If I hard code this string into my app, it renders correctly:

guiTitle->SetText( L"FELNŐTTEKNEK" );

This stores the string as a std::wstring, rendering it with ID3DXFont::DrawTextW(). It also proves my chosen font, Futura CE, is able to render the special characters (CE = Central European).

So far so good. Next I simply want to be able to load the text from a text file. No big deal. However, the results are bad! The special Ő is replaced by another character, usually Å, or even by two characters such as Å followed by a second, usually unprintable, one.

I have ensured my input text file is encoded as UTF-8 and am naively trying to load it like this:

wifstream f("data/language.ini");
wstring w;  
getline( f, w );    
guiTitle->SetText( w );

Somehow I am still scrambling it. Am I actually loading it as UTF-8? Is there a way to ensure this? I just need to end up with a wide string containing the text as shown in the text editor.

Any assistance most gratefully received.

Si

sipickles asked Jan 19 '23

2 Answers

Forget about wifstream; it's just too hard to make it work. Do this instead:

ifstream f(L"data/language.ini");
string str;  
getline( f, str );
guiTitle->SetText( utf8_to_utf16(str).c_str() );

And use MultiByteToWideChar to implement utf8_to_utf16.
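For illustration, here is a portable sketch of what that helper has to do. On Windows, MultiByteToWideChar(CP_UTF8, ...) performs this same UTF-8 to UTF-16 decoding for you; the hand-rolled version below is only to show the logic, and its error handling (skipping invalid lead bytes, stopping at truncated sequences) is a simplifying assumption, not what the Win32 API does.

```cpp
#include <cstdint>
#include <string>

// Minimal UTF-8 -> UTF-16 decoder (illustrative; prefer MultiByteToWideChar
// with CP_UTF8 on Windows). Invalid lead bytes are skipped, truncated
// trailing sequences are dropped.
std::wstring utf8_to_utf16(const std::string& s)
{
    std::wstring out;
    size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        uint32_t cp;
        size_t extra;
        if      (b < 0x80)          { cp = b;        extra = 0; }  // ASCII
        else if ((b & 0xE0) == 0xC0){ cp = b & 0x1F; extra = 1; }  // 2-byte
        else if ((b & 0xF0) == 0xE0){ cp = b & 0x0F; extra = 2; }  // 3-byte
        else if ((b & 0xF8) == 0xF0){ cp = b & 0x07; extra = 3; }  // 4-byte
        else { ++i; continue; }            // invalid lead byte: skip it
        if (i + extra >= s.size()) break;  // truncated sequence: stop
        for (size_t k = 1; k <= extra; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        i += extra + 1;
        if (cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));
        } else {                           // outside the BMP: surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 | (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 | (cp & 0x3FF)));
        }
    }
    return out;
}
```

With this in place, the UTF-8 bytes C5 90 read from the file decode to the single code unit U+0150 (Ő), which is what DrawTextW expects.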

See also https://stackoverflow.com/questions/1049947/should-utf-16-be-considered-harmful.

Yakov Galka answered Feb 02 '23

ID3DXFont::DrawTextW is expecting UTF-16.

What you're doing is converting each UTF-8 code unit (byte) into a 16-bit value by zero-padding it. This correctly converts UTF-8 to UTF-16 only if your input exclusively contains characters from the ASCII subset of Unicode.
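This zero-padding bug can be reproduced directly, and it explains the exact garbage reported in the question: widening the two UTF-8 bytes of Ő (C5 90) one at a time yields U+00C5 (Å) followed by U+0090, an unprintable control character. A minimal sketch (widen_bytes is a hypothetical name, not part of any API):

```cpp
#include <string>

// Reproduces the bug: zero-extends each UTF-8 byte into its own wchar_t,
// which is only correct for pure-ASCII input.
std::wstring widen_bytes(const std::string& s)
{
    std::wstring out;
    for (char c : s)
        out.push_back(static_cast<wchar_t>(
            static_cast<unsigned char>(c)));  // zero-pad one byte per code unit
    return out;
}
```

Feeding it the bytes C5 90 produces the two code units 0x00C5 and 0x0090 instead of the single code unit 0x0150, so the font renders Å plus junk.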

What you need to do is to correctly convert from UTF-8 to UTF-16. Load the string into a std::string (not a std::wstring) then convert that UTF-8 string into a UTF-16 string and pass it to the API expecting a UTF-16 string.

JoeG answered Feb 02 '23