C++ Text file won't save in Unicode, it keeps saving in ANSI

Tags: c++, file, text, unicode, fwrite

So basically, I need to be able to create a text file in Unicode, but whatever I do it keeps saving in ANSI.

Here's my code:

    wchar_t name[] = L"‎中國哲學書電子化計劃";
    FILE * pFile;
    pFile = fopen("chineseLetters.txt", "w");

    fwrite(name, sizeof(wchar_t), sizeof(name), pFile);
    fclose(pFile);

And here is the output of my "chineseLetters.txt":

     -NWòTx[øfû–P[SŠƒR  õ2123

Also, the application is in MBCS and cannot be changed into Unicode, because it needs to work with both Unicode and ANSI.

I'd really appreciate some help here. Thanks.

Thanks for all the quick replies! It works!

Simply adding L"\uFFFE‎中國哲學書電子化計劃" still didn't work; the text editor still recognized the file as CP1252. So I used two fwrite calls instead of one, one for the BOM and one for the characters. Here's my code now:

    wchar_t name[] = L"‎中國哲學書電子化計劃";
    unsigned char bom[] = { 0xFF, 0xFE };
    FILE * pFile;
    pFile = fopen("chineseLetters.txt", "w");
    fwrite(bom, sizeof(unsigned char), sizeof(bom), pFile);
    fwrite(name, sizeof(wchar_t), wcslen(name), pFile);
    fclose(pFile);

asked Jan 20 '15 21:01

Kelv

1 Answer

I need to be able to create a text file in Unicode

Unicode is not an encoding; do you mean UTF-16LE? This is the two-byte-code-unit encoding that Windows x86/x64 uses for internal string storage in memory, and some Windows applications like Notepad misleadingly describe UTF-16LE as “Unicode” in their UI.

    fwrite(name, sizeof(wchar_t), sizeof(name), pFile);

You've copied the memory storage of the string directly to a file. If you compile this under Windows/MSVCRT then because the internal storage encoding is UTF-16LE, the file you have produced is encoded as UTF-16LE. If you compile this in other environments you will get different results.
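To make the platform dependence concrete, here is a minimal sketch that copies a wide string's in-memory bytes the same way fwrite would see them. The helper name raw_wide_bytes is hypothetical and not part of the question's code. With MSVC, where wchar_t is 2 bytes, each of these characters yields two bytes; on typical Linux toolchains wchar_t is 4 bytes, so the same call yields four. The sketch counts elements with wcslen: passing sizeof(name), which is a byte count, as the element count makes fwrite read past the end of the array, and that may be where the trailing junk in the output comes from.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <vector>

// Copy the in-memory representation of a null-terminated wide string,
// excluding the terminator, exactly as a raw fwrite of the buffer would.
// Hypothetical helper for illustration only.
std::vector<unsigned char> raw_wide_bytes(const wchar_t* text) {
    const std::size_t count = std::wcslen(text);  // number of elements, not bytes
    std::vector<unsigned char> bytes(count * sizeof(wchar_t));
    if (!bytes.empty()) {
        std::memcpy(bytes.data(), text, bytes.size());
    }
    return bytes;
}
```

Calling raw_wide_bytes(L"\u4E2D\u570B") returns 2 * sizeof(wchar_t) bytes, so the size of the resulting file differs between compilers even though the source is identical.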

And here is the output of my "chineseLetters.txt": -NWòTx[øfû–P[SŠƒR õ2123

That's what the UTF-16LE-encoded data would look like if you misinterpreted the file as Windows Code Page 1252 (Western European).

If you have loaded the file into a Windows application such as Notepad, it probably doesn't know that the file contains UTF-16LE-encoded data, and so defaults to reading the file using your default locale-specific (ANSI, mbcs) code page as the encoding, resulting in the above mojibake.

When you are making a UTF-16 file you should put a Byte Order Mark character U+FEFF at the start of it to let the consumer know whether it's UTF-16LE or UTF-16BE. This also gives apps like Notepad a hint that the file contains UTF-16 at all, and not ANSI. So you would probably find that writing L"\uFEFF‎中國哲學書電子化計劃" would make the output file display better in Notepad.
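As a rough sketch of the BOM approach that does not depend on how wchar_t is stored, the code below builds the UTF-16LE bytes explicitly from a std::u16string and prepends the BOM bytes FF FE. The function names utf16le_with_bom and write_bytes are made up for this example. It opens the file with "wb" because, on Windows, text mode ("w") translates every 0x0A byte into 0x0D 0x0A, which would corrupt any character whose UTF-16 encoding contains that byte (for example U+4E0A).

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Encode UTF-16 code units as little-endian bytes, preceded by the
// byte order mark U+FEFF (bytes FF FE). Hypothetical helper.
std::vector<unsigned char> utf16le_with_bom(const std::u16string& text) {
    std::vector<unsigned char> out;
    out.reserve((text.size() + 1) * 2);
    out.push_back(0xFF);
    out.push_back(0xFE);
    for (char16_t unit : text) {
        out.push_back(static_cast<unsigned char>(unit & 0xFF));         // low byte first
        out.push_back(static_cast<unsigned char>((unit >> 8) & 0xFF));  // then high byte
    }
    return out;
}

// Write the bytes unchanged. Binary mode prevents newline translation.
bool write_bytes(const char* path, const std::vector<unsigned char>& bytes) {
    std::FILE* file = std::fopen(path, "wb");
    if (!file) {
        return false;
    }
    std::size_t written = 0;
    if (!bytes.empty()) {
        written = std::fwrite(bytes.data(), 1, bytes.size(), file);
    }
    const bool closed = std::fclose(file) == 0;
    return written == bytes.size() && closed;
}
```

A call such as write_bytes("chineseLetters.txt", utf16le_with_bom(u"\u4E2D\u570B")) should then produce a file that starts with FF FE 2D 4E.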

But it's probably better to convert the wchar_ts into char bytes in a particular desired encoding stated explicitly (e.g. UTF-8), rather than relying on what in-memory storage format the C library happens to use. On Win32 you can do this using the WideCharToMultiByte API, or by opening the file with a ccs= encoding flag (a Microsoft extension to fopen and _wfopen), as described by Mr.C64. If you choose to write a UTF-16LE file with ccs it will put the BOM in for you.
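For illustration, here is a portable, hand-rolled UTF-16 to UTF-8 conversion. It is a stand-in for what the Win32 conversion API would do for you, not a replacement for it, and the names append_utf8 and utf16_to_utf8 are invented for this sketch. It assumes the input is a std::u16string, combines surrogate pairs, and substitutes U+FFFD for any unpaired surrogate.

```cpp
#include <cstddef>
#include <string>

// Append one Unicode code point to a UTF-8 byte string.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 code units to UTF-8 bytes (hypothetical helper).
std::string utf16_to_utf8(const std::u16string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        char32_t cp = text[i];
        const bool is_high = cp >= 0xD800 && cp <= 0xDBFF;
        if (is_high && i + 1 < text.size() &&
            text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF) {
            // Combine a high and low surrogate into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (text[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate: use the replacement character
        }
        append_utf8(out, cp);
    }
    return out;
}
```

The resulting std::string can be written with fwrite in binary mode. UTF-8 has no byte-order ambiguity, so the BOM is optional there, although some Windows editors use it as a hint.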

answered Sep 30 '22 00:09

bobince
