
How to write UTF-8 file with fprintf in C++

I program in C++ only occasionally, using Visual Studio and MFC. I write a file with fopen and fprintf, and the file should be encoded in UTF-8. Is there any way to do this? Whatever I try, the file ends up either as double-byte Unicode or as ISO-8859-2 (Latin-2).

Glanebridge asked Apr 05 '12 12:04


3 Answers

You shouldn't need to set your locale or set any special modes on the file if you just want to use fprintf. You simply have to use UTF-8 encoded strings.

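#include <cstdio>
#include <codecvt>
#include <locale>   // std::wstring_convert
#include <string>

int main() {
    // Convert the wide (UTF-16) literal into a UTF-8 encoded narrow string.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>,wchar_t> convert;
    std::string utf8_string = convert.to_bytes(L"кошка 日本国");

    if(FILE *f = fopen("tmp","w")) {
        fprintf(f,"%s\n",utf8_string.c_str());
        fclose(f);
    }
}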

Save the source file as UTF-8 with signature or as UTF-16 (i.e. don't use UTF-8 without signature, otherwise VS won't produce the right string literal). The file written by the program will contain the UTF-8 version of that string. Or you can do:

#include <cstdio>

int main() {
    if(FILE *f = fopen("tmp","w")) {
        fprintf(f,"%s\n","кошка 日本国");
        fclose(f);
    }
}

In this case you must save the source file as UTF-8 without signature, because you want the compiler to think the source encoding is the same as the execution encoding, so that the bytes of the literal pass through unchanged. This is a bit of a hack that relies on what is, IMO, broken compiler behavior.
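If the dependence on the source file's encoding is a concern, one option is to spell out the UTF-8 bytes with hex escapes, so the literal is the same no matter how the source is saved. The following is only a minimal sketch of that idea: the helper name write_utf8_sample and the constant kSampleUtf8 are made up for illustration, and the bytes encode just the word "кошка".

```cpp
#include <cstdio>

// UTF-8 bytes for "кошка", written as hex escapes so the result does not
// depend on how the compiler decodes the source file.
static const char kSampleUtf8[] = "\xD0\xBA\xD0\xBE\xD1\x88\xD0\xBA\xD0\xB0";

// Hypothetical helper: writes the sample followed by a newline to `path`.
// Binary mode ("wb") keeps the C runtime from turning "\n" into "\r\n" on Windows.
bool write_utf8_sample(const char *path) {
    FILE *f = std::fopen(path, "wb");
    if (!f) return false;
    bool ok = std::fprintf(f, "%s\n", kSampleUtf8) >= 0;
    ok = (std::fclose(f) == 0) && ok;   // always close, even if the write failed
    return ok;
}
```

Calling write_utf8_sample("tmp") from main should leave a file holding the ten bytes of the word plus a newline, with no byte order mark.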

You can do basically the same thing with any of the other APIs for writing narrow characters to a file, but note that none of these methods work for writing UTF-8 to the Windows console. Because the C runtime and/or the console is a bit broken, you can only write UTF-8 directly to the console by calling SetConsoleOutputCP(65001) and then using one of the puts family of functions.
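The console case described above might look roughly like the sketch below. The helper name print_utf8_line is invented for illustration, and the Windows call is guarded so the snippet still compiles elsewhere; whether the text actually renders also depends on the console font, which this sketch does not address.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: prints one UTF-8 encoded line to stdout.
// Returns true if puts reported success.
bool print_utf8_line(const char *utf8) {
#ifdef _WIN32
    // Switch the console output code page to UTF-8 (CP_UTF8 == 65001).
    SetConsoleOutputCP(CP_UTF8);
#endif
    return std::puts(utf8) >= 0;
}
```

A call such as print_utf8_line("\xD0\xBA\xD0\xBE\xD1\x88\xD0\xBA\xD0\xB0") passes the UTF-8 bytes of "кошка" straight through puts.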

If you want to use wide characters instead of narrow characters then locale based methods and setting modes on file descriptors could come into play.

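#include <cstdio>
#include <fcntl.h>
#include <io.h>

int main() {
    if(FILE *f = fopen("tmp","w")) {
        // Microsoft-specific: wide output to this stream is converted to UTF-8.
        _setmode(_fileno(f), _O_U8TEXT);
        fwprintf(f,L"%ls\n",L"кошка 日本国"); // %ls always means a wide string
        fclose(f);
    }
}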

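#include <fstream>
#include <codecvt>
#include <locale>   // std::locale

int main() {
    if(auto f = std::wofstream("tmp")) {
        // The stream's locale takes ownership of the facet and converts on output.
        f.imbue(std::locale(std::locale(),
                new std::codecvt_utf8_utf16<wchar_t>)); // assumes wchar_t is UTF-16
        f << L"кошка 日本国\n";
    }
}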
bames53 answered Oct 18 '22 20:10


Yes, but you need Visual Studio 2005 or later. You can then pass a ccs flag in the mode string given to fopen:

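LPCTSTR strText = _T("абв");                  // _T() gives a wide literal in a Unicode build
FILE *f = fopen(pszFilePath, "w,ccs=UTF-8");  // ccs stream: expects wide-character output
if (f) {
    _ftprintf(f, _T("%s"), strText);          // maps to fwprintf when _UNICODE is defined
    fclose(f);
}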

Keep in mind that this is a Microsoft extension; it probably won't work with GCC or other compilers.

sashoalm answered Oct 18 '22 19:10


In theory, you should simply set a locale which uses UTF-8 as external encoding. My understanding -- I'm not a Windows programmer -- is that Windows has no such locale, so you have to resort to implementation-specific means or non-standard libraries (link from Dave's comment).
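On a platform that does provide such a locale, the idea could be sketched as below. This is an illustration under assumptions, not something from the answer: the helper name write_wide_with_locale is hypothetical, the locale name is supplied by the caller because available names vary by system ("C.UTF-8" or "en_US.UTF-8" exist on many Linux installations), and the wide literal uses \u escapes for "кошка" to stay independent of the source encoding.

```cpp
#include <clocale>
#include <cstdio>
#include <cwchar>

// Hypothetical helper: selects `locale_name` as the global C locale, then
// writes a wide string through fwprintf so the C library converts it to the
// locale's external encoding. Returns false if the locale is unavailable or
// the write fails. Note that it leaves the global locale changed.
bool write_wide_with_locale(const char *path, const char *locale_name) {
    if (!std::setlocale(LC_ALL, locale_name)) return false;
    FILE *f = std::fopen(path, "w");
    if (!f) return false;
    // "кошка" spelled with universal character names.
    bool ok = std::fwprintf(f, L"%ls\n", L"\u043A\u043E\u0448\u043A\u0430") >= 0;
    ok = (std::fclose(f) == 0) && ok;   // always close, even if the write failed
    return ok;
}
```

A caller would try something like write_wide_with_locale("tmp", "C.UTF-8") and fall back to one of the other approaches when it returns false.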

AProgrammer answered Oct 18 '22 19:10