I am programming (just occassionally) in C++ with VisualStudio and MFC. I write a file with fopen and fprintf. The file should be encoded in UTF8. Is there any possibility to do this? Whatever I try, the file is either double byte unicode or ISO-8859-2 (latin2) encoded.
Glanebridge
You shouldn't need to set your locale or set any special modes on the file if you just want to use fprintf. You simply have to use UTF-8 encoded strings.
#include <cstdio>
#include <codecvt>
int main() {
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>,wchar_t> convert;
std::string utf8_string = convert.to_bytes(L"кошка 日本国");
if(FILE *f = fopen("tmp","w"))
fprintf(f,"%s\n",utf8_string.c_str());
}
Save the program as UTF-8 with signature or UTF-16 (i.e. don't use UTF-8 without signature, otherwise VS won't produce the right string literal). The file written by the program will contain the UTF-8 version of that string. Or you can do:
int main() {
if(FILE *f = fopen("tmp","w"))
fprintf(f,"%s\n","кошка 日本国");
}
In this case you must save the file as UTF-8 without signature, because you want the compiler to think the source encoding is the same as the execution encoding... This is a bit of a hack that relies on the compiler's, IMO, broken behavior.
You can do basically the same thing with any of the other APIs for writing narrow characters to a file, but note that none of these methods work for writing UTF-8 to the Windows console. Because the C runtime and/or the console is a bit broken you can only write UTF-8 directly to the console by doing SetConsoleOutputCP(65001) and then using one of the puts
variety of function.
If you want to use wide characters instead of narrow characters then locale based methods and setting modes on file descriptors could come into play.
#include <cstdio>
#include <fcntl.h>
#include <io.h>
int main() {
if(FILE *f = fopen("tmp","w")) {
_setmode(_fileno(f), _O_U8TEXT);
fwprintf(f,L"%s\n",L"кошка 日本国");
}
}
#include <fstream>
#include <codecvt>
int main() {
if(auto f = std::wofstream("tmp")) {
f.imbue(std::locale(std::locale(),
new std::codecvt_utf8_utf16<wchar_t>)); // assumes wchar_t is UTF-16
f << L"кошка 日本国\n";
}
}
Yes, but you need Visual Studio 2005 or later. You can then call fopen with the parameters:
LPCTSTR strText = "абв";
FILE *f = fopen(pszFilePath, "w,ccs=UTF-8");
_ftprintf(f, _T("%s"), (LPCTSTR) strText);
Keep in mind this is Microsoft extension, it probably won't work with gcc or other compilers.
In theory, you should simply set a locale which uses UTF-8 as external encoding. My understanding -- I'm not a Windows programmer -- is that Windows has no such locale, so you have to resort to implementation specific means or non standard libraries (link from Dave's comment).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With