Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to set file encoding format to UTF8 in C++

A requirement for my software is that the encoding of a file which contains exported data shall be UTF8. But when I write the data to the file the encoding is always ANSI. (I use Notepad++ to check this.)

What I'm currently doing is trying to convert the file manually by reading it, converting it to UTF8 and writing the text to a new file.

line is a std::string
inputFile is an std::ifstream
pOutputFile is a FILE*

// ...

if( inputFile.is_open() )
{
    while( inputFile.good() )
    {
        getline(inputFile,line);

        //1
        DWORD dwCount = MultiByteToWideChar( CP_ACP, 0, line.c_str(), -1, NULL, 0 );
        wchar_t *pwcharText;
        pwcharText = new wchar_t[ dwCount];

        //2
        MultiByteToWideChar( CP_ACP, 0, line.c_str(), -1, pwcharText, dwCount );

        //3
        dwCount = WideCharToMultiByte( CP_UTF8, 0, pwcharText, -1, NULL, 0, NULL, NULL );
        char *pText;
        pText = new char[ dwCount ];

        //4
        WideCharToMultiByte( CP_UTF8, 0, pwcharText, -1, pText, dwCount, NULL, NULL );

        fprintf(pOutputFile,pText);
        fprintf(pOutputFile,"\n");

        delete[] pwcharText;
        delete[] pText;
    }
}

// ...

Unfortunately the encoding is still ANSI. I searched a while for a solution but I always encounter the solution via MultiByteToWideChar and WideCharToMultiByte. However, this doesn't seem to work. What am I missing here?

I also looked here on SO for a solution but most UTF8 questions deal with C# and php stuff.

like image 664
Exa Avatar asked Jul 25 '12 09:07

Exa


2 Answers

On Windows in VC++2010 it is possible (not yet implemented in GCC, as far as i know) using localization facet std::codecvt_utf8_utf16 (i.e. in C++11). The sample code from cppreference.com has all basic information you would need to read/write UTF-8 file.

std::wstring wFromFile = _T("𤭢teststring");
std::wofstream fileOut("textOut.txt");
fileOut.imbue(std::locale(fileOut.getloc(), new std::codecvt_utf8_utf16<wchar_t>));
fileOut<<wFromFile;

It sets the ANSI encoded file to UTF-8 (checked in Notepad). Hope this is what you need.

like image 80
SChepurin Avatar answered Sep 25 '22 11:09

SChepurin


On Windows, files don't have encodings. Each application will assume an encoding based on its own rules. The best you can do is put a byte-order mark at the front of the file and hope it's recognized.

like image 38
Mark Ransom Avatar answered Sep 26 '22 11:09

Mark Ransom