Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Open utf8 encoded filename in c++ Windows

Tags:

c++

windows

Consider the following code:

#include <iostream>
#include <boost\locale.hpp>
#include <Windows.h>
#include <fstream>

std::string ToUtf8(std::wstring str)
{
    std::string ret;
    int len = WideCharToMultiByte(CP_UTF8, 0, str.c_str(), str.length(), NULL, 0, NULL, NULL);
    if (len > 0)
    {
        ret.resize(len);
        WideCharToMultiByte(CP_UTF8, 0, str.c_str(), str.length(), &ret[0], len, NULL, NULL);
    }
    return ret;
}

int main()
{
    std::wstring wfilename = L"D://Private//Test//एउटा फोल्दर//भित्रको फाईल.txt";
    std::string utf8path = ToUtf8(wfilename );
    std::ifstream iFileStream(utf8path , std::ifstream::in | std::ifstream::binary);
    if(iFileStream.is_open())
    {
        std::cout << "Opened the File\n";
        //Do the work here.
    }
    else
    {
        std::cout << "Cannot Opened the file\n";

    }
    return 0;

}

If I am running the file, I cannot open the file thus entering into the else block. Even using boost::locale::conv::from_utf(utf8path ,"utf_8") instead of utf8path doesn't work. The code works if I consider using wifstream and using wfilename as its parameter, but I don' want to use wifstream. Is there any way to open the file with its name utf8 encoded? I am using Visual Studio 2010.

like image 884
Pant Avatar asked Jun 14 '15 12:06

Pant


2 Answers

On Windows, you MUST use 8bit ANSI (and it must match the user's locale) or UTF-16 for filenames, there is no other option available. You can keep using string and UTF-8 in your main code, but you will have to convert UTF-8 filenames to UTF-16 when you are opening files. Less efficient, but that is what you need to do.

Fortunately, VC++'s implementation of std::ifstream and std::ofstream have non-standard overloads of their constructors and open() methods to accept wchar_t* strings for UTF-16 filenames.

explicit basic_ifstream(
    const wchar_t *_Filename,
    ios_base::openmode _Mode = ios_base::in,
    int _Prot = (int)ios_base::_Openprot
);

void open(
    const wchar_t *_Filename,
    ios_base::openmode _Mode = ios_base::in,
    int _Prot = (int)ios_base::_Openprot
);
void open(
    const wchar_t *_Filename,
    ios_base::openmode _Mode
);
explicit basic_ofstream(
    const wchar_t *_Filename,
    ios_base::openmode _Mode = ios_base::out,
    int _Prot = (int)ios_base::_Openprot
);

void open(
    const wchar_t *_Filename,
    ios_base::openmode _Mode = ios_base::out,
    int _Prot = (int)ios_base::_Openprot
);
void open(
    const wchar_t *_Filename,
    ios_base::openmode _Mode
);

You will have to use an #ifdef to detect Windows compilation (unfortunately, different C++ compilers identify that differently) and temporarily convert your UTF-8 string to UTF-16 when opening a file.

#ifdef _MSC_VER
std::wstring ToUtf16(std::string str)
{
    std::wstring ret;
    int len = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), str.length(), NULL, 0);
    if (len > 0)
    {
        ret.resize(len);
        MultiByteToWideChar(CP_UTF8, 0, str.c_str(), str.length(), &ret[0], len);
    }
    return ret;
}
#endif

int main()
{
    std::string utf8path = ...;
    std::ifstream iFileStream(
        #ifdef _MSC_VER
        ToUtf16(utf8path).c_str()
        #else
        utf8path.c_str()
        #endif
        , std::ifstream::in | std::ifstream::binary);
    ...
    return 0;
}

Note that this is only guaranteed to work in VC++. Other C++ compilers for Windows are not guaranteed to provide similar extensions.

UPDATE: as of Windows 10 Insider Preview Build 17035, Microsoft now supports UTF-8 as a system-wide encoding that users can set their locale to. And as of Windows 10 Version 1903 (build 18362), applications can now opt in via their app manifest to use UTF-8 as a process-wide codepage, even if the user locale is not set to UTF-8. These features allow ANSI-based APIs (like CreateFileA(), which std::ifstream/std::ofstream use internally) to work with UTF-8 strings. So, in theory, with this feature turned on, you might be able to pass a UTF-8 encoded string to std::ifstream/std::ofstream and it would "just work". I can't confirm that, as it very much depends on the implementation. It would be safer to stick with passing in UTF-16 filenames, since that is Windows' native encoding, which the ANSI APIs will simply convert to internally.

like image 174
Remy Lebeau Avatar answered Oct 23 '22 17:10

Remy Lebeau


You can use std::filesystem::u8path in C++14/17:

std::filesystem::path pa = std::filesystem::u8path((const char*)yourStdStringPath.c_str());
std::ofstream ofs(pa);

It's deprecated in C++20 since you can use the u8 prefix.

like image 43
Kaaf Avatar answered Oct 23 '22 16:10

Kaaf