 

Cross-platform way to handle std::string/std::wstring with std::filesystem::path

I have a sample piece of C++ code that is throwing an exception on Linux:

#include <filesystem>

namespace fs = std::filesystem;
const fs::path pathDir(L"/var/media");
const fs::path pathMedia = pathDir / L"COMPACTO - Diogo Poças.mxf"; // <-- Exception thrown here

The exception being thrown is: filesystem error: Cannot convert character sequence: Invalid or incomplete multibyte or wide character

I surmise that the issue is related to the use of the ç character.

  1. Why is this wide string (wchar_t) an "invalid or incomplete multibyte or wide character"?
  2. Going forward, how do I make related code cross-platform so that it runs on both Windows and Linux?
    • Are there helper functions I need to use?
    • What rules do I need to enforce from a programmer's PoV?
    • I've seen a response here that says "Don't use wide strings on Linux", do I use the same rules for Windows?

Linux Environment (not forgetting the fact that I'd like to run cross-platform):

  • Ubuntu 18.04.3
  • gcc 9.2.1
  • C++17
asked Oct 23 '19 at 11:10 by ZeroDefect




2 Answers

Unfortunately std::filesystem was not written with operating system compatibility in mind, at least not as advertised.

For Unix-based systems, we need UTF8 (u8"string", or just "string", depending on the compiler).

For Windows, we need UTF16 (L"string").
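Those two statements correspond to the native character type of fs::path on each platform. The compile-time sketch below (assuming a C++17 compiler, and using _WIN32, the macro Windows compilers predefine, to pick the platform) shows why the question's code has to convert at all: on Linux the path stores char, so L"...ç..." must be turned into a multibyte sequence, and ç is one wchar_t but two UTF-8 bytes.

```cpp
#include <cstddef>
#include <filesystem>
#include <type_traits>

namespace fs = std::filesystem;

// The native character type of std::filesystem::path differs by platform.
#ifdef _WIN32
static_assert(std::is_same_v<fs::path::value_type, wchar_t>,
              "Windows paths are stored as UTF-16 wchar_t");
#else
static_assert(std::is_same_v<fs::path::value_type, char>,
              "POSIX paths are stored as narrow (typically UTF-8) char");
#endif

// "ç" (U+00E7) is a single wchar_t code unit but two UTF-8 bytes (0xC3 0xA7),
// so building a POSIX path from L"...ç..." needs a wide-to-multibyte conversion.
constexpr std::size_t wide_units = sizeof(L"\u00E7") / sizeof(wchar_t) - 1;
constexpr std::size_t utf8_bytes = sizeof(u8"\u00E7") - 1;
static_assert(wide_units == 1 && utf8_bytes == 2);
```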

In C++17 you can use filesystem::u8path (deprecated in C++20, where u8"..." literals have type char8_t and fs::path can be constructed from char8_t strings directly). On Windows, u8path converts UTF8 to UTF16, which you can then pass to the Windows APIs.

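#include <filesystem>
#include <iostream>
#ifdef _WINDOWS_PLATFORM   // project-defined macro; compilers predefine _WIN32
#include <cstdio>          // _fileno
#include <fcntl.h>         // _O_WTEXT
#include <io.h>            // _setmode
#endif

namespace fs = std::filesystem;

int main()
{
#ifdef _WINDOWS_PLATFORM
    // Windows I/O setup: switch stdin/stdout to wide (UTF-16) text mode
    _setmode(_fileno(stdin), _O_WTEXT);
    _setmode(_fileno(stdout), _O_WTEXT);
#endif

    fs::path path = fs::u8path(u8"ελληνικά.txt");  // C++17; deprecated in C++20

#ifdef _WINDOWS_PLATFORM
    std::wcout << L"UTF16: " << path << std::endl;
#else
    std::cout << "UTF8:  " << path << std::endl;
#endif
}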
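Because u8path is deprecated in C++20, one option is a small wrapper that uses u8path in C++17 and a std::u8string in C++20, so call sites stay the same in both. This is a sketch; path_from_utf8 is a hypothetical name, and it assumes the input bytes are valid UTF-8.

```cpp
#include <filesystem>
#include <string>
#include <string_view>

namespace fs = std::filesystem;

// Hypothetical helper: build a path from UTF-8 bytes in both C++17 and C++20.
fs::path path_from_utf8(std::string_view utf8)
{
#if defined(__cpp_char8_t) && defined(__cpp_lib_char8_t)
    // C++20: a path constructed from char8_t data is interpreted as UTF-8.
    const std::u8string s(utf8.begin(), utf8.end());
    return fs::path(s);
#else
    // C++17: u8path converts UTF-8 to the native encoding (UTF-16 on Windows).
    return fs::u8path(utf8.begin(), utf8.end());
#endif
}
```

On Linux the resulting path simply holds the same UTF-8 bytes; on Windows it holds the UTF-16 equivalent.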

Or use your own macro to select UTF16 for Windows (L"string") and UTF8 for Unix-based systems (u8"string" or just "string"). Make sure UNICODE is defined for Windows.

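#include <filesystem>
#include <iostream>
namespace fs = std::filesystem;

// _WINDOWS_PLATFORM is project-defined (compilers predefine _WIN32).
// <tchar.h> on Windows also defines _TEXT; rename these macros if you include it.
#ifdef _WINDOWS_PLATFORM
#define _TEXT(quote) L##quote
#define _tcout std::wcout
#else
#define _TEXT(quote) u8##quote
#define _tcout std::cout
#endif

int main()
{
    fs::path path(_TEXT("ελληνικά.txt"));
    _tcout << path << std::endl;
}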

See also
https://en.cppreference.com/w/cpp/filesystem/path/native


Note that Visual Studio's std::fstream has an additional constructor that accepts a UTF-16 (wchar_t) filename, while the stream can still read and write UTF-8 content. For example, the following code works in Visual Studio:
fs::path utf16 = fs::u8path(u8"UTF8 filename ελληνικά.txt");  // stored as UTF-16 on Windows
std::ofstream fout(utf16);
fout << u8"UTF8 content ελληνικά";  // C++17 only: in C++20 u8"..." is char8_t and cannot be streamed

I am not sure whether this is supported by recent GCC versions on Windows.

answered Sep 22 '22 at 12:09 by Barmak Shemirani


Looks like a GCC bug.

According to the reference for std::filesystem::path::path, you should be able to call the std::filesystem::path constructor with a wide-character string, independent of the underlying platform (that's the whole point of std::filesystem).

Clang shows correct behavior.
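If you are stuck on a GCC version with this behavior, one workaround sketch is to do the wide-to-UTF-8 conversion yourself, so the library never has to convert a wchar_t string on Linux. The helper names wide_to_utf8 and make_path are hypothetical, and the sketch assumes that file names on Linux are UTF-8 and that wchar_t holds UTF-32 (Linux) or UTF-16 (Windows).

```cpp
#include <cstddef>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: encode a wide string as UTF-8 without consulting the
// C/C++ locale. Assumes wchar_t holds UTF-32 (Linux) or UTF-16 (Windows).
std::string wide_to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = static_cast<char32_t>(in[i]);

        // Combine a UTF-16 surrogate pair (only occurs with a 16-bit wchar_t).
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            const char32_t lo = static_cast<char32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        // Replace lone surrogates and out-of-range values with U+FFFD.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: build a path from a wide string on either platform.
fs::path make_path(const std::wstring& w)
{
#ifdef _WIN32
    return fs::path(w);                // native type is wchar_t: no conversion
#else
    return fs::path(wide_to_utf8(w));  // native type is char: bytes stored as-is
#endif
}
```

With these helpers, the question's example becomes make_path(L"/var/media") / make_path(L"COMPACTO - Diogo Poças.mxf"), and on Linux the resulting path holds the UTF-8 bytes of the file name.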

answered Sep 18 '22 at 12:09 by plexando