I have a sample piece of C++ code that is throwing an exception on Linux:
#include <filesystem>
namespace fs = std::filesystem;
const fs::path pathDir(L"/var/media");
const fs::path pathMedia = pathDir / L"COMPACTO - Diogo Poças.mxf"; // <-- Exception thrown here
The exception being thrown is: filesystem error: Cannot convert character sequence: Invalid or incomplete multibyte or wide character
I surmise that the issue is related to the use of the ç character.
The environment is Linux, although I'd like the code to run cross-platform.
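A minimal sketch for confirming where the failure happens, assuming you only want to report it rather than fix it: the hypothetical helper `try_join` wraps the path construction from the question in a `try` block and catches `std::filesystem::filesystem_error`, whose `what()` carries the message quoted above and whose `code()` gives the underlying `std::error_code`. Whether the call actually throws for `ç` depends on the standard library and its version.

```cpp
#include <filesystem>
#include <iostream>
#include <optional>
#include <string_view>

namespace fs = std::filesystem;

// Hypothetical helper (not part of std::filesystem): appends a wide-string
// file name to a directory and returns std::nullopt instead of throwing
// when the library cannot convert the name to the native encoding.
std::optional<fs::path> try_join(const fs::path& dir, std::wstring_view name) {
    try {
        return dir / fs::path(name);
    } catch (const fs::filesystem_error& e) {
        // e.what() holds the library's message; e.code() is the std::error_code.
        std::cerr << "could not build path: " << e.what()
                  << " (error " << e.code().value() << ": "
                  << e.code().message() << ")\n";
        return std::nullopt;
    }
}
```

Calling `try_join(L"/var/media", L"COMPACTO - Diogo Poças.mxf")` reproduces the question's scenario; on an affected toolchain it prints the error and returns an empty optional instead of terminating.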
Unfortunately, std::filesystem was not written with operating-system compatibility in mind, at least not as advertised.
For Unix-based systems, we need UTF-8 (u8"string", or just "string", depending on the compiler).
For Windows, we need UTF-16 (L"string").
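To make the difference concrete, here is a small compile-time sketch showing how the single character ç (U+00E7) from the question is stored in each encoding: two code units in UTF-8 and one in UTF-16. It uses `u"..."` (`char16_t`) for the UTF-16 side so it compiles the same way everywhere, and it assumes the source file is saved as UTF-8 (GCC and Clang read source as UTF-8 by default; MSVC may need the `/utf-8` option).

```cpp
#include <cstddef>

// Number of code units in a string literal, excluding the terminating NUL.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// u8"..." literals are UTF-8 (char in C++17, char8_t in C++20);
// u"..." literals are UTF-16 (char16_t), the encoding Windows paths use.
constexpr std::size_t utf8_units  = code_units(u8"ç");
constexpr std::size_t utf16_units = code_units(u"ç");

static_assert(utf8_units == 2, "UTF-8 stores U+00E7 as the two bytes C3 A7");
static_assert(utf16_units == 1, "UTF-16 stores U+00E7 as the single unit 0x00E7");

// Check the actual code-unit values too.
static_assert(static_cast<unsigned char>(u8"ç"[0]) == 0xC3, "first UTF-8 byte");
static_assert(static_cast<unsigned char>(u8"ç"[1]) == 0xA7, "second UTF-8 byte");
static_assert(u"ç"[0] == 0x00E7, "UTF-16 code unit");
```

A byte-oriented API on Linux therefore sees `C3 A7`, while a wide API on Windows sees `0x00E7`; std::filesystem has to convert between the two whenever the string type you pass differs from the native one.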
In C++17 you can use filesystem::u8path (deprecated in C++20, where a path can be constructed directly from char8_t text such as u8"..."). On Windows, this converts UTF-8 to UTF-16, which you can then pass to the Windows APIs.
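#include <cstdio>
#include <filesystem>
#include <iostream>
#ifdef _WIN32
#include <fcntl.h>   // _O_WTEXT
#include <io.h>      // _setmode, _fileno
#endif
namespace fs = std::filesystem;

int main() {
#ifdef _WIN32
    // Windows console I/O setup: switch stdin/stdout to wide-text mode
    _setmode(_fileno(stdin), _O_WTEXT);
    _setmode(_fileno(stdout), _O_WTEXT);
#endif
    fs::path path = fs::u8path(u8"ελληνικά.txt");
#ifdef _WIN32
    std::wcout << L"UTF16: " << path << std::endl;
#else
    std::cout << "UTF8: " << path << std::endl;
#endif
}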
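Because `u8path` is deprecated in C++20, a small wrapper can pick the right construction for whichever standard the code is built with. This is a sketch under the assumption that the input really is UTF-8: the hypothetical `path_from_utf8` uses the C++20 `char8_t` constructor when the library advertises `char8_t` support (the standard feature-test macro `__cpp_lib_char8_t`) and falls back to `u8path` otherwise, and the hypothetical `utf8_from_path` converts back using `path::u8string()`, whose return type changed from `std::string` in C++17 to `std::u8string` in C++20.

```cpp
#include <filesystem>
#include <string>
#include <string_view>

namespace fs = std::filesystem;

// Hypothetical helper: build a path from UTF-8 text in either C++17 or C++20.
fs::path path_from_utf8(std::string_view utf8) {
#if defined(__cpp_lib_char8_t)
    // C++20: a path built from char8_t data is interpreted as UTF-8.
    return fs::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: u8path converts UTF-8 to the native encoding (UTF-16 on Windows).
    return fs::u8path(utf8.begin(), utf8.end());
#endif
}

// Hypothetical helper: get the path back as UTF-8 in a plain std::string.
std::string utf8_from_path(const fs::path& p) {
    const auto s = p.u8string();  // std::string in C++17, std::u8string in C++20
    return std::string(s.begin(), s.end());
}
```

With these, `path_from_utf8("ελληνικά.txt")` avoids the deprecation warning under C++20, and `utf8_from_path` hands back the original bytes regardless of the native encoding.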
Or use your own macro to select UTF-16 for Windows (L"string") and UTF-8 for Unix-based systems (u8"string" or just "string"). Make sure UNICODE is defined for Windows.
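#include <filesystem>
#include <iostream>
namespace fs = std::filesystem;

// _TEXT is already defined by <tchar.h> on Windows, and names starting with
// an underscore and a capital letter are reserved, so use distinct names.
#ifdef _WIN32
#define PATH_TEXT(quote) L##quote    // UTF-16 (wchar_t) literal
#define PATH_COUT std::wcout
#else
#define PATH_TEXT(quote) u8##quote   // UTF-8 literal
#define PATH_COUT std::cout
#endif

int main() {
    fs::path path(PATH_TEXT("ελληνικά.txt"));
    PATH_COUT << path << std::endl;
}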
See also https://en.cppreference.com/w/cpp/filesystem/path/native
std::fstream also accepts a std::filesystem::path, which allows using a UTF-16 filename while still reading and writing UTF-8 content. For example, the following code works in Visual Studio:
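#include <filesystem>
#include <fstream>
int main() {
    std::filesystem::path utf16 = std::filesystem::u8path(u8"UTF8 filename ελληνικά.txt");
    std::ofstream fout(utf16);  // on Windows, the path holds the UTF-16 filename
    fout << reinterpret_cast<const char*>(u8"UTF8 content ελληνικά"); // cast: u8"" is char8_t in C++20
}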
I am not sure whether this works with recent GCC versions on Windows.
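A round-trip sketch along the same lines, assuming a writable temporary directory: the hypothetical `write_and_read_back` creates a file whose name contains Greek letters under `std::filesystem::temp_directory_path()`, writes UTF-8 content through `std::ofstream` (which takes a `std::filesystem::path` since C++17), reads it back, and deletes the file. The content is passed as a plain `std::string` because in C++20 `u8"..."` literals are `char8_t` and can no longer be streamed into a `std::ostream` directly. The small `path_from_utf8` helper is an illustrative assumption that compiles as either C++17 or C++20.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <string_view>

namespace fs = std::filesystem;

// Hypothetical helper: build a path from UTF-8 text in C++17 or C++20.
fs::path path_from_utf8(std::string_view utf8) {
#if defined(__cpp_lib_char8_t)
    return fs::path(std::u8string(utf8.begin(), utf8.end()));
#else
    return fs::u8path(utf8.begin(), utf8.end());
#endif
}

// Hypothetical helper: write UTF-8 content to a file whose UTF-8 name is given,
// read it back, delete the file, and report whether the contents matched.
bool write_and_read_back(std::string_view utf8_name, const std::string& utf8_content) {
    const fs::path file = fs::temp_directory_path() / path_from_utf8(utf8_name);
    {
        std::ofstream out(file, std::ios::binary);  // ofstream accepts fs::path since C++17
        if (!out) return false;
        out << utf8_content;  // plain char data, so no char8_t stream overload is involved
    }  // out is flushed and closed here
    std::ifstream in(file, std::ios::binary);
    const std::string read_back((std::istreambuf_iterator<char>(in)),
                                std::istreambuf_iterator<char>());
    in.close();  // close before removing; Windows generally refuses to delete an open file
    fs::remove(file);
    return read_back == utf8_content;
}
```

Calling `write_and_read_back("UTF8 filename ελληνικά.txt", "UTF8 content ελληνικά")` mirrors the Visual Studio example; with GCC and Clang the narrow literals are UTF-8 by default.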
Looks like a GCC bug.
According to the std::filesystem::path::path documentation, you should be able to call the std::filesystem::path constructor with a wide-character string, independently of the underlying platform (that's the whole point of std::filesystem).
Clang shows correct behavior.
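Until the standard library you use handles this, one workaround on GCC is to avoid the library's wide-to-narrow conversion altogether. The sketch below is a hand-written conversion, not the library's own: the hypothetical `utf8_from_wide` encodes the wide string to UTF-8 itself (treating `wchar_t` as UTF-32 on Linux and UTF-16 on Windows, without rejecting malformed input), and the hypothetical `path_from_wide` passes the resulting bytes to `std::filesystem::path` on POSIX systems, assuming file names there are UTF-8, while on Windows it keeps the wide string, which is already the native format.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>
#include <string_view>

namespace fs = std::filesystem;

// Hypothetical helper: encode a wide string as UTF-8 by hand.
// Assumes wchar_t holds UTF-32 (Linux) or UTF-16 (Windows). Invalid input
// (lone surrogates, values above U+10FFFF) is not rejected.
std::string utf8_from_wide(std::wstring_view w) {
    std::string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        auto cp = static_cast<char32_t>(w[i]);
        // UTF-16: join a high/low surrogate pair into a single code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size()) {
            const auto low = static_cast<char32_t>(w[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: build a path from a wide file name without relying on
// the library's wide-to-narrow conversion on POSIX.
fs::path path_from_wide(std::wstring_view name) {
#ifdef _WIN32
    return fs::path(name);                  // native format is already wchar_t
#else
    return fs::path(utf8_from_wide(name));  // assumes UTF-8 file names
#endif
}
```

With this, the question's line could become `pathDir / path_from_wide(L"COMPACTO - Diogo Poças.mxf")`; the directory part is plain ASCII, so its own conversion is not affected.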