Is there a standard way to do an fopen with a Unicode string file path?
No, there's no standard way. There are some differences between operating systems. Here's how different OSs handle non-ASCII filenames.
Under Linux, a filename is simply a binary string. The convention on most modern distributions is to use UTF-8 for non-ASCII filenames, but in earlier days it was common to encode filenames as ISO-8859-1. It's basically up to each application to choose an encoding, so you can even have different encodings used on the same filesystem. The LANG environment variable (together with LC_ALL and LC_CTYPE, which take precedence over it) can give you a hint what the preferred encoding is. But these days, you can probably assume UTF-8 everywhere.
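Rather than parsing LANG by hand, a program on a POSIX system can ask the C library which character encoding the user's locale selects. The following is a minimal sketch, not a definitive implementation: the function name preferred_filename_encoding is made up for this example, while setlocale and nl_langinfo(CODESET) are standard POSIX calls. The result is only a hint, since nothing forces the filenames on disk to match the locale.

```c
#include <langinfo.h>
#include <locale.h>

/*
 * Hypothetical helper: returns the name of the character encoding chosen by
 * the user's locale environment (LC_ALL, LC_CTYPE, LANG), e.g. "UTF-8".
 * Note that this changes the process-wide LC_CTYPE locale as a side effect.
 */
const char *preferred_filename_encoding(void)
{
    /* "" means: take the locale from the environment variables. */
    if (setlocale(LC_CTYPE, "") == NULL)
        return "unknown";   /* the environment names a locale that is not installed */

    /* CODESET yields the codeset name of the current LC_CTYPE locale. */
    return nl_langinfo(CODESET);
}
```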
Assuming UTF-8 everywhere is not without problems, though, because a filename containing an invalid UTF-8 sequence is perfectly valid on most Linux filesystems. How would you specify such a filename if you only support UTF-8? Ideally, you should support both UTF-8 and binary filenames.
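One way to support both kinds of names is to keep the filename as raw bytes for every system call and only check whether it is valid UTF-8 when you need to display or convert it. Here is a sketch of such a check. The function name is_valid_utf8 is an assumption made for this example, not part of any standard library.

```c
#include <stdbool.h>

/* Hypothetical helper: true if the NUL-terminated byte string is well-formed UTF-8. */
bool is_valid_utf8(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;

    while (*p != 0) {
        unsigned int cp;   /* code point decoded so far */
        int extra;         /* number of continuation bytes expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return false;            /* stray continuation or invalid lead byte */
        p++;

        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return false;         /* truncated sequence (also stops at NUL) */
            cp = (cp << 6) | (*p & 0x3Fu);
        }

        /* Reject overlong encodings, surrogates, and values past U+10FFFF. */
        if ((extra == 1 && cp < 0x80) ||
            (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) ||
            cp > 0x10FFFF)
            return false;
    }
    return true;
}
```

A name that fails the check can still be passed to fopen unchanged on Linux; it just cannot be shown to the user as UTF-8 text without some escaping scheme.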
The HFS+ filesystem on OS X uses Unicode (UTF-16) filenames internally. Most C (and POSIX) library functions like fopen accept UTF-8 strings (since they're 8-bit compatible) and convert them internally.
The Windows API uses UTF-16 for filenames, but fopen interprets its path argument in the current code page, whatever that happens to be (UTF-8 has only recently become an option). Many C library functions have a non-standard equivalent that accepts UTF-16 (wchar_t on Windows). For example, _wfopen instead of fopen.
In *nix, you simply use the standard fopen (see more information in the reply from TokeMacGuy, or in this forum). On Windows, you can use _wfopen and pass it a Unicode (UTF-16) string (for more information, see MSDN).
As there is no real common way, I would wrap this call in a macro, together with all other system-dependent functions.
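For illustration, here is one possible shape for such a wrapper, written as a function rather than a macro so the conversion buffers can be freed. It is only a sketch under a few assumptions: callers always pass UTF-8, the name fopen_utf8 and the helper utf8_to_wide are invented for this example, and the Windows branch relies on MultiByteToWideChar and _wfopen from the Windows SDK and Microsoft C runtime.

```c
#include <stdio.h>

#ifdef _WIN32
#include <stdlib.h>
#include <windows.h>

/* Hypothetical helper: UTF-8 to newly allocated UTF-16; caller frees. NULL on error. */
static wchar_t *utf8_to_wide(const char *s)
{
    /* First call asks for the required length, including the terminator. */
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
    if (n <= 0)
        return NULL;

    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (w != NULL &&
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) <= 0) {
        free(w);
        w = NULL;
    }
    return w;
}
#endif

/* Hypothetical wrapper: open a file whose path is given as UTF-8 on every platform. */
FILE *fopen_utf8(const char *path, const char *mode)
{
#ifdef _WIN32
    /* Windows: convert both arguments to UTF-16 and use the wide-character call. */
    wchar_t *wpath = utf8_to_wide(path);
    wchar_t *wmode = utf8_to_wide(mode);
    FILE *fp = NULL;

    if (wpath != NULL && wmode != NULL)
        fp = _wfopen(wpath, wmode);
    free(wpath);
    free(wmode);
    return fp;
#else
    /* Linux, OS X and other *nix systems: the bytes are passed through as-is. */
    return fopen(path, mode);
#endif
}
```

Because the non-Windows branch does not validate the bytes, the same wrapper also accepts binary (non-UTF-8) filenames on Linux. On Windows, a path that is not valid UTF-8 makes the conversion fail and the function return NULL.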