
How to read a binary file with a Unicode filename in C++?

In the project I'm working on, I deal with quite a few string manipulations; strings are read from binary files along with their encoding (which can be single or double byte). Essentially, I read the string value as vector<char>, read the encoding and then convert all strings to wstring, for consistency.

This works reasonably well; however, the filenames themselves can contain double-byte characters. I'm totally stumped on how to actually open the input stream. In C I would use the _wfopen function, passing a wchar_t* path, but wifstream seems to behave differently: it's designed for reading double-byte characters from a file, not for reading single bytes from a file with a double-byte filename.

What is the solution to this problem?

Edit: Searching the net, it looks like there's no support for this at all in standard C++ (e.g. see this discussion). However I'm wondering if C++11 actually adds something useful in this area.

Aleks G asked Jan 04 '13 13:01



1 Answer

How the string you pass to open is mapped to a filename is implementation dependent. In a Unix environment, it is passed almost literally: only '/' and '\0' are treated specially. Other environments have other rules, and I've had problems in the past because I'd written a file in Unix and couldn't do anything with it under Windows (which treats a ':' in the filename specially).

Another question is where these files come from. As mentioned above, there may be absolutely no way of opening them on your system: a filename with a ':' simply cannot be opened in Windows. In Unix, if you end up with '\0' characters in the filename itself, you probably can't read them either, and UTF-16 filenames will appear to have '\0' characters in them under Unix. Your only solution may be to use native tools on the system which generated the files to rename them.
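
To illustrate the point about UTF-16 names and '\0' (a sketch only; `raw_bytes` and `c_string_length` are made-up helper names): every ASCII character in a UTF-16 string carries a zero byte, so an API that treats the name as a '\0'-terminated char string stops after at most one byte.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Copies the in-memory bytes of a UTF-16 string, in the machine's byte order.
std::vector<unsigned char> raw_bytes(const std::u16string& s) {
    std::vector<unsigned char> bytes(s.size() * sizeof(char16_t));
    if (!bytes.empty()) {
        std::memcpy(bytes.data(), s.data(), bytes.size());
    }
    return bytes;
}

// Length the bytes would have if read as a '\0'-terminated char string,
// which is how a Unix filename is interpreted.
std::size_t c_string_length(const std::vector<unsigned char>& bytes) {
    std::size_t n = 0;
    while (n < bytes.size() && bytes[n] != 0) {
        ++n;
    }
    return n;
}
```

For an ASCII-only name such as `u"data.bin"`, `c_string_length(raw_bytes(u"data.bin"))` is 1 on a little-endian machine and 0 on a big-endian one, so the name is effectively cut off at the first character.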

It's less clear to me how you could get such filenames on a Unix disk in the first place. How does an SMB server such as Samba map UTF-16 filenames when it is serving on a Windows box? Or an NFS server? I think such things also exist under Windows.

James Kanze answered Nov 19 '22 03:11
