I'm just getting started on some programming to handle filenames with non-english names on a WinXP system. I've done some recommended reading on unicode and I think I get the basic idea, but some parts are still not very clear to me.
Specifically, what encoding (UTF-8, UTF-16LE/BE) are the file names (not the content, but the actual name of the file) stored in NTFS? Is it possible to open any file using fopen(), which takes a char*, or do I have no choice but to use wfopen(), which uses a wchar_t*, and presumably takes a UTF-16 string?
I tried manually feeding in a UTF-8 encoded string to fopen(), eg.
unsigned char filename[] = {0xEA, 0xB0, 0x80, 0x2E, 0x74, 0x78, 0x74, 0x0}; // 가.txt FILE* f = fopen((char*)filename, "wb+");
but this came out as 'ê°€.txt'.
I was under the impression (which may be wrong) that a UTF8-encoded string would suffice in opening any filename under Windows, because I seem to vaguely remember some Windows application passing around (char*), not (wchar_t*), and having no problems.
Can anyone shed some light on this?
NTFS stores file names in Unicode. In contrast, the older FAT12, FAT16, and FAT32 file systems use the OEM character set.
The values stored in memory for Windows are UTF-16 little-endian, always.
Open the file using Notepad++ and check the "Encoding" menu, you can check the current Encoding and/or Convert to a set of encodings available.
Open up your file using regular old vanilla Notepad that comes with Windows. It will show you the encoding of the file when you click "Save As...". Whatever the default-selected encoding is, that is what your current encoding is for the file.
NTFS stores filenames in UTF-16, however fopen
is using ANSI (not UTF-8).
In order to use an UTF16-encoded file name you will need to use the Unicode versions of the file open calls. Do this by defining UNICODE
and _UNICODE
in your project. Then use the CreateFile
call or the wfopen
call.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With