Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What encoding are filenames in NTFS stored as?

I'm just getting started on some programming to handle filenames with non-english names on a WinXP system. I've done some recommended reading on unicode and I think I get the basic idea, but some parts are still not very clear to me.

Specifically, what encoding (UTF-8, UTF-16LE/BE) are the file names (not the content, but the actual name of the file) stored in NTFS? Is it possible to open any file using fopen(), which takes a char*, or do I have no choice but to use wfopen(), which uses a wchar_t*, and presumably takes a UTF-16 string?

I tried manually feeding in a UTF-8 encoded string to fopen(), eg.

unsigned char filename[] = {0xEA, 0xB0, 0x80, 0x2E, 0x74, 0x78, 0x74, 0x0}; // 가.txt  FILE* f = fopen((char*)filename, "wb+"); 

but this came out as 'ê°€.txt'.

I was under the impression (which may be wrong) that a UTF8-encoded string would suffice in opening any filename under Windows, because I seem to vaguely remember some Windows application passing around (char*), not (wchar_t*), and having no problems.

Can anyone shed some light on this?

like image 820
vroooom Avatar asked Jan 12 '10 17:01

vroooom


People also ask

Is NTFS a Unicode?

NTFS stores file names in Unicode. In contrast, the older FAT12, FAT16, and FAT32 file systems use the OEM character set.

What file encoding does Windows use?

The values stored in memory for Windows are UTF-16 little-endian, always.

How do I know if a file is UTF encoded?

Open the file using Notepad++ and check the "Encoding" menu, you can check the current Encoding and/or Convert to a set of encodings available.

How do you find out what encoding a file uses?

Open up your file using regular old vanilla Notepad that comes with Windows. It will show you the encoding of the file when you click "Save As...". Whatever the default-selected encoding is, that is what your current encoding is for the file.


1 Answers

NTFS stores filenames in UTF-16, however fopen is using ANSI (not UTF-8).

In order to use an UTF16-encoded file name you will need to use the Unicode versions of the file open calls. Do this by defining UNICODE and _UNICODE in your project. Then use the CreateFile call or the wfopen call.

like image 127
villintehaspam Avatar answered Sep 25 '22 07:09

villintehaspam