Unicode filenames in iOS

Is it possible to use the full range of (let's say) the Chinese language in filenames of assets (images) within iOS? If not, what portions of big languages are supported in filenames, string searches and other file handling activities?

Confused asked Oct 20 '16 23:10


2 Answers

iOS and macOS currently use the HFS+ filesystem, which supports full Unicode in filenames. This means essentially any character, including Chinese and other human languages. The filesystem allows up to 255 UTF-16 code units per name, which for most languages works out to about 255 characters. Characters that need more than 16 bits to encode, such as most emoji, are also allowed, but each one takes two code units, so fewer of them fit in a name.
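
To make the length budget concrete, here is a minimal sketch that counts UTF-16 code units in a candidate name. It is written in Python purely for illustration and is not an iOS API; the helper name `utf16_units` is hypothetical.

```python
def utf16_units(name: str) -> int:
    """Return the number of 16-bit UTF-16 code units needed to encode name."""
    # Encoding without a byte-order mark yields exactly 2 bytes per code unit.
    return len(name.encode("utf-16-le")) // 2


print(utf16_units("文件"))  # 2: two characters in the Basic Multilingual Plane
print(utf16_units("😀"))    # 2: one emoji stored as a surrogate pair
```

A Chinese character in the common range costs one unit, while a single emoji costs two, which is why emoji-heavy names reach the limit sooner.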

The file APIs on iOS (NSFileManager, etc.) should accommodate Unicode strings without any extra work. Do note that Unicode sequences are canonicalized in a particular way: for example, an é character can be represented in more than one way in Unicode, but it will be decomposed in a standardized way when stored as a filename. As a result, the name you read back from the filesystem may not be byte-for-byte identical to the string you wrote.
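
The canonicalization point can be illustrated with a short Python sketch. It uses the standard NFD form from `unicodedata`, which only approximates what HFS+ does (the filesystem applies its own fixed decomposition rules), and the helper name `decompose` is hypothetical.

```python
import unicodedata

composed = "\u00e9"     # é as a single precomposed code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT


def decompose(name: str) -> str:
    # Standard NFD; assumed here to be close enough to the HFS+ variant
    # for illustration, not an exact reproduction of it.
    return unicodedata.normalize("NFD", name)


print(composed == decomposed)                        # False
print(decompose(composed) == decompose(decomposed))  # True
print(len(decompose(composed)))                      # 2
```

If you compare filenames in your own code, normalizing both sides the same way first avoids mismatches between the two spellings.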

The bottom line is that you can feel free to use Unicode strings as your filenames as long as they are of reasonable length. Very long Unicode names will start running into the length limit in a slightly unpredictable way, because decomposition and 32-bit characters both change the stored length, and computing the exact figure is complicated and unnecessary. You should probably set a sane self-imposed length limit instead.
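
One way to apply a self-imposed limit is sketched below in Python. The cap of 100 units and the names `SAFE_LIMIT`, `stored_length` and `is_safe_name` are hypothetical, and standard NFD stands in for the filesystem's own decomposition, so treat the result as a conservative estimate rather than the exact stored length.

```python
import unicodedata

SAFE_LIMIT = 100  # hypothetical self-imposed cap, well under the 255-unit limit


def stored_length(name: str) -> int:
    """Estimate the UTF-16 code units a name occupies once decomposed."""
    decomposed = unicodedata.normalize("NFD", name)
    return len(decomposed.encode("utf-16-le")) // 2


def is_safe_name(name: str, limit: int = SAFE_LIMIT) -> bool:
    """Accept non-empty names whose estimated stored length fits the cap."""
    return 0 < stored_length(name) <= limit


print(stored_length("한"))       # 3: one Hangul syllable decomposes to three jamo
print(is_safe_name("文件.png"))  # True
```

Measuring after decomposition matters because a name that looks short on screen can expand noticeably once accents or Hangul syllables are split apart.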

APFS is the next-generation filesystem that Apple is developing, and it will appear on iOS at some point soon. I can't find information on its filename encoding, but it's a fair assumption that it will support everything HFS+ supports, if not more.

Ben Zotto answered Nov 09 '22 17:11


The iOS filesystem uses case-sensitive HFSX, which is a variant of HFS Plus and uses the same rules for filenames and character encodings.

Those rules are laid out in several sections of Apple Technote 1150.

The important considerations are:

  • You may use up to 255 16-bit Unicode characters (UTF-16 code units) per file or folder name as described in the HFS Plus Names section of Technote 1150.
  • The filesystem at its base level uses Unicode v2.0 (this is fixed) and strings must be stored in fully decomposed, canonical order. This precludes the use of some "equivalent forms" -- i.e. they must be converted to decomposed form. This is described in detail in the Unicode Subtleties section of Technote 1150. This section details other issues and should be read carefully.
  • A list of illegal characters can be found in this Decomposition Table.
  • The colon character ':' is used as a directory separator and is invalid in file and folder names.
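
The separator rule in the list above can be checked with a few lines of Python. This is only a sketch: the function name `has_separator` is hypothetical, and rejecting '/' as well is an added assumption, based on the fact that POSIX-level file APIs treat '/' as the path separator.

```python
def has_separator(name: str) -> bool:
    """Return True if the name contains a character used as a path separator."""
    # ':' is the separator named in the list above; '/' is an extra
    # precaution for POSIX-style paths (an assumption, not from the Technote).
    return any(ch in name for ch in (":", "/"))


print(has_separator("图片:1.png"))  # True
print(has_separator("图片.png"))    # False
```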

par answered Nov 09 '22 17:11