 

File names look the same but are different after copying

My file names look the same but they are not.

I copied many_img/ from Debian1 to OS X, then from OS X to Debian2 (for maintenance purposes), using rsync -a -e ssh at each step to preserve everything.

If I run ls many_img/img1/* I get visually the same output on Debian1 and Debian2:

prévisionnel.jpg

But somehow, ls many_img/img1/* | od -c gives different results:

On Debian1:

0000000   p   r 303 251   v   i   s   i   o   n   n   e   l  .   j   p
0000020   g  \n

On Debian2:

0000000   p   r   e 314 201   v   i   s   i   o   n   n   e   l  .   j
0000020   p   g  \n

As a result, my web app on Debian2 cannot match the picture in the file system with the file name stored in the database.

I thought maybe I needed to change the file name encoding, but it looks like it's already UTF-8 on every OS:

convmv --notest -f iso-8859-15 -t utf8 many_img/img1/* 

Returns:

Skipping, already UTF-8

Is there a command to restore all 40 thousand file names on Debian2 so they match Debian1 (without transferring everything again)? I'm not sure whether this is a file name encoding problem or something else.

Yoric asked Sep 05 '25 17:09



2 Answers

I finally found the command-line conversion tools I was looking for (thanks @Mark for setting me on the right track!)

OK, I didn't know OS X was encoding file names under the hood with a different Unicode normalization form.

  • It appears OS X is using Unicode Normalization Form D (NFD)
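  • while Linux systems typically use Unicode Normalization Form C (NFC); Linux file systems store names as raw bytes and do not normalize them, but most software there produces NFC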

The HFS+ file system stores file names as UTF-16 in a decomposed form. Unicode characters are therefore decomposed on OS X versus precomposed on Linux.

é for instance (Latin small letter e with acute accent) is a single character (U+00E9) on Linux, whereas on OS X it is stored in its decomposed form (NFD): a base letter "e" (U+0065) followed by a combining acute accent (U+0301).
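To make the difference concrete, here is a minimal Python sketch (not part of my original fix, just an illustration using the standard unicodedata module). The helper name same_after_nfc is made up for this example; it shows how an application could compare two names regardless of normalization form.

```python
import unicodedata

# The same visible name in both forms.
nfc_name = "pr\u00e9visionnel.jpg"   # precomposed (NFC), as on Debian1
nfd_name = "pre\u0301visionnel.jpg"  # decomposed (NFD), as on Debian2 after the OS X hop


def same_after_nfc(a, b):
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(nfc_name == nfd_name)                 # False: different code points
print(same_after_nfc(nfc_name, nfd_name))   # True: same text once normalized
print(len(nfc_name.encode("utf-8")),
      len(nfd_name.encode("utf-8")))        # 17 18: NFD is one byte longer
```

The 17 and 18 byte lengths match the od -c dumps in the question once the trailing newline is added.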

Now about conversion tools:

  1. This command, executed on Linux, will convert file names from NFD to NFC:

    convmv --notest --nfc -f utf8 -t utf8 /path/to/my/file

  2. This command, executed on OS X, will rsync over ssh with on-the-fly NFD to NFC conversion:

    rsync -a --iconv=utf-8-mac,utf-8 -e ssh path/to/my/local/directory/* user@destinationip:/remote/path/

I tested both methods and they work like a charm.
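If you want to see which names are affected before renaming anything, a read-only check can be sketched in Python as below. This is only an assumption-laden illustration, not something I ran on the 40 thousand files: find_non_nfc is a hypothetical helper name, and many_img is the directory from the question. (convmv itself also only prints what it would do when --notest is omitted.)

```python
import os
import unicodedata


def find_non_nfc(root):
    """Hypothetical helper: list (path, nfc_name) for every file or
    directory name under root that is not already in NFC.
    Nothing is renamed; this only reports."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            nfc_name = unicodedata.normalize("NFC", name)
            if nfc_name != name:
                hits.append((os.path.join(dirpath, name), nfc_name))
    return hits


# Report only; prints nothing if the directory does not exist.
for path, nfc_name in find_non_nfc("many_img"):
    print(f"{path!r} -> {nfc_name!r}")
```

Using repr in the output keeps the two forms distinguishable in a terminal, where they would otherwise render identically.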

Note:

The --iconv option is only available in rsync 3.x, whereas OS X ships the old 2.6.9 version by default, so you'll need to upgrade it first.

Typically, to check the version and upgrade:

rsync --version
brew install rsync
echo 'export PATH=/usr/local/bin:$PATH' >> ~/.profile
Yoric answered Sep 07 '25 09:09



The first filename contains the single character é while the second contains a simple e followed by the combining character ́ (COMBINING ACUTE ACCENT). They're both valid Unicode, they're just normalized differently. It appears the OS normalized the filename as it created the file.
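As a rough illustration (a sketch only, with the byte strings copied from the od -c dumps in the question and a made-up helper name, non_ascii_code_points), Python can decode those bytes and name the non-ASCII code points in each:

```python
import unicodedata

# Bytes as shown by od -c in the question (octal escapes, newline dropped).
debian1 = b"pr\303\251visionnel.jpg"
debian2 = b"pre\314\201visionnel.jpg"


def non_ascii_code_points(raw):
    """Hypothetical helper: decode UTF-8 bytes and describe each non-ASCII code point."""
    text = raw.decode("utf-8")
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"
        for ch in text
        if ord(ch) > 127
    ]


print(non_ascii_code_points(debian1))  # ['U+00E9 LATIN SMALL LETTER E WITH ACUTE']
print(non_ascii_code_points(debian2))  # ['U+0301 COMBINING ACUTE ACCENT']
```

So 303 251 is the UTF-8 encoding of the precomposed é, and 314 201 is the combining accent that follows the plain e.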

Mark Ransom answered Sep 07 '25 07:09
