
Convert Unicode decomposition when transferring files to web server

I am doing website development on OS X, and fairly often I find myself in situations where I move some part of a live website (running Linux/LAMP) to a development server running on my own machine. One such instance involves downloading images (user-generated content, e.g. via FTP download), processing them in one way or another, and then putting them back on the production site.

The image files involved, having been created on a Linux machine, appear to have their filenames encoded in UTF-8 in NFC (precomposed) form. OS X's HFS+ file system, on the other hand, does not store NFC filenames and converts them to NFD (decomposed) form. When I am done and want to upload the files, their names are therefore in NFD, and since Linux accepts both forms it stores them unchanged. As a result, the newly uploaded (and in some cases replaced) files are not accessible at the expected URL.
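To make the mismatch concrete, here is a minimal Python sketch. The filename is a made-up example, not one of the actual files. It shows that the same visible name has different byte sequences in the two normalization forms, which is why a server that compares filenames byte for byte treats them as different files:

```python
import unicodedata

name = "caf\u00e9.jpg"  # hypothetical example filename ("café.jpg")

nfc = unicodedata.normalize("NFC", name)  # precomposed: é is the single code point U+00E9
nfd = unicodedata.normalize("NFD", name)  # decomposed: "e" followed by combining U+0301

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

print(nfc_bytes)   # b'caf\xc3\xa9.jpg'
print(nfd_bytes)   # b'cafe\xcc\x81.jpg'
print(nfc == nfd)  # False
```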

I'm looking for a way to change the Unicode normalization of the filenames during (preferably) or after the transfer, since I'm guessing it's impossible to do it beforehand. convmv looks like a good option for converting after the transfer, but I don't have sufficient permissions on this server, so it's not possible in this particular case. I've tried FTP upload using Transmit and rsync (using a deploy script I normally use) to no avail. The --iconv option in rsync seemed ideal, but unfortunately my server, which runs rsync 2.6.9, did not recognize it.

I'm guessing quite a few people are having similar issues, so I'd be happy to hear any solution or workaround!

UPDATE: In this case I ended up rsyncing the files to a virtual machine running Ubuntu, running convmv on them there, and then rsyncing again to my staging server. While this works fairly well, it is a bit time-consuming. Perhaps it would be possible to mount an ext file system on OS X and just store the files there instead, keeping their original NFC file names?
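Where convmv is not available but Python is, the renaming step could be approximated with a short script. The following is only a sketch under stated assumptions: `rename_to_nfc` is a hypothetical helper name, and it has to run on a file system that stores names as given (such as ext on Linux), since HFS+ would convert the names straight back to NFD. It defaults to a dry run that only reports what it would rename:

```python
import os
import unicodedata


def rename_to_nfc(root, dry_run=True):
    """Rename entries under root whose names are not in NFC form.

    Returns a list of (old_path, new_path) pairs. With dry_run=True
    nothing on disk is changed; the pairs describe what would happen.
    """
    renamed = []
    # Walk bottom-up so a directory's contents are handled before
    # the directory itself is renamed from its parent's listing.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            nfc_name = unicodedata.normalize("NFC", name)
            if nfc_name == name:
                continue  # already NFC, nothing to do
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, nfc_name)
            if os.path.lexists(dst):
                continue  # never overwrite an existing NFC-named entry
            if not dry_run:
                os.rename(src, dst)
            renamed.append((src, dst))
    return renamed
```

Calling `rename_to_nfc("/path/to/uploads")` lists the candidates without touching anything (the path is a placeholder); passing `dry_run=False` performs the renames. Entries whose NFC name already exists are skipped rather than overwritten, so they would need to be resolved by hand.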

Also, to avoid this problem altogether on future WordPress installs, which was my use case, you could add a simple add_filter('sanitize_file_name', 'remove_accents'); before uploading any files and you should be fine.
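The idea behind that filter is that a filename containing no accented characters is identical in NFC and NFD, so the normalization form stops mattering. As a rough illustration of the idea in Python (this is not WordPress's remove_accents implementation, and `strip_accents` is a hypothetical helper name), one could decompose the name and drop the combining marks:

```python
import unicodedata


def strip_accents(name):
    """Return name with combining accent marks removed.

    Letters that have no canonical decomposition (for example "ø")
    are left unchanged by this approach.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("caf\u00e9.jpg"))   # cafe.jpg (input was NFC)
print(strip_accents("cafe\u0301.jpg"))  # cafe.jpg (input was NFD)
```

Both inputs produce the same plain ASCII name, which is the property that makes accent-free filenames safe to move between the two systems.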

Simon asked Sep 28 '12 15:09


2 Answers

It seems that rsync --iconv is the best solution, as you can transfer the files and transcode the names all in one step. You just need to convince your host to upgrade their rsync. Given that the --iconv feature was introduced in rsync 3.0.0, which was released in 2008, it's a bit odd that your host is still running rsync 2.6.9.

If you can't convince your host to install an up-to-date rsync, you could compile your own rsync, upload it somewhere like ~/bin on the server, and put that directory in your PATH ahead of the system-installed rsync (or point to the binary explicitly with rsync's --rsync-path option). Then you should be able to use the --iconv option. This should work as long as you are using rsync over SSH (the default) rather than the rsync daemon, because rsync over SSH works by SSHing to the remote machine and running rsync --server with the same options that you passed to your local rsync.

Or you could find a host that has up-to-date tools and Perl installed.

Brian Campbell answered Nov 06 '22 22:11


Currently I'm using rsync --iconv like this:

Given a Linux server and an OS X machine:

Copying files from server to machine

You should execute this command from the server (it won't work from OS X):

rsync -a --iconv=UTF-8,UTF-8-MAC /home/username/path/on/server/ '[email protected]:/Users/username/path/on/machine/'

Copying files from machine to server

You should execute this command from the OS X machine:

rsync -a --iconv=UTF-8-MAC,UTF-8 /Users/username/path/on/machine/ '[email protected]:/home/username/path/on/server/'
Envek answered Nov 06 '22 22:11