I have a problem in the Linux (Ubuntu) terminal with files that have accents in their names. For example:
$ ls dir/
criação.png
The terminal lists the file, so it exists. Now let's check whether the file exists with this simple command:
$ [ -f criação.png ] && echo "File Exist" || echo "Not Exist"
Not Exist
As you can see, it prints "Not Exist". I have the same folder and file on OS X, and when I run the same command there it returns this:
$ [ -f criação.png ] && echo "File Exist" || echo "Not Exist"
File Exist
I know a little about locale:
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE=en_US.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
On Linux it prints "Not Exist"; on OS X, "File Exist". Does anyone know how to fix that?
Maybe these can help:
http://nedbatchelder.com/blog/201106/filenames_with_accents.html
http://www.ruby-forum.com/topic/279105
UPDATE - Solution
I finally found a solution to this problem. You need to rename your files from NFD (the decomposed Unicode normalization form) to NFC (the composed form). Here is the command to fix all files:
cd dir/
convmv -r -i -f utf8 -t utf8 --nfc --notest .
Source: http://blog.hbis.fr/2010/08/30/macox-utf8_filenames_normalization/
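Before renaming anything, it may be worth previewing the changes. As far as I know, convmv only prints what it would rename unless --notest is given, so leaving that flag off gives a dry run. A minimal sketch, where the function name preview_nfc_rename is a hypothetical helper and not part of convmv:

```shell
# Hypothetical helper: preview the NFD -> NFC renames without touching any file.
# Without --notest, convmv only reports what it would rename.
preview_nfc_rename() {
    # Default to the current directory when no argument is given.
    convmv -r -f utf8 -t utf8 --nfc "${1:-.}"
}

# Usage: preview_nfc_rename dir/
```

If the preview looks right, the original command with --notest performs the actual renames.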
Using the arrow keys, navigate up and down to choose en_US.UTF-8 or any other UTF-8 locale. After that, it will ask you to select the default locale. On this screen, also select en_US.UTF-8.
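After changing the locale, a quick check can confirm that the shell really sees a UTF-8 locale. This is only a sketch: is_utf8_locale is a hypothetical helper, and it inspects the environment variables in the usual precedence order (LC_ALL, then LC_CTYPE, then LANG) rather than asking the C library.

```shell
# Hypothetical helper: succeed if the effective character-type locale is UTF-8.
# Precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
is_utf8_locale() {
    effective=${LC_ALL:-${LC_CTYPE:-${LANG:-}}}
    case $effective in
        *.UTF-8*|*.utf8*|*.UTF8*|*.utf-8*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_utf8_locale; then
    echo "UTF-8 locale active"
else
    echo "Non-UTF-8 locale"
fi
```

Note that a UTF-8 locale alone does not resolve the NFD/NFC mismatch described in the question; it only rules out one other source of trouble.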
To verify whether a file is valid in a given encoding, such as ASCII, ISO-8859-1, or UTF-8, a good solution is the 'iconv' command.
Open the file in Notepad and click 'Save As...'. The 'Encoding:' combo box shows the current file format. To convert the file, select UTF-8 there and save it.
$ iconv -f UTF-8 your_file > /dev/null; echo $?
The command returns 0 if the file could be converted successfully, and 1 if not. It also prints the byte offset where the invalid byte sequence occurred. Edit: The output encoding doesn't have to be specified; it defaults to the encoding of the current locale (UTF-8 with the locale shown above).
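The same check can be wrapped in a small function and applied to a whole directory. This is a sketch, not a definitive tool: is_valid_utf8 and report_invalid_utf8 are hypothetical names, and the check looks at file contents, not file names.

```shell
# Hypothetical helper: succeed if the file's contents decode as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Hypothetical helper: list regular files in a directory that are not valid UTF-8.
report_invalid_utf8() {
    for f in "${1:-.}"/*; do
        [ -f "$f" ] || continue
        is_valid_utf8 "$f" || echo "invalid UTF-8: $f"
    done
}

# Usage: report_invalid_utf8 dir
```

Passing -t UTF-8 explicitly keeps the result independent of the current locale.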
One of the reasons might be that the file name uses a different Unicode normalization form for characters with combining marks than the one you use when typing the name. See Unicode equivalence.
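To make the difference concrete, here is a small sketch that spells "criação.png" both ways using octal byte escapes. The variable names nfc and nfd are just for illustration. The two strings look identical on screen but are different byte sequences, which is why a byte-exact lookup such as [ -f ... ] can miss the file.

```shell
# "criação.png" in both normalization forms, written as octal byte escapes
# so the difference survives copy and paste.
nfc=$(printf 'cria\303\247\303\243o.png')    # precomposed: U+00E7, U+00E3
nfd=$(printf 'criac\314\247a\314\203o.png')  # decomposed: c + U+0327, a + U+0303

# Same rendering, different bytes.
if [ "$nfc" = "$nfd" ]; then
    echo "same bytes"
else
    echo "different bytes"
fi

printf '%s' "$nfc" | wc -c   # 13 bytes
printf '%s' "$nfd" | wc -c   # 15 bytes

# To inspect the names in a real directory the same way:
#   ls dir/ | od -c
```

If the od output for the file name shows a plain c followed by the bytes 314 247, the name is stored in decomposed form.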