
Convert filenames to use only latin characters

I have a ton of files with names in Russian (and some in Slovenian and Greek). To play them in my car, the song titles must use only Western European characters.

  1. Is there a program that can do this file renaming?

  2. If not, is there a list of what letter(s) to use for each Cyrillic & Greek letter?

thanks - dave

David Thielen asked Mar 03 '12 00:03


5 Answers

Try ReNamer, a free program by den4b. It worked great for me.

http://www.den4b.com/?x=products

zakk answered Oct 04 '22 21:10


I wrote a shell script for this purpose. It uses uconv from the icu-devtools package to perform the transliteration:

for f in "$@"
do
    if [ ! -f "$f" ]; then
        echo "$(basename "$0") warn: this is not a regular file (skipped): $f" >&2
        continue
    fi

    # keep the renamed file next to the original instead of in the current directory
    DIR="$(dirname "$f")"
    OLDFILENAME="$(basename "$f")"

    # convert non-latin chars using my transliterate script OR uconv from the icu-devtools package
    NEWFILENAME="$( printf '%s' "$OLDFILENAME" | { transliterate || uconv -x 'Any-Latin;Latin-ASCII' || cat ; } )"
    # let iconv approximate or drop anything that is still not ASCII
    NEWFILENAME="$( printf '%s' "$NEWFILENAME" | iconv -f UTF-8 -t ascii//TRANSLIT//IGNORE )"
    # replace every other character (brackets included) with "_", squeeze repeats, drop "_" before a dot
    NEWFILENAME="$( printf '%s' "$NEWFILENAME" | tr -c 'A-Za-z0-9._-' '_' \
        | sed -e 's/__*/_/g' \
        | sed -e 's/_\././g' )"
    # TODO: remove all dots except the last?

    # nothing to do if the name is already clean
    if [ "$OLDFILENAME" = "$NEWFILENAME" ]; then
        continue
    fi
    if [ -e "$DIR/$NEWFILENAME" ]; then
        echo "$(basename "$0") warn: target filename already exists (skipped): $DIR/$NEWFILENAME" >&2
        continue
    fi
    echo "\`$f' -> \`$DIR/$NEWFILENAME'"
    mv -i -- "$f" "$DIR/$NEWFILENAME"
done
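
For anyone who would rather do the clean-up step in Python, here is a rough sketch of what the tr/sed part of the script does. It assumes the name has already been transliterated to ASCII-ish text; the function name sanitize is my own, not something from uconv or ICU.

```python
import re

def sanitize(name: str) -> str:
    """Mirror the tr/sed clean-up steps on an already transliterated name."""
    # Replace every character outside A-Z, a-z, 0-9, dot, underscore, hyphen.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Collapse runs of underscores into a single one.
    name = re.sub(r"_+", "_", name)
    # Drop an underscore that sits directly before a dot.
    return name.replace("_.", ".")

print(sanitize("Moya pesnya [live] .mp3"))  # Moya_pesnya_live.mp3
```

Like the shell version, this does not try to keep only the last dot in the name.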

eadmaster answered Oct 04 '22 19:10


This is a Russian transliteration table:

  • http://en.wikipedia.org/wiki/Romanization%5Fof%5FRussian

If you have Python installed, you can use this script:

  • http://snippets.dzone.com/posts/show/6395

It seems to do a pretty good job, and you can adapt it to your needs. It did not work out of the box for me: I had to remove the encode call from the following two lines (line numbers given in front):

117 print fpath.encode('utf-8')
136 print 'Copying %s to %s' % (fpath.encode('utf-8'), new_fpath)

i.e. change to:

117 print fpath
136 print 'Copying %s to %s' % (fpath, new_fpath)

After that it worked fine. For example, assuming you saved the modified script as transliterate.py in the current folder and ran chmod u+x transliterate.py to make it executable:

$ mkdir a
$ touch a/сказать
$ ./transliterate.py a
a/сказать
Copying a/сказать to a/skazat'
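
If the linked script is unavailable, the core idea can be sketched in a few lines of Python 3. The table below is a simplified one I put together for illustration, loosely following the conventions on the Wikipedia page above; it is an assumption, not the table the linked script uses, and RU_TO_LATIN and transliterate are just names I picked.

```python
# Simplified Russian-to-Latin table (illustrative assumption, not the
# table from the linked script). The hard sign is simply dropped.
RU_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "'", "э": "e", "ю": "yu", "я": "ya",
}

def transliterate(text: str) -> str:
    """Replace Russian letters using the table; leave everything else alone."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in RU_TO_LATIN:
            latin = RU_TO_LATIN[lower]
            # Preserve capitalisation: "Ш" becomes "Sh", not "sh".
            out.append(latin.capitalize() if ch != lower else latin)
        else:
            out.append(ch)
    return "".join(out)

print(transliterate("сказать"))  # skazat'
```

The apostrophe for the soft sign matches the output shown above; for filenames you may prefer to map it to an empty string instead.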

Hope this helps.

icyrock.com answered Oct 04 '22 21:10


Here is an improved Python script. It renames files to Latin characters, removes the old ones, and replaces all spaces with underscores.

https://gist.github.com/braman/a8504c15ca537ea49c6a
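
In case the gist goes away, here is a minimal sketch of the renaming part only. It is not the gist's code: rename_files, spaces_to_underscores and the dry_run flag are names I made up for this example, and the conversion function is passed in so any transliteration routine can be plugged in.

```python
import os
import tempfile

def rename_files(directory, convert, dry_run=True):
    """Rename each regular file in `directory` to convert(name).

    Returns the (old, new) pairs that were renamed, or that would be
    renamed when dry_run is True. Names that would collide are skipped.
    """
    planned = []
    taken = set()
    for old in sorted(os.listdir(directory)):
        src = os.path.join(directory, old)
        if not os.path.isfile(src):
            continue
        new = convert(old)
        dst = os.path.join(directory, new)
        # Skip unchanged names and anything that would overwrite a file.
        if new == old or new in taken or os.path.exists(dst):
            continue
        taken.add(new)
        planned.append((old, new))
        if not dry_run:
            os.rename(src, dst)
    return planned

def spaces_to_underscores(name):
    return name.replace(" ", "_")

# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "my song.mp3"), "w").close()
    print(rename_files(tmp, spaces_to_underscores, dry_run=False))
    print(os.listdir(tmp))
```

Running with dry_run=True first shows what would change without touching any files.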

Ramanqul Buzaubak answered Oct 04 '22 20:10


David,

I'm not aware of any program that will do it automatically (although given the information below, I bet you could get a computer geek friend to do it for you in exchange for a pizza.) Actually, the program really wouldn't be that hard to write in Perl.

In any case, here is some information that would help you choose which letters to use for each Cyrillic, Slovene, and Greek letter.

http://en.wikipedia.org/wiki/Romanization_of_Russian

http://en.wikipedia.org/wiki/Slovene_alphabet

http://en.wikipedia.org/wiki/Romanization_of_Greek
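
For the Slovene titles specifically, the only non-ASCII letters are č, š and ž, and those can be handled without a hand-written table. Here is a small Python sketch (rather than Perl) using the standard unicodedata module; the function name strip_diacritics is my own.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("Čez šuštarski most.mp3"))  # Cez sustarski most.mp3
```

Cyrillic and Greek letters stay non-Latin under this approach (and lose any accents of their own), so they still need a letter-by-letter table built from the pages linked above.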

Hope that helps a little bit!

Aaron Johnson answered Oct 04 '22 21:10