I mean getting rid of special characters in filenames, etc.
I have made a script that can recursively rename files [http://pastebin.com/raw.php?i=kXeHbDQw]:
For example, before:
THIS i.s my file (1).txt
After running the script:
This-i-s-my-file-1.txt
OK, here it is:
But when I wanted to test it "fully", with filenames like these:
¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÂÃÄÅÆÇÈÊËÌÎÏÐÑÒÔÕ×ØÙUÛUÝÞßàâãäåæçèêëìîïðñòôõ÷øùûýþÿ.txt
áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&'()*+,:;<=>?@[\]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£.txt
It fails [http://pastebin.com/raw.php?i=iu8Pwrnr]:
$ sh renamer.sh directorythathasthefiles
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ¡¢£': No such file or directory
mv: cannot stat `./áíüűúöőóéÁÍÜŰÚÖŐÓÉ!"#$%&\'()*+,:;<=>?@[]^_`{|}~€‚ƒ„…†....and so on
$
So "mv" can't handle special chars... :\
I worked on it for many hours.
Does anyone have a working one, one that can also handle filenames like the two lines above?
Try something like:
find . -type f -print0 | awk 'BEGIN { RS = "\0" } { printf "%s\0", $0; n = match($0, /[^\/]*$/); dir = substr($0, 1, n - 1); base = substr($0, n); gsub(/[^[:alnum:].]/, "-", base); printf "%s%s\0", dir, base }' | xargs -0 -n 2 mv
Using xargs(1) with -0 ensures that each filename is passed as exactly one argument. awk(1) prints the new filename right after the old one, so every mv call receives an old/new pair.
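One more trick: sed -e 's/--*/-/g' (or sed -E 's/-+/-/g') will replace each run of "-" characters with exactly one. Plain sed -e 's/-+/-/g' does not work, because in a basic regexp + is an ordinary character.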
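A quick check of that point, assuming GNU or BSD sed (the helper name squeeze_dashes is made up for illustration): with a basic regexp the + version leaves its input alone, while --* does the squeezing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collapse every run of one or more "-" into a single "-".
# "--*" means "one dash followed by zero or more dashes", which works in
# basic (default) sed regexps; sed -E 's/-+/-/g' is the extended equivalent.
squeeze_dashes() {
    printf '%s\n' "$1" | sed -e 's/--*/-/g'
}

# In a basic regexp "+" is literal, so this pattern never matches "a---b".
printf '%s\n' 'a---b' | sed -e 's/-+/-/g'    # prints a---b (unchanged)
squeeze_dashes 'This--i---s-my-file'         # prints This-i-s-my-file
```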
mv handles special characters just fine. Your script doesn't.
In no particular order:
You are using find to find all directories, and ls on each directory separately.
Why use for DEPTH in... if you can do exactly the same with one command?
find -maxdepth 100 -type d
That makes the arbitrary depth limit unnecessary:
find -type d
Don't ever parse the output of ls, especially when you can let find handle that, too:
find -not -type d
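Make sure it works in the worst possible case:
find -not -type d -print0 | while IFS= read -r -d '' FILENAME; do ...; done
With -print0 and -d '', every name ends in a NUL byte, so read no longer chokes on filenames containing newline characters. -r stops read from eating backslash escapes, and IFS= keeps leading and trailing blanks intact. (read -d is a bash feature, so run the script with bash rather than sh.)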
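A small bash sketch showing that the NUL-delimited loop sees each odd filename exactly once. The scratch directory from mktemp -d and the count_files name are assumptions of this demo; the names it creates contain spaces, parentheses, a backslash and a newline.

```shell
#!/usr/bin/env bash
# Hypothetical demo: count non-directories under a directory, one loop
# iteration per file, even when names contain spaces, backslashes or newlines.
count_files() {
    local n=0 FILENAME
    # -print0 ends each name with a NUL byte; read -d '' splits on NUL,
    # -r keeps backslashes literal, IFS= keeps leading/trailing blanks.
    while IFS= read -r -d '' FILENAME; do
        n=$((n + 1))
    done < <(find "$1" -not -type d -print0)
    echo "$n"
}

demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/  spaced (1).txt" "$demo/back\\slash" "$demo/new
line.txt"
count_files "$demo"    # prints 4
rm -rf "$demo"
```

The sketch feeds the loop through process substitution (< <(find ...)) instead of a pipe: with find ... | while, bash runs the loop in a subshell, so a counter set inside it would be lost afterwards. For a loop that only renames files, the piped form is fine.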
You are repeating the entire ls | replace cycle for every single character. Don't: it kills performance. Loop over all the files once, and just use several sed commands, or several replacements in one sed command.
sed 's/á/a/g; s/í/i/g; ...'
(I was going to suggest sed 'y/áí/ai/', but unfortunately that doesn't seem to work with Unicode. Perhaps perl -CS -Mutf8 -pe 'y/áí/ai/' would.)
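For a fixed set of letters, one sed call with one s///g per letter is enough. The sketch below covers only the 18 Hungarian accented vowels from the question's second test filename; the function name strip_hungarian_accents is hypothetical, everything else passes through untouched, and the script file is assumed to be saved as UTF-8.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the Hungarian accented vowels to plain ASCII
# letters with a single sed invocation containing many s///g commands.
# Only these 18 letters are handled; all other characters are left as-is.
strip_hungarian_accents() {
    printf '%s\n' "$1" | sed '
        s/á/a/g; s/é/e/g; s/í/i/g; s/ó/o/g; s/ö/o/g; s/ő/o/g; s/ú/u/g; s/ü/u/g; s/ű/u/g
        s/Á/A/g; s/É/E/g; s/Í/I/g; s/Ó/O/g; s/Ö/O/g; s/Ő/O/g; s/Ú/U/g; s/Ü/U/g; s/Ű/U/g'
}

strip_hungarian_accents 'áíüűúöőóéÁÍÜŰÚÖŐÓÉ'    # prints aiuuuoooeAIUUUOOOE
```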
You're still thinking in ASCII: "other special chars - ASCII Codes 33.. ..255". Don't.
These days, most systems use Unicode in the UTF-8 encoding, which has a much wider range of "special" characters, so many that listing them out one by one becomes pointless. (It is also a multibyte encoding: "e" is one byte, "ė" is two bytes.)
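To see the multibyte point for yourself, count bytes with wc -c. This assumes the script is saved as UTF-8; byte_count is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Count the bytes (not characters) in a string. tr strips the padding
# spaces that some wc implementations put in front of the number.
byte_count() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

byte_count 'e'    # prints 1
byte_count 'ė'    # prints 2  (U+0117 is encoded as C4 97)
byte_count '€'    # prints 3  (U+20AC is encoded as E2 82 AC)
```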
True ASCII has only 128 characters. What you currently have in mind are the ISO 8859 character sets (sometimes loosely called "ANSI"), in particular ISO 8859-1. But they go all the way up to ISO 8859-16, and only the ASCII part stays the same across them.
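echo -n $(command) is rather useless: $(command) already gives you the output. Worse, the unquoted expansion is word-split and glob-expanded, so runs of whitespace in a filename collapse into single spaces.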
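A short bash demonstration of the whitespace problem, using a made-up filename with two consecutive spaces:

```shell
#!/usr/bin/env bash
# The unquoted $(...) is split into words before echo sees it,
# so the two spaces in the name collapse into one.
name='my  file.txt'                            # two spaces
collapsed=$(echo -n $(printf '%s' "$name"))    # becomes "my file.txt"
direct=$(printf '%s' "$name")                  # stays "my  file.txt"
printf '[%s]\n[%s]\n' "$collapsed" "$direct"
```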
There are much easier ways to get the directory and the base name from a path. For example, you can do
directory=$(dirname "$path")
oldname=$(basename "$path")
# filter $oldname into $newname here
mv "$path" "$directory/$newname"
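Both forms below give the same split for a typical path. The first uses dirname/basename as above; the second swaps in shell parameter expansion, which needs no subprocesses but assumes the path contains at least one "/" (always true for paths printed by find). The sample path is made up.

```shell
#!/usr/bin/env bash
path='./sub dir/THIS i.s my file (1).txt'

# 1. External tools, one subprocess each.
directory=$(dirname "$path")     # ./sub dir
oldname=$(basename "$path")      # THIS i.s my file (1).txt

# 2. Parameter expansion: no subprocesses; assumes $path contains a "/".
directory2=${path%/*}            # remove the shortest "/..." suffix
oldname2=${path##*/}             # remove the longest ".../" prefix

printf '%s\n' "$directory" "$oldname" "$directory2" "$oldname2"
```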
Do not use egrep to check for errors. Check the program's exit status instead (like you already do with cd).
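A minimal sketch of checking mv's exit status directly; safe_mv is a hypothetical wrapper name, and mv still prints its own diagnostic on stderr.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: decide success from mv's exit status,
# not by grepping its messages.
safe_mv() {
    if ! mv -- "$1" "$2"; then
        echo "rename failed: $1 -> $2" >&2
        return 1
    fi
}
```

Usage: safe_mv "$path" "$directory/$newname" || continue skips to the next file when the rename fails.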
And instead of filtering out other errors, check for a name collision before calling mv:
if [[ -e $directory/$newname ]]; then
    echo "target already exists, skipping: $oldname -> $newname"
    continue
else
    mv "$path" "$directory/$newname"
fi
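Putting these suggestions together, a minimal bash sketch of the whole renamer might look like this. It is an illustration, not the asker's script: sanitize and rename_tree are hypothetical names; the filter (keep ASCII letters, digits and dots, turn every other run of bytes into a single "-") is an assumption to adapt; directories are deliberately not renamed, so paths that find has already printed stay valid; and it uses parameter expansion instead of dirname/basename.

```shell
#!/usr/bin/env bash
# Hypothetical filter: every run of bytes outside [A-Za-z0-9.] becomes one "-".
# tr -c complements the set, -s squeezes repeated "-" in the output.
# Newlines are bytes like any other here, so they are replaced too.
sanitize() {
    printf '%s' "$1" | LC_ALL=C tr -cs 'A-Za-z0-9.' '-'
}

# Rename every non-directory under $1, skipping names that would collide.
rename_tree() {
    find "$1" -not -type d -print0 | while IFS= read -r -d '' path; do
        directory=${path%/*}
        oldname=${path##*/}
        newname=$(sanitize "$oldname")
        [[ $newname == "$oldname" ]] && continue    # already clean
        if [[ -e $directory/$newname ]]; then
            echo "target already exists, skipping: $oldname -> $newname"
            continue
        fi
        mv -- "$path" "$directory/$newname" ||
            echo "rename failed: $path" >&2
    done
}
```

For example, rename_tree ./directorythathasthefiles turns "THIS i.s my file (1).txt" into "THIS-i.s-my-file-1-.txt". Because sanitize gives the same result when run twice, a file that find happens to list again under its new name is simply skipped.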
The ton of sed 's/------------/-/g' calls can be replaced by a single regexp:
sed -r 's/-{2,}/-/g'
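The [ ]s in tr [foo] [bar] are unnecessary. They just cause tr to replace [ with [, and ] with ]. (Left unquoted, they are also glob patterns, which the shell may expand if a matching file happens to exist.)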
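A quick check that the brackets add nothing, assuming GNU or BSD tr:

```shell
#!/usr/bin/env bash
# tr maps characters position by position, so brackets in both sets
# only map "[" to "[" and "]" to "]". Quoting stops the shell from
# treating [ab] as a glob pattern.
with_brackets=$(printf '%s' 'a[b]' | tr '[ab]' '[xy]')
without=$(printf '%s' 'a[b]' | tr 'ab' 'xy')
printf '%s\n%s\n' "$with_brackets" "$without"    # prints x[y] twice
```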
Seriously?
echo "$FOLDERNAME" | sed "s/$/\//g"
How about this instead?
echo "$FOLDERNAME/"
And finally, use detox.