I am writing a bash script that needs to parse filenames.
It will need to remove all special characters (including spaces), such as "!?.-_, and convert all uppercase letters to lowercase. Something like this:
Some_randoM data1-A
More Data0
to:
somerandomdata1a
moredata0
I have seen lots of questions to do this in many different programming languages, but not in bash. Is there a good way to do this?
Remove Character from String Using tr
The tr command (short for translate) is used to translate, squeeze, and delete characters from a string. You can also use tr to remove characters from a string. For demonstration purposes, we will use a sample string and then pipe it to the tr command.
If you just want to remove certain characters, I find the GNU version of tr easier to use: it supports a -d option that deletes characters instead of translating them, and it also supports certain character classes. In this case, just tr -d '[*][:space:]' might work well for you.
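As a rough sketch of the -d approach applied to the characters listed in the question, the snippet below deletes an explicit set plus whitespace. The function name strip_listed is hypothetical, and the character set is only an assumption about what "special" means here. The hyphen is placed last so that tr treats it as a literal character rather than a range.

```shell
#!/usr/bin/env bash
# strip_listed is a hypothetical helper name used only for this sketch.
strip_listed() {
  # Delete the characters " ! ? . _ - and all whitespace.
  # The trailing '-' is literal because it is the last character of the set.
  printf '%s' "$1" | tr -d '"!?._[:space:]-'
}

strip_listed 'Some_randoM data1-A'; echo   # prints SomerandoMdata1A
```

Note that this only removes the listed characters and leaves the case untouched, so lowercasing would still need a second step.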
cat yourfile.txt | tr -dc '[:alnum:]\n\r' | tr '[:upper:]' '[:lower:]'
The first tr deletes special characters: d means delete and c means complement (invert the character set), so -dc means delete all characters except those specified. The \n and \r are included to preserve Linux- or Windows-style newlines, which I assume you want. The second tr translates uppercase characters to lowercase.
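For a single filename held in a variable rather than a whole file, the same two-stage tr pipeline could be wrapped in a small function. This is a minimal sketch: the name normalize_name is hypothetical, and LC_ALL=C is an added assumption so that [:alnum:] only matches ASCII letters and digits. Newlines are not preserved here, since a single name is not expected to contain any.

```shell
#!/usr/bin/env bash
# normalize_name is a hypothetical helper name, not part of the answer above.
normalize_name() {
  # Stage 1: delete everything that is not an ASCII letter or digit.
  # Stage 2: translate uppercase letters to lowercase.
  printf '%s' "$1" |
    LC_ALL=C tr -dc '[:alnum:]' |
    LC_ALL=C tr '[:upper:]' '[:lower:]'
}

normalize_name 'Some_randoM data1-A'; echo   # prints somerandomdata1a
normalize_name 'More Data0'; echo            # prints moredata0
```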
Pure BASH 4+ solution:
$ filename='Some_randoM data1-A'
$ f=${filename//[^[:alnum:]]/}
$ echo "$f"
SomerandoMdata1A
$ echo "${f,,}"
somerandomdata1a
A function for this:
clean() {
local a=${1//[^[:alnum:]]/}
echo "${a,,}"
}
Try it:
$ clean "More Data0"
moredata0
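To run the function over several names at once, one option is to read them line by line. The sketch below assumes Bash 4 or newer (for the ${a,,} expansion) and feeds the question's two sample names through a here-document. The loop is an illustration, not part of the original answer.

```shell
#!/usr/bin/env bash
# Same function as above; printf is used instead of echo so that names
# starting with '-' are not mistaken for echo options.
clean() {
  local a=${1//[^[:alnum:]]/}   # drop every non-alphanumeric character
  printf '%s\n' "${a,,}"        # lowercase the result (Bash 4+)
}

# IFS= and -r keep leading spaces and backslashes in each line intact.
while IFS= read -r name; do
  clean "$name"
done <<'EOF'
Some_randoM data1-A
More Data0
EOF
```

With the two sample lines this prints somerandomdata1a and moredata0, matching the output requested in the question.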
As an alternative to the approaches from mkelement0 and Dan Bliss, you can also use sed with a POSIX regular expression.
cat yourfile.txt | sed 's/[^a-zA-Z0-9]//g'
sed matches every character that is not a letter or a digit (the ^ inside the brackets negates the set) and removes it. Note that this command does not change the case of the remaining letters.
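Since the sed expression only strips characters, the lowercase step from the question still has to be added. One possible sketch pipes the sed output through tr. The function name strip_and_lower is hypothetical, and it reads from standard input so that it works on files as well as on piped text.

```shell
#!/usr/bin/env bash
# strip_and_lower is a hypothetical helper name used only for this sketch.
strip_and_lower() {
  # sed removes everything except ASCII letters and digits on each line;
  # tr then lowercases what is left. Line breaks are kept because sed
  # works line by line.
  sed 's/[^a-zA-Z0-9]//g' | tr '[:upper:]' '[:lower:]'
}

printf '%s\n' 'Some_randoM data1-A' 'More Data0' | strip_and_lower
```

For the two sample names this prints somerandomdata1a and moredata0 on separate lines.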
I've used tr to remove any characters that are not part of the [:print:] class:
cat file.txt | tr -dc '[:print:]'
or
echo "..." | tr -dc '[:print:]'
Additionally, you might want to pipe (|) the output to od -c to confirm the result:
cat file.txt | tr -dc '[:print:]' | od -c
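To see what the [:print:] filter actually keeps, a small test string with control characters can be pushed through it. In the sketch below, keep_printable is a hypothetical name, and LC_ALL=C is an added assumption that restricts [:print:] to printable ASCII. Spaces survive because they belong to [:print:], while tabs, carriage returns, and newlines are removed.

```shell
#!/usr/bin/env bash
# keep_printable is a hypothetical helper name used only for this sketch.
keep_printable() {
  # In the C locale, [:print:] covers the space character through '~'.
  LC_ALL=C tr -dc '[:print:]'
}

# The sample contains a tab, a space, a carriage return, and two newlines.
# od -c dumps the surviving bytes one by one for inspection.
printf 'a\tb c\r\nd\n' | keep_printable | od -c
```

The od -c dump should list only a, b, a space, c, and d. Because newlines are removed as well, this filter joins all lines of a multi-line file into one.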