 

Remove all special characters and case from string in bash

I am writing a bash script that needs to parse filenames.

It needs to remove all special characters, including spaces and characters such as " ! ? . - _, and change all uppercase letters to lowercase. Something like this:

Some_randoM data1-A
More Data0

to:

somerandomdata1a
moredata0

I have seen lots of questions about doing this in many other programming languages, but not in Bash. Is there a good way to do this?

Questionmark asked May 22 '14 20:05



4 Answers

tr -dc '[:alnum:]\n\r' < yourfile.txt | tr '[:upper:]' '[:lower:]'

The first tr deletes the special characters: -d means delete and -c means complement (invert the character set), so -dc deletes every character except those specified. The \n and \r are included to preserve Linux- or Windows-style newlines, which I assume you want.

The second one translates uppercase characters to lowercase.
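A minimal sketch, assuming the input arrives on stdin: the same two tr steps wrapped as shell functions, plus a variant that leaves \r out of the keep set so that Windows (CRLF) input comes out with plain LF line endings. The function names keep_alnum_lower and keep_alnum_lower_lf are hypothetical.

```bash
# keep_alnum_lower (hypothetical name): the two-step tr pipeline from this answer.
# Step 1 deletes every byte that is not alphanumeric, LF or CR; step 2 lowercases.
keep_alnum_lower() {
    tr -dc '[:alnum:]\n\r' | tr '[:upper:]' '[:lower:]'
}

# keep_alnum_lower_lf (hypothetical name): same, but CR is not in the keep set,
# so CRLF line endings are reduced to plain LF.
keep_alnum_lower_lf() {
    tr -dc '[:alnum:]\n' | tr '[:upper:]' '[:lower:]'
}

# Usage: the question's sample lines, written with Windows line endings.
printf 'Some_randoM data1-A\r\nMore Data0\r\n' | keep_alnum_lower_lf
```

This prints somerandomdata1a and moredata0 on separate lines, with no carriage returns left in the output.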

Dan Bliss answered Oct 18 '22 19:10


Pure BASH 4+ solution:

$ filename='Some_randoM data1-A'
$ f=${filename//[^[:alnum:]]/}
$ echo "$f"
SomerandoMdata1A
$ echo "${f,,}"
somerandomdata1a

A function for this:

clean() {
    local a=${1//[^[:alnum:]]/}
    echo "${a,,}"
}

Try it:

$ clean "More Data0"
moredata0
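Since the question is about filenames, here is a minimal sketch (Bash 4+) that applies clean to every entry in a directory as a dry run: it prints each original name next to its cleaned form and renames nothing, so two names that would clean to the same string can be spotted before any mv. The helper name preview_clean_names is hypothetical.

```bash
# Requires Bash 4+ for the ${var,,} lowercasing expansion.

# clean: the function from this answer (delete non-alphanumerics, then lowercase).
clean() {
    local a=${1//[^[:alnum:]]/}
    echo "${a,,}"
}

# preview_clean_names (hypothetical name): dry run over one directory.
# Prints "original -> cleaned" for each entry; nothing is renamed.
preview_clean_names() {
    local dir=${1:-.} path name
    for path in "$dir"/*; do
        [[ -e $path ]] || continue   # an unmatched glob stays literal; skip it
        name=${path##*/}             # strip the directory part, keep the basename
        printf '%s -> %s\n' "$name" "$(clean "$name")"
    done
}
```

For example, preview_clean_names ~/Downloads lists the proposed names for that directory. Hidden (dot) files are not matched by the * glob unless the dotglob shell option is set.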

gniourf_gniourf answered Oct 18 '22 19:10


In addition to the mkelement0 and Dan Bliss approaches, you can also use sed with a POSIX regular expression.

sed 's/[^a-zA-Z0-9]//g' yourfile.txt

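The bracket expression [^a-zA-Z0-9] matches any character that is not a letter or a digit, and the g flag makes sed delete every such match on each line. This does not lowercase anything; pipe the result through tr '[:upper:]' '[:lower:]' for that.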
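A minimal sketch of a sed-only version, assuming ASCII input: a second expression uses sed's standard y (transliterate) command to map A-Z to a-z, so no separate tr step is needed. LC_ALL=C pins the a-z style ranges to plain ASCII. The function name normalize_names is hypothetical.

```bash
# normalize_names (hypothetical name): filter that reads lines on stdin.
# Expression 1 deletes every character that is not an ASCII letter or digit;
# expression 2 maps A-Z to a-z with sed's y (transliterate) command.
normalize_names() {
    LC_ALL=C sed -e 's/[^a-zA-Z0-9]//g' \
                 -e 'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/'
}

# Usage with the question's sample data.
printf '%s\n' 'Some_randoM data1-A' 'More Data0' | normalize_names
```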

Unwastable answered Oct 18 '22 19:10


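I've used tr to remove any characters that are not in the [:print:] class. Note that [:print:] includes the space character and punctuation, so characters such as _ and - survive; what gets deleted are non-printing characters such as tabs, carriage returns and newlines.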

tr -dc '[:print:]' < file.txt

or

echo "..." | tr -dc '[:print:]'

Additionally, you might want to pipe (|) the output to od -c to confirm the result:

tr -dc '[:print:]' < file.txt | od -c
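A quick sketch comparing [:print:] with [:alnum:] on one sample line that contains a tab and a Windows CRLF ending, with each result inspected through od -c. The helper name make_sample is hypothetical.

```bash
# make_sample (hypothetical name): one line with a tab and a CRLF line ending.
make_sample() {
    printf 'Some_randoM data1-A\t\r\n'
}

# [:print:] keeps letters, digits, punctuation and the space character;
# the tab, CR and LF are control characters, so tr deletes them.
make_sample | tr -dc '[:print:]' | od -c

# [:alnum:] is stricter: the underscore, space and hyphen are deleted too.
make_sample | tr -dc '[:alnum:]' | od -c
```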

luka5z answered Oct 18 '22 19:10