
sed to remove all characters except letters and '

Tags: bash, sed

I am using this sed command to strip documents of all their (for me) unnecessary characters.

sed 's/[^a-zA-Z]/ /g'

However after mining my data a bit I realized a pretty basic mistake: not including ' cuts all my don'ts into don ts, which sucks.

So I want to include ' in my regex. I'm still new to this kind of "coding", if I may call it that, so excuse my newbie mistake or, even better, explain it to me!

This obviously doesn't work:

sed 's/[^a-zA-Z']/ /g'

However, this doesn't work either, although I thought \ escapes the ':

sed 's/[^a-zA-Z\']/ /g'

Jakob asked Nov 14 '16 11:11


2 Answers

Good old double quotes protect the single quote without any need for escaping:

sed "s/[^a-zA-Z']/ /g" <<< "don't ... do this"

gives:

don't     do this
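
To spell out why the single-quoted attempts in the question fail while this one works, here is a small sketch. It assumes bash; the function name keep_letters_and_quotes is made up for illustration and is not part of sed or of the original answer.

```shell
#!/usr/bin/env bash
# In single quotes a backslash is an ordinary character, so it cannot escape
# a quote: the next ' still ends the string.
printf '%s\n' 'a\b'                # prints a\b

# In double quotes a ' is an ordinary character, so the sed script stays intact.
printf '%s\n' "s/[^a-zA-Z']/ /g"   # prints s/[^a-zA-Z']/ /g

# Hypothetical helper wrapping the command from the answer: every character
# that is neither a letter nor an apostrophe becomes a space.
keep_letters_and_quotes() {
    sed "s/[^a-zA-Z']/ /g"
}

printf '%s\n' "don't ... do this" | keep_letters_and_quotes
```

Double quotes are safe here only because the script contains no $, backtick, or backslash, which the shell would otherwise interpret before sed sees them.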

EDIT: your code replaces non-letters with a space, but your question title asks to remove them, so here is the other version: the first expression deletes everything except letters, spaces and ', and the second collapses each run of spaces into a single space (the g flag is needed so that every run is collapsed, not just the first; \+ is a GNU sed extension).

sed -e "s/[^ a-zA-Z']//g" -e 's/ \+/ /g' <<< "don't ... do this"

result:

don't do this
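
If GNU sed is not guaranteed, the same two-step idea can be sketched with a plain basic regular expression. This is an illustrative variant rather than the answer's exact command, and strip_and_squeeze is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete unwanted characters, then collapse runs of spaces.
strip_and_squeeze() {
    # The second pattern is two spaces followed by *: one space, then zero or
    # more further spaces. That matches any run of spaces without relying on \+.
    sed -e "s/[^ a-zA-Z']//g" -e 's/  */ /g'
}

printf '%s\n' "don't ... do   this, ok?" | strip_and_squeeze
```

With the g flag on the second expression, every run of spaces on the line is collapsed, so the input above should come out as don't do this ok.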

EDIT2: alternate solution that keeps the sed script in single quotes by writing the quote as the hex escape \x27 (a GNU sed extension; courtesy of Sundeep):

sed 's/[^ a-zA-Z\x27]//g'

Note: I first tried to escape the single quote inside a single-quoted script, following the solutions tested here, and none of them worked for me (the shell always prompted for a line continuation), so I came up with these alternatives.
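
For completeness, one idiom that does keep the script in single quotes is to close the quoted string, add a backslash-escaped quote, and reopen it. The sketch below assumes a POSIX-style shell; keep_with_single_quotes is a hypothetical name used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper showing the close-escape-reopen idiom.
keep_with_single_quotes() {
    # The shell joins three pieces into one argument:
    #   's/[^ a-zA-Z'   single-quoted text
    #   \'              one escaped quote, outside any quotes
    #   ']//g'          single-quoted text again
    # so sed receives the script s/[^ a-zA-Z']//g
    sed 's/[^ a-zA-Z'\'']//g'
}

printf '%s\n' "don't ... do this" | keep_with_single_quotes
```

A lone \' inside single quotes cannot work because the backslash is literal there, which leaves an unterminated quote and explains the line-continuation prompt.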

Jean-François Fabre answered Oct 29 '22 22:10


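You can also use tr -cd "'[:alnum:]", which deletes every character that is not alphanumeric or '. Note that [:alnum:] keeps digits as well (use [:alpha:] for letters only) and that the trailing newline is deleted too:

echo "some string '*'@'#'%^ without special chars except '" | tr -cd "'[:alnum:]"

somestring''''withoutspecialcharsexcept'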

If you want the spaces:

echo "some string '*'@'#'%^ without special chars except '" | tr -cd "'[:alnum:] "
some string '''' without special chars except '
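
Since the question asks for letters and ' only, a variant with [:alpha:] may be closer to the goal. The sketch below is an assumption about what is wanted rather than part of the original answer: it also keeps newlines so that multi-line documents are not joined into one line, and letters_quotes_spaces is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep letters, apostrophes, spaces and newlines only.
letters_quotes_spaces() {
    # -c complements the set and -d deletes, so everything NOT listed is removed.
    # tr itself turns the two characters \n into a newline.
    tr -cd "'[:alpha:] \n"
}

printf '%s\n' "route 66: don't stop!" | letters_quotes_spaces
```

Unlike the sed version, tr deletes characters one at a time and cannot collapse the leftover spaces in the same pass, so piping the result through tr -s ' ' is one way to squeeze them afterwards.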

Joel Griffiths answered Oct 29 '22 22:10