I am using this sed command to strip documents of all their (for me) unnecessary characters.
sed 's/[^a-zA-Z]/ /g'
However, after mining my data a bit I realized a pretty basic mistake: not including ' cuts all my don'ts into don ts, which sucks.
So I want to include ' in my regex. I'm still new to this kind of "coding", if I may call it that, so excuse my newbie mistake or, even better, explain it to me!
sed 's/[^a-zA-Z']/ /g'
this obviously doesn't work
sed 's/[^a-zA-Z\']/ /g'
However, this doesn't work either. I thought \ escapes the '?
Good old double-quotes in action to protect the single quote without any need of escaping:
sed "s/[^a-zA-Z']/ /g" <<< "don't ... do this"
gives:
don't     do this
EDIT: your code replaces non-letters with a space, but your question talks about stripping them, so here is the other version: the first expression removes every character that is not a letter, a space, or a single quote, and the second collapses runs of spaces into one.
sed -e "s/[^ a-zA-Z']//g" -e 's/ \+/ /g' <<< "don't ... do this"
result:
don't do this
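For reuse, the two expressions can be wrapped in a small function. This is only a sketch: the name clean_text is made up for illustration, and the second expression is written as two spaces followed by * instead of \+, on the assumption that you may want it to run on a non-GNU sed as well. The g flag on the second expression matters as soon as a line contains more than one run of spaces.

```shell
# clean_text is a hypothetical helper name, not part of the original answer.
# Reads stdin, keeps only letters, spaces and single quotes,
# then squeezes every run of spaces down to a single space.
clean_text() {
  sed -e "s/[^ a-zA-Z']//g" -e 's/  */ /g'
}

printf '%s\n' "don't ... do this, or   that!" | clean_text
# prints: don't do this or that
```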
EDIT2: alternate solution that keeps the script in single quotes by writing the single quote as its hex escape \x27 (courtesy of Sundeep; \x27 is a GNU sed extension):
sed 's/[^ a-zA-Z\x27]//g'
Note: I first tried to escape the single quote following the solutions tested here, but none of the single-quoted variants worked for me (the shell always prompted for a line continuation), so I came up with these alternatives.
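As for why \' fails: inside a single-quoted string the shell treats a backslash as an ordinary character, so the ' right after it simply ends the string, and the final ' then opens a new string that is never closed, hence the continuation prompt. If you do want to stay with single quotes, the usual workaround is to close the string, add an escaped quote, and reopen it. A minimal sketch (the function name is invented for the example):

```shell
# keep_letters_and_quotes is a hypothetical name used only for this example.
# The script is built from three pieces that the shell concatenates:
#   's/[^a-zA-Z'   single-quoted text
#   \'             a literal single quote, escaped outside any quotes
#   ']/ /g'        single-quoted text again
keep_letters_and_quotes() {
  sed 's/[^a-zA-Z'\'']/ /g'
}

printf '%s\n' "don't,stop" | keep_letters_and_quotes
# prints: don't stop
```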
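You can also use tr -cd "'[:alnum:] ", which deletes every character that is not in the set. Note that [:alnum:] keeps digits as well as letters; use [:alpha:] if you only want letters.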
$ echo "some string '*'@'#'%^ without special chars except '" | tr -cd "'[:alnum:]"
somestring''''withoutspecialcharsexcept'
If you want the spaces:
echo "some string '*'@'#'%^ without special chars except '" | tr -cd "'[:alnum:] "
some string '''' without special chars except '
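If you would rather replace the unwanted characters with a space, as the sed command in the question does, tr can do that too. The following is a sketch rather than a drop-in answer: the function name is made up, it uses [:alpha:] on the assumption that digits should go, and it keeps the newline so that lines are not joined together.

```shell
# replace_with_spaces is a hypothetical helper name.
# -c  complements the first set (everything that is NOT ', a letter, or newline)
# -s  squeezes each run of the replacement character into a single one
replace_with_spaces() {
  tr -cs "'[:alpha:]\n" ' '
}

printf '%s\n' "don't ... do this 42 times" | replace_with_spaces
# prints: don't do this times
```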