 

sed with regular expression

Tags: regex, sed

I am trying to use sed to remove a three-letter code from the end of a sequence (that is, replace it with nothing). It works for a single pattern but not when I combine multiple regex patterns. Here are some example sequences:

GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG
GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAA
GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTGA

Each pattern works when I use it on its own with sed:

echo "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG" | sed 's/TAG$//'
echo "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAA" | sed 's/TAA$//'
echo "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTGA" | sed 's/TGA$//'

However, when I combine them into a single regex, it doesn't work:

echo "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG" |
sed 's/(TAG$|TAA$|TGA$)//'

Could somebody point out what I am doing wrong?

Asked by upendra on Oct 23 '25 19:10


2 Answers

You need to use sed's extended regex switch (-r in GNU sed):

sed -r 's/(TAG|TAA|TGA)$//'

Or, on OS X (BSD sed):

sed -E 's/(TAG|TAA|TGA)$//'
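A minimal sketch applying the extended-regex command to all three sample sequences at once, assuming one sequence per line on standard input. The function name `strip_stop` is hypothetical; `-E` is used because both GNU and BSD sed accept it.

```shell
# Hypothetical helper: strip a trailing TAG, TAA or TGA from each input line.
# -E enables extended regex, so ( ) and | need no backslashes, and the
# single $ after the group anchors every alternative to the end of the line.
strip_stop() {
    sed -E 's/(TAG|TAA|TGA)$//'
}

printf '%s\n' \
    GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG \
    GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAA \
    GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTGA |
    strip_stop
```

Since the three samples differ only in their last three letters, every line prints GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCAT. A code that appears in the middle of a line (for example the TAG inside GTATAGTC) is left alone because of the $ anchor.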

Or, without extended regex (this relies on GNU sed's \| extension, so it doesn't work with the BSD sed on OS X):

sed 's/\(TAG\|TAA\|TGA\)$//'
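If you need plain POSIX basic regex (no -E or -r and no GNU \|), one portable workaround, swapped in here and not part of the answer above, is to try each suffix in turn and use sed's t command to skip the rest of the script once a substitution has succeeded. The name `strip_stop_posix` is hypothetical. The t matters: without it, a line ending in TAATAG would lose both three-letter codes instead of just the last one.

```shell
# Hypothetical helper: POSIX-only BRE version (no -E/-r, no GNU \|).
# Each s command tries one suffix; t with no label branches to the end
# of the script as soon as a substitution has succeeded on this line,
# so at most one trailing code is removed.
strip_stop_posix() {
    sed -e 's/TAG$//' -e t -e 's/TAA$//' -e t -e 's/TGA$//'
}

printf '%s\n' \
    GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTGA \
    ACGTAATAG |
    strip_stop_posix
```

This prints GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCAT and then ACGTAA. Giving t its own -e argument keeps the script valid for BSD sed, which reads a branch label up to the end of the line.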
Answered by anubhava on Oct 26 '25 09:10


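In sed's default basic regular expressions, | and the parentheses are literal characters, so you need to escape them to get alternation and grouping (the escaped \| is a GNU sed extension):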

sed 's/\(TAG$\|TAA$\|TGA$\)//'

Alternatively, use the -E option, which enables extended regular expressions and is supported by both GNU and BSD sed. With -E no escaping is needed, so your original command works as-is.

Answered by jaypal singh on Oct 26 '25 09:10


