I am new to bash scripting. I want to split the following string,
chr14:81370042-81371098(+)
into
chr14 81370042 81371098 +
or
chr14:81370042-81371098(-)
into
chr14 81370042 81371098 -
How can I do this with one command that works for both cases?
I tried the following, but it does not work:
cat a.tsv | tr -s ':' '\t' | sed "s/\t[0-9]+.*[0-9]+/[0-9]+\t[0-9]/g"
Thanks.
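The attempted pipeline fails for two reasons. Without -E, sed uses basic regular expressions, where + is an ordinary character (only GNU sed treats \+ as "one or more"). Also, the replacement side of s/// is literal text, so [0-9]+ would be inserted verbatim instead of being matched. Capture groups fix both. Below is a minimal sketch, assuming each input line has exactly the shape chrom:start-end(strand); locus_to_tsv is a hypothetical helper name, not something from the question.

```shell
# locus_to_tsv: hypothetical helper; reads one locus string per line on stdin.
locus_to_tsv() {
  # -E enables extended regexes, so + and ( ) groups need no backslashes.
  # Groups: 1 = chromosome, 2 = start, 3 = end, 4 = strand.
  # \( and \) match the literal parentheses around the strand.
  sed -E 's/^([^:]+):([0-9]+)-([0-9]+)\(([-+])\)$/\1 \2 \3 \4/'
}

# Demo with the two sample strings; on the real file use: locus_to_tsv < a.tsv
printf '%s\n' 'chr14:81370042-81371098(+)' 'chr14:81370042-81371098(-)' | locus_to_tsv
```

Spaces are used as separators to match the desired output. With GNU sed you can write \t instead of the spaces in the replacement to get tab-separated output.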
This sed may work:
sed -E 's/[^-+_[:alnum:]]+/ /g; s/ +$//; s/-(.)/ \1/g' <<< 'chr14:81370042-81371098(+)'
chr14 81370042 81371098 +
And for the second case:
sed -E 's/[^-+_[:alnum:]]+/ /g; s/ +$//; s/-(.)/ \1/g' <<< 'chr14:81370042-81371098(-)'
chr14 81370042 81371098 -
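[^-+_[:alnum:]]+ matches one or more characters that are not -, +, _ or alphanumeric, so ':', '(' and ')' each become a space. s/ +$// then strips the trailing space left by ')', and s/-(.)/ \1/g turns a '-' that is followed by another character (the separator between the coordinates) into a space, while a '-' strand at the end of the line has nothing after it and is kept.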
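To see what each of the three substitutions does, the sed program can be run one stage at a time. This sketch simply splits the same expressions into separate sed calls and uses the "(-)" sample, since that is the case where the order of the steps matters.

```shell
s='chr14:81370042-81371098(-)'
# Stage 1: every run of characters other than - + _ or alphanumerics becomes one space.
step1=$(printf '%s\n' "$s" | sed -E 's/[^-+_[:alnum:]]+/ /g')
# Stage 2: strip the trailing space left behind by ')'.
step2=$(printf '%s\n' "$step1" | sed -E 's/ +$//')
# Stage 3: a '-' followed by another character (the range separator) becomes a space;
# the strand '-' at the end of the line has nothing after it, so it survives.
step3=$(printf '%s\n' "$step2" | sed -E 's/-(.)/ \1/g')
# Brackets make the trailing space after stage 1 visible.
printf '[%s]\n' "$step1" "$step2" "$step3"
```

This prints [chr14 81370042-81371098 - ], then [chr14 81370042-81371098 -], then [chr14 81370042 81371098 -].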
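Written and tested with the shown samples; could you please try the following:
echo "chr14:81370042-81371098(+)" | awk '{gsub(/[:()]/, OFS); sub(/-/, OFS); sub(/ +$/, "")} 1'
gsub turns every ':', '(' and ')' into OFS (a space by default); sub replaces only the first '-', i.e. the one between the coordinates, so a '-' strand is kept for the (-) case; the last sub trims the trailing space left by ')'.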
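2nd solution: using a field separator in awk. Splitting on '-' as well would also swallow a '-' strand, so FS covers only ':', '(' and ')', and the hyphen between the coordinates is handled with sub() on $2:
echo "chr14:81370042-81371098(+)" |
awk -v FS='[:()]' '{
  sub(/-/, OFS, $2)   # $2 is start-end: replace its first hyphen; changing a field rebuilds $0 with OFS
  sub(/ +$/, "")      # drop the trailing OFS left by the empty field after the closing parenthesis
}
1'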
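A different technique, not used in the answers above: if you only need to split a single string inside a script, POSIX shell parameter expansion can do it without starting sed or awk. This is a minimal sketch, assuming the input always has the shape chrom:start-end(strand) with no '-' in the chromosome name; split_locus is a hypothetical helper name.

```shell
# split_locus: hypothetical helper using only POSIX parameter expansion.
split_locus() {
  s=$1
  chrom=${s%%:*}        # text before the first ':'  -> chr14
  rest=${s#*:}          # text after the first ':'   -> 81370042-81371098(-)
  start=${rest%%-*}     # text before the first '-'  -> 81370042
  rest=${rest#*-}       # text after the first '-'   -> 81371098(-)
  end=${rest%%"("*}     # text before '('            -> 81371098
  strand=${rest#*"("}   # text after '('             -> -)
  strand=${strand%")"}  # drop the closing ')'       -> -
  printf '%s %s %s %s\n' "$chrom" "$start" "$end" "$strand"
}

split_locus 'chr14:81370042-81371098(+)'
split_locus 'chr14:81370042-81371098(-)'
```

The parentheses are quoted inside the patterns so they are matched literally. For a whole file, the sed or awk versions are faster than calling a shell function per line.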