I have a tab-separated file like this:
V I 280 6 - VRSSAI
N V 2739 7 - SAVNATA
A R 203 5 - AEERR
Q A 2517 7 - AQSTPSP
S S 1012 5 - GGGSS
L A 281 11 - AAEPALSAGSL
I would like to check the last column against the letters in the 1st and 2nd columns. If the first and last letters of the last column already match the 1st and 2nd columns respectively, the line should remain unchanged. Otherwise, I would like to locate the reversed pair (the 2nd-column letter followed by the 1st-column letter) in the last column, then print the string from the 1st-column letter to the end, followed by the part from the first letter up to the 2nd-column letter. The desired output would be:
V I 280 6 - VRSSAI
N V 2739 7 - NATASAV
A R 203 5 - AEERR
Q A 2517 7 - QSTPSPA
S S 1012 5 - SGGGS
L A 281 11 - LSAGSLAAEPA
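Since the question asks whether this can be done in bash, here is a minimal pure-bash sketch of the rule described above. It assumes a tab-separated file with the sequence in the 6th column; the function names rotate_seq and process_file are hypothetical, chosen only for illustration.

```bash
#!/usr/bin/env bash

# rotate_seq FIRST SECOND SEQ  (hypothetical helper name)
# Prints SEQ unchanged if it already starts with FIRST and ends with SECOND.
# Otherwise finds the first occurrence of the reversed pair SECOND+FIRST and
# prints the part from the FIRST letter to the end, followed by the part from
# the start up to and including the SECOND letter.
rotate_seq() {
  local first=$1 second=$2 seq=$3
  local pair=$second$first

  # Already aligned, or the reversed pair does not occur: leave as-is.
  if [[ ${seq:0:1} == "$first" && ${seq: -1} == "$second" ]] ||
     [[ $seq != *"$pair"* ]]; then
    printf '%s\n' "$seq"
    return
  fi

  # ${seq%%"$pair"*} strips everything from the first occurrence of the pair on;
  # appending SECOND gives the head that moves to the end.
  local head=${seq%%"$pair"*}$second
  # The tail starts at the FIRST letter of the pair.
  local tail=${seq:${#head}}
  printf '%s\n' "$tail$head"
}

# process_file FILE  (hypothetical helper name)
# Rewrites column 6 of each tab-separated line of FILE, other columns untouched.
process_file() {
  local c1 c2 c3 c4 c5 c6
  while IFS=$'\t' read -r c1 c2 c3 c4 c5 c6; do
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
      "$c1" "$c2" "$c3" "$c4" "$c5" "$(rotate_seq "$c1" "$c2" "$c6")"
  done < "$1"
}
```

Usage would be process_file input > output. Reading a file line by line in a bash loop is much slower than awk on large inputs, so for big files an awk one-liner is usually the better choice.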
I have tried different scripts, but they do not work correctly and I don't know exactly why:
awk 'BEGIN {FS=OFS="\t"}{gsub(/$2$1/,"\t",$6); print $1$7$6$2}' "input" > "output";
Another attempt:
awk 'BEGIN {FS=OFS="\t"} {len=split($11,arrseq,"$7$6"); for(i=0;i<len;i++){printf "%s ",arrseq[i],arrseq[i+1]}' "input" > "output";
I also tried using the substr function, but none of these attempts work correctly. Is it possible to do this in bash? Thanks in advance.
Here is an example to make the question clearer:
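A likely explanation for why the attempts above fail: inside a regex literal such as /$2$1/, awk does not substitute field values, so $ acts as an end-of-string anchor and the pattern cannot match. Similarly, "$7$6" inside the single-quoted program is a literal four-character string, not the fields; split() fills its array starting at index 1, not 0; the sample data has only six fields, so $11 is empty; and the second program never closes the brace of its main action. To use field values as a pattern, build a string from them (a dynamic regex) or, for a plain substring, use index(). This short script, whose function name demo_patterns is only illustrative, compares the three on the L A line:

```bash
#!/usr/bin/env bash

# demo_patterns (hypothetical name): runs three checks on one sample line.
demo_patterns() {
  printf 'L\tA\t281\t11\t-\tAAEPALSAGSL\n' |
  awk 'BEGIN { FS = OFS = "\t" }
  {
    # /$2$1/ is a regex literal: "$" is the end-of-string anchor here,
    # not a field reference, so it cannot match this sequence.
    literal = ($6 ~ /$2$1/)
    # ($2 $1) concatenates the field values into the string "AL",
    # which awk then uses as a dynamic regex.
    dynamic = ($6 ~ ($2 $1))
    # index() looks for a plain substring and returns its
    # 1-based position, or 0 if it is absent.
    pos = index($6, $2 $1)
    print literal, dynamic, pos
  }'
}

demo_patterns   # prints 0, 1 and 5, tab-separated
```

The position 5 returned by index() is exactly what a rotation needs: the $1 letter sits at position 6, and the part to move to the end is positions 1 to 5.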
$1 $2 $6
L A AAEPALSAGSL (reversed pattern 'AL' = $2$1)
Desired output in $6: from the $1 letter of the reversed pattern to the end, followed by the part from the first letter up to the $2 letter of the reversed pattern:
$1 $2 $6
L A LSAGSLAAEPA
If I understood the question correctly, this awk should do it:
awk '( substr($6, 1, 1) != $1 || substr($6, length($6), 1) != $2 ) && (i = index($6, $2$1)) { $6 = substr($6, i+1) substr($6, 1, i) }1' OFS=$'\t' data
You basically want to rotate the string so that its beginning matches the char in $1 and its end matches the char in $2. Strings that cannot be rotated to satisfy that condition (those in which the pair $2$1 does not occur) are left unchanged, for example:
A B 3 3 - BCAAB
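To check the one-liner against the question, here is a small test harness, a sketch that assumes the sample is stored tab-separated. It runs the awk from this answer (with the assignment to i parenthesized, which is equivalent but easier to read) on the six sample lines plus the A B example, and diffs the result with the desired output. The function names rotate_awk, sample and expected are illustrative only.

```bash
#!/usr/bin/env bash

# The one-liner from the answer, reading stdin (or files given as arguments).
rotate_awk() {
  awk '( substr($6, 1, 1) != $1 || substr($6, length($6), 1) != $2 ) && (i = index($6, $2$1)) { $6 = substr($6, i+1) substr($6, 1, i) }1' OFS=$'\t' "$@"
}

# Sample input from the question, plus the unrotatable example, as tab-separated lines.
# printf reuses its format string for every group of six arguments.
sample() {
  printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    V I 280 6 - VRSSAI \
    N V 2739 7 - SAVNATA \
    A R 203 5 - AEERR \
    Q A 2517 7 - AQSTPSP \
    S S 1012 5 - GGGSS \
    L A 281 11 - AAEPALSAGSL \
    A B 3 3 - BCAAB
}

# Desired output from the question; the last line is expected to stay unchanged.
expected() {
  printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    V I 280 6 - VRSSAI \
    N V 2739 7 - NATASAV \
    A R 203 5 - AEERR \
    Q A 2517 7 - QSTPSPA \
    S S 1012 5 - SGGGS \
    L A 281 11 - LSAGSLAAEPA \
    A B 3 3 - BCAAB
}

# diff prints nothing and succeeds when every line matches.
if diff <(sample | rotate_awk) <(expected); then
  echo "all lines match"
fi
```

Lines that are left unchanged keep their original separators, while rotated lines are rebuilt with OFS, so the output stays consistent as long as the input is tab-separated.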
You can try this awk; it's not perfect, but it gives you a starting point.
awk '{i=(match($6,$1));if(i==1)print;else{a=$6;b=substr(a,i);c=substr(a,1,(i-1));$6=b c;print}}' OFS='\t' infile
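The "not perfect" caveat shows up on the sample data: match($6,$1) rotates at the first occurrence of the $1 letter without checking that the $2 letter sits right before it. On the six sample lines this gives the desired output everywhere except S S 1012 5 - GGGSS, where the first S it finds is actually the $2 half of the reversed pair SS, so the result is SSGGG instead of SGGGS. A minimal reproduction, with the one-liner wrapped in the illustrative function name second_answer:

```bash
#!/usr/bin/env bash

# The one-liner from this answer, reading stdin.
second_answer() {
  awk '{i=(match($6,$1));if(i==1)print;else{a=$6;b=substr(a,i);c=substr(a,1,(i-1));$6=b c;print}}' OFS='\t'
}

line=$'S\tS\t1012\t5\t-\tGGGSS'
# match() finds the first S at position 4, so the last column becomes SSGGG;
# the desired result from the question is SGGGS.
printf '%s\n' "$line" | second_answer
```

Searching for the two-letter pair $2$1 with index(), as the first answer does, avoids this because it pins down which S starts the rotation.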