
How to split a string depending on a pattern in another column (UNIX environment)

I have a tab-separated file like this:

V    I      280     6   -   VRSSAI
N    V      2739    7   -   SAVNATA
A    R      203     5   -   AEERR
Q    A      2517    7   -   AQSTPSP
S    S      1012    5   -   GGGSS
L    A      281    11   -   AAEPALSAGSL

I would like to check the last column against the letters in the 1st and 2nd columns. If the first and last letters of the last column match the 1st and 2nd columns respectively, the line should remain unchanged. Otherwise, I would like to locate the reversed pattern (the 2nd-column letter followed by the 1st-column letter) in the last column, then print the string from the 1st-column letter to the end, followed by the beginning of the string up to the 2nd-column letter. The desired output would be:

V    I      280     6   -   VRSSAI
N    V      2739    7   -   NATASAV
A    R      203     5   -   AEERR
Q    A      2517    7   -   QSTPSPA
S    S      1012    5   -   SGGGS
L    A      281    11   -   LSAGSLAAEPA

I have tried different scripts along these lines, but they do not work correctly and I don't know exactly why.

awk 'BEGIN {FS=OFS="\t"}{gsub(/$2$1/,"\t",$6); print $1$7$6$2}' "input" > "output";

Another attempt:

awk 'BEGIN {FS=OFS="\t"} {len=split($11,arrseq,"$7$6"); for(i=0;i<len;i++){printf "%s ",arrseq[i],arrseq[i+1]}' "input" > "output";

I also tried the substr function, but none of these attempts works correctly. Is it possible to do this in bash? Thanks in advance.

Here is an example to make the question clearer.

$1                 $2                 $6
L                  A                  AAEPALSAGSL (reverse pattern 'AL' $2$1)

The desired output in $6 runs from the $1 letter within the reversed pattern to the end of the string, followed by the start of the string up to the $2 letter within the reversed pattern:

$1                 $2                 $6
L                  A                  LSAGSLAAEPA
Perceval Vellosillo Gonzalez asked Dec 27 '17 17:12



2 Answers

If I understood the question correctly, this awk should do it:

awk '( substr($6, 1, 1) != $1 || substr($6, length($6), 1) != $2 ) && (i = index($6, $2$1)) { $6 = substr($6, i+1) substr($6, 1, i)  }1' OFS=$'\t' data

You basically want to rotate the string so that the beginning of the string matches the char in $1 and the end of the string matches the char in $2. Strings that cannot be rotated to match that condition are left unchanged, for example:

A    B    3    3    -    BCAAB
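The rotation described above can be spelled out as a longer, commented script. This is only a sketch: the function name rotate_col6 is made up for illustration, and it assumes the file is strictly tab-separated with the sequence in column 6.

```shell
# rotate_col6: hypothetical helper name, not part of the answer above.
# Assumes strictly tab-separated input with the sequence in column 6.
rotate_col6() {
    awk -F'\t' -v OFS='\t' '
    {
        seq   = $6
        first = substr(seq, 1, 1)
        last  = substr(seq, length(seq), 1)

        # Only touch rows whose ends do not already match columns 1 and 2.
        if (first != $1 || last != $2) {
            # Look for the reversed pair: col2 letter directly followed by col1 letter.
            i = index(seq, $2 $1)
            if (i > 0) {
                # Cut between the pair: the tail (starting at the col1 letter)
                # goes first, the head (ending at the col2 letter) goes last.
                $6 = substr(seq, i + 1) substr(seq, 1, i)
            }
        }
        print
    }' "$@"
}

# Example run on two rows from the question.
printf 'N\tV\t2739\t7\t-\tSAVNATA\nS\tS\t1012\t5\t-\tGGGSS\n' | rotate_col6
```

With these two rows the sixth column should come out as NATASAV and SGGGS. Rows where the reversed pair does not occur are printed unchanged.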
PesaThe answered Nov 17 '22 02:11



You can try this awk. It's not perfect, but it gives you a starting point.

awk '{i=(match($6,$1));if(i==1)print;else{a=$6;b=substr(a,i);c=substr(a,1,(i-1));$6=b c;print}}' OFS='\t' infile
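One possible way to build on this starting point is sketched below. The one-liner above rotates at the first occurrence of $1 and never looks at $2, so a row such as S S ... GGGSS becomes SSGGG rather than SGGGS. The sketch tries each occurrence of the $1 letter in turn and keeps the first rotation that also ends with the $2 letter. The name rotate_scan is hypothetical, and strictly tab-separated input is assumed.

```shell
# rotate_scan: hypothetical helper name used only for this sketch.
# Assumes strictly tab-separated input with the sequence in column 6.
rotate_scan() {
    awk -F'\t' -v OFS='\t' '
    {
        n = length($6)
        for (p = 1; p <= n; p++) {
            # Candidate cut points are the positions holding the col1 letter.
            if (substr($6, p, 1) == $1) {
                # Rotate so that position p becomes the first character.
                r = substr($6, p) substr($6, 1, p - 1)
                # Accept the rotation only if it also ends with the col2 letter.
                if (substr(r, n, 1) == $2) {
                    $6 = r
                    break
                }
            }
        }
        # Rows with no acceptable rotation are printed as they came in.
        print
    }' "$@"
}

# Example run on the row that the one-liner above gets wrong.
printf 'S\tS\t1012\t5\t-\tGGGSS\n' | rotate_scan
```

For this row the sixth column should come out as SGGGS. A row that already starts with the $1 letter and ends with the $2 letter is accepted at p = 1 and therefore stays as it is.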
ctac_ answered Nov 17 '22 00:11
