Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pattern match and replace the string with if else loop

Tags:

python

r

sed

awk

I have a file containing multiple lines starting with "1ECLI H--- 12.345 .....". I want to remove a space between I and H and add R/S/T upon iteration of the H pattern. for eg. H810 if repeated in consecutive three lines, it should get added with a letter R, S (second iteration), T (third iteration). so it would be H810R. Any help will be appreciated.
text looks like below

1ECLI  H813   98   7.529   8.326   9.267
1ECLI  H813   99   7.427   8.470   9.251
1ECLI  C814  100   7.621   8.513   9.263
1ECLI  H814  101   7.607   8.617   9.289
1ECLI  H814  102   7.633   8.489   9.156
1ECLI  H814  103   7.721   8.509   9.305
1ECLI   C74  104   8.164   8.733  10.740
1ECLI  H74R  105   8.247   8.690  10.799

upon chage

1ECLI H813R   98   7.529   8.326   9.267
1ECLI H813S   99   7.427   8.470   9.251
1ECLI  C814  100   7.621   8.513   9.263
1ECLI H814R  101   7.607   8.617   9.289
1ECLI H814s  102   7.633   8.489   9.156
1ECLI H814T  103   7.721   8.509   9.305
1ECLI   C74  104   8.164   8.733  10.740
1ECLI  H74R  105   8.247   8.690  10.799

Thanks.

like image 731
amruta Avatar asked Oct 17 '22 03:10

amruta


2 Answers

If your Input_file is same as shown sample then could you please try following awk and let me know if this helps you.

awk '
BEGIN{
  val[1]="R";
  val[2]="S";
  val[3]="T"
}
$2 !~ /^H[0-9]+/ || i==3{
  i=""
}
$2 ~ /^H[0-9]+$/ && /^1ECLI/{
  $2=$2val[++i]
}
1
'   Input_file  > temp_file  && mv  temp_file   Input_file

Adding explanation also for answer too as follows.

awk '
BEGIN{                        ##Starting BEGIN section of awk here.
  val[1]="R";                 ##creating an array named val whose index is 1 and value is string R.
  val[2]="S";                 ##creating array val 2nd element here whose value is S.
  val[3]="T"                  ##creating array val 3rd element here whose value is T.
}
$2 !~ /^H[0-9]+/ || i==3{     ##Checking condition if 2nd field does not start from H and digits after that OR variable i value is equal to 3.
  i=""                        ##Then nullifying the value of variable i here.
}
$2 ~ /^H[0-9]+$/ && /^1ECLI/{ ##Checking here if 2nd field value is starts from H till all digits till end AND line starts from 1ECLI string then do following.
  $2=$2val[++i]               ##re-creating value of 2nd field by adding value of array val whose index is increasing value of variable i.
}
1                             ##Mentioning 1 here, which means it will print the current line.
' Input_file   > temp_file  && mv  temp_file   Input_file                 ##Mentioning Input_file name here.
like image 105
RavinderSingh13 Avatar answered Oct 19 '22 22:10

RavinderSingh13


Even below one can give desired output, if your real input file is same as what you have posted.

awk 'BEGIN{split("R,S,T",a,/,/)}f=$2~/^H[0-9]+$/{$2 = $2 a[++c]}!f{c=0}1' infile 

Explanation

  • split("R,S,T",a,/,/) - split string "R,S,T" by separator comma, and save in array a, so it becomes a[1] = R, a[2] = S, a[3] = T

  • f=$2~/^H[0-9]+$/ - f is variable, validate regexp $2 ~ /^H[0-9]+$/, which returns boolean status. if it returned true then variable f will be true, otherwise false

  • $2 = $2 a[++c] if above one was true, then modify second field, so second field will have existing value plus array a value, corresponding to the index (c), ++c is pre-increment variable

  • !f{c=0} if variable f is false then reset variable c, not consecutive.

  • 1 at the end does default operation that is print current/record/row, print $0. To know how awk works try, awk '1' infile, which will print all records/lines, whereas awk '0' infile prints nothing. Any number other than zero is true, which triggers the default behavior.

Test Results:

$ cat infile
1ECLI  H813   98   7.529   8.326   9.267
1ECLI  H813   99   7.427   8.470   9.251
1ECLI  C814  100   7.621   8.513   9.263
1ECLI  H814  101   7.607   8.617   9.289
1ECLI  H814  102   7.633   8.489   9.156
1ECLI  H814  103   7.721   8.509   9.305
1ECLI   C74  104   8.164   8.733  10.740
1ECLI  H74R  105   8.247   8.690  10.799

$ awk 'BEGIN{split("R,S,T",a,/,/)}f=$2~/^H[0-9]+$/{$2 = $2 a[++c]}!f{c=0}1' infile
1ECLI H813R 98 7.529 8.326 9.267
1ECLI H813S 99 7.427 8.470 9.251
1ECLI  C814  100   7.621   8.513   9.263
1ECLI H814R 101 7.607 8.617 9.289
1ECLI H814S 102 7.633 8.489 9.156
1ECLI H814T 103 7.721 8.509 9.305
1ECLI   C74  104   8.164   8.733  10.740
1ECLI  H74R  105   8.247   8.690  10.799

If you want better formatting like tab or some other char as field separator, then you may use below one, modify OFS variable

$ awk -v OFS="\t" 'BEGIN{split("R,S,T",a,/,/)}f=$2~/^H[0-9]+$/{$2 = $2 a[++c]}!f{c=0}{$1=$1}1'  infile
1ECLI   H813R   98  7.529   8.326   9.267
1ECLI   H813S   99  7.427   8.470   9.251
1ECLI   C814    100 7.621   8.513   9.263
1ECLI   H814R   101 7.607   8.617   9.289
1ECLI   H814S   102 7.633   8.489   9.156
1ECLI   H814T   103 7.721   8.509   9.305
1ECLI   C74     104 8.164   8.733   10.740
1ECLI   H74R    105 8.247   8.690   10.799
like image 45
Akshay Hegde Avatar answered Oct 19 '22 23:10

Akshay Hegde