
Grouping duplicated fields with awk

Tags:

awk

I have the following file:

ID|2018-04-29
ID|2018-04-29
ID|2018-04-29
ID1|2018-06-26
ID1|2018-06-26
ID1|2018-08-07
ID1|2018-08-22

and using awk, I want to add a third field ($3) that groups the duplicated rows based on $1 and $2, so that the output would be

ID|2018-04-29|group1
ID|2018-04-29|group1
ID|2018-04-29|group1
ID1|2018-06-26|group2
ID1|2018-06-26|group2
ID1|2018-08-07|group3
ID1|2018-08-22|group4

I tried the following code, but it does not give me the desired output. Also, I am not sure whether it can be applied to a column containing dates.

awk -F"|" '{print $0,"group"++seen[$1,$3]}' OFS="|"

Any hints on how to achieve it using awk (one-liner, if possible) would be highly appreciated.

DSTO asked Sep 20 '25

1 Answer

With your shown samples, please try the following awk code.

awk -v OFS="|" '!arr[$0]++{count++} {print $0,"group"count}' Input_file

Explanation: a detailed breakdown of the one-liner above.

awk '                     ##Start the awk program.
BEGIN{                    ##BEGIN section, executed before any input is read.
  OFS="|"                 ##Set the output field separator to |.
}
!arr[$0]++{               ##If the current line has not been seen before...
  count++                 ##...increment count, starting a new group.
}
{
  print $0,"group"count   ##Print the line with the current group number appended.
}
' Input_file              ##Specify the input file name.
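Note that this one-liner relies on duplicate lines being adjacent, as in the shown samples: `count` only advances when a line is first seen, so scattered duplicates would still reuse an old group but intervening lines would have moved the counter on. A sketch of both the answer's command and a variant that also handles non-adjacent duplicates by remembering each unique line's group number (the file name `Input_file` is taken from the answer):

```shell
# Recreate the sample input from the question.
cat > Input_file <<'EOF'
ID|2018-04-29
ID|2018-04-29
ID|2018-04-29
ID1|2018-06-26
ID1|2018-06-26
ID1|2018-08-07
ID1|2018-08-22
EOF

# The answer's one-liner: works when duplicates are adjacent.
awk -v OFS="|" '!arr[$0]++{count++} {print $0,"group"count}' Input_file

# Variant: store the group number per unique line, so duplicates
# get the same group even if they are not adjacent.
awk -v OFS="|" '!($0 in grp){grp[$0]=++count} {print $0,"group"grp[$0]}' Input_file
```

On the sample input both commands print the desired output, from `ID|2018-04-29|group1` through `ID1|2018-08-22|group4`.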
RavinderSingh13 answered Sep 22 '25