Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to merge duplicate lines into same row with primary key and more than one column of information

Tags:

awk

Here is my data:

NAME1,NAME1_001,NULL,LIC100_1,NULL,LIC300-3,LIC300-6
NAME1,NAME1_003,LIC000_1,NULL,NULL,NULL,NULL
NAME2,NAME2_001,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME3,NAME3_001,NULL,LIC400_2,NULL,NULL,LIC500_1
NAME3,NAME3_005,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME3,NAME3_006,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME4,NAME4_002,NULL,LIC100_1,NULL,LIC300-3,LIC300-6

Expected result:

NAME1|NAME1_001|NULL|LIC100_1|NULL|LIC300-3|LIC300-6|NAME1_003|LIC000_1|NULL|NULL|NULL|NULL
NAME2|NAME2_001|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME3|NAME3_001|NULL|LIC400_2|NULL|NULL|LIC500_1|NAME3_005|LIC000_1|NULL|LIC400_2|NULL|NULL|NAME3_006|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME4|NAME4_002|NULL|LIC100_1|NULL|LIC300-3|LIC300-6

I tried below command, but have no idea how to add the details ($3 to $7)

awk '
    BEGIN{FS=","; OFS="|"}; 
    { arr[$1] = arr[$1] == ""? $2 : arr[$1] "|" $2 }   
    END {for (i in arr) print i, arr[i] }' file.csv

Any suggestion? thanks!!

like image 767
wilssssssslam Avatar asked Feb 02 '21 06:02

wilssssssslam


People also ask

How do I combine two rows of data on the same line in Excel?

Select the range of cells where you want to merge rows. Go to the Ablebits Data tab > Merge group, click the Merge Cells arrow, and then click Merge Rows into One.


3 Answers

Could you please try following. Written and tested with shown samples in GNU awk.

awk '
BEGIN{
  FS=","
  OFS="|"
}
FNR==NR{
  first=$1
  $1=""
  sub(/^,/,"")
  arr[first]=(first in arr?arr[first] OFS:"")$0
  next
}
($1 in arr){
  print $1 arr[$1]
  delete arr[$1]
}
' Input_file  Input_file

Explanation: Adding detailed explanation for above.

awk '                       ##Starting awk program from here.
BEGIN{                      ##Starting BEGIN section of this program from here.
  FS=","                    ##Setting FS as comma here.
  OFS="|"                   ##Setting OFS as | here.
}
FNR==NR{                    ##Checking FNR==NR which will be TRUE when first time Input_file is being read.
  first=$1                  ##Setting first as 1st field here.
  $1=""                     ##Nullifying first field here.
  sub(/^,/,"")              ##Substituting starting comma with NULL in current line.
  arr[first]=(first in arr?arr[first] OFS:"")$0  ##Creating arr with index of first and keep adding same index value to it.
  next                      ##next will skip all further statements from here.
}
($1 in arr){                ##Checking condition if 1st field is present in arr then do following.
  print $1 arr[$1]          ##Printing 1st field with arr value here.
  delete arr[$1]            ##Deleting arr item here.
}
' Input_file  Input_file    ##Mentioning Input_file names here.
like image 54
RavinderSingh13 Avatar answered Dec 14 '22 11:12

RavinderSingh13


Another awk:

$ awk '
BEGIN {               # set them field separators
    FS=","
    OFS="|"
}
{
    if($1 in a) {     # if $1 already has an entry in a hash
        t=$1          # store key temporarily
        $1=a[$1]      # set the a hash entry to $1
        a[t]=$0       # and hash the record
    } else {          # if $1 seen for the first time
        $1=$1         # rebuild record to change the separators
        a[$1]=$0      # and hash the record
    }
}
END {                 # afterwards
    for(i in a)       # iterate a 
        print a[i]    # and output
}' file
like image 39
James Brown Avatar answered Dec 14 '22 11:12

James Brown


Assuming your input is grouped by the key field as shown in your example (if it isn't then sort it first) you don't need to store the whole file in memory or read it twice and this will output the lines in the same order they appear in the input:

$ cat tst.awk
BEGIN { FS=","; OFS="|" }
$1 != prev {
    if (NR>1) {
        print rec
    }
    prev = rec = $1
}
{
    $1 = ""
    rec = rec $0
}
END { print rec }

$ awk -f tst.awk file
NAME1|NAME1_001|NULL|LIC100_1|NULL|LIC300-3|LIC300-6|NAME1_003|LIC000_1|NULL|NULL|NULL|NULL
NAME2|NAME2_001|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME3|NAME3_001|NULL|LIC400_2|NULL|NULL|LIC500_1|NAME3_005|LIC000_1|NULL|LIC400_2|NULL|NULL|NAME3_006|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME4|NAME4_002|NULL|LIC100_1|NULL|LIC300-3|LIC300-6
like image 22
Ed Morton Avatar answered Dec 14 '22 10:12

Ed Morton