How to merge duplicate lines into same row with primary key and more than one column of information

Tags:

awk

Here is my data:

NAME1,NAME1_001,NULL,LIC100_1,NULL,LIC300-3,LIC300-6
NAME1,NAME1_003,LIC000_1,NULL,NULL,NULL,NULL
NAME2,NAME2_001,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME3,NAME3_001,NULL,LIC400_2,NULL,NULL,LIC500_1
NAME3,NAME3_005,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME3,NAME3_006,LIC000_1,NULL,LIC400_2,NULL,NULL
NAME4,NAME4_002,NULL,LIC100_1,NULL,LIC300-3,LIC300-6

Expected result:

NAME1|NAME1_001|NULL|LIC100_1|NULL|LIC300-3|LIC300-6|NAME1_003|LIC000_1|NULL|NULL|NULL|NULL
NAME2|NAME2_001|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME3|NAME3_001|NULL|LIC400_2|NULL|NULL|LIC500_1|NAME3_005|LIC000_1|NULL|LIC400_2|NULL|NULL|NAME3_006|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME4|NAME4_002|NULL|LIC100_1|NULL|LIC300-3|LIC300-6

I tried the command below, but I have no idea how to add the details ($3 to $7):

awk '
    BEGIN{FS=","; OFS="|"}; 
    { arr[$1] = arr[$1] == ""? $2 : arr[$1] "|" $2 }   
    END {for (i in arr) print i, arr[i] }' file.csv

Any suggestions? Thanks!


asked Feb 02 '21 06:02

3 Answers

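Could you please try the following? It was written and tested with the shown samples in GNU awk. It reads Input_file twice: the first pass collects the rows for each key, and the second pass prints each key once, in the order it first appears.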

awk '
BEGIN{
  FS=","
  OFS="|"
}
FNR==NR{
  first=$1
  $1=""
  sub(/^\|/,"")
  arr[first]=(first in arr?arr[first] OFS:"")$0
  next
}
($1 in arr){
  print $1, arr[$1]
  delete arr[$1]
}
' Input_file  Input_file

Explanation: a detailed, line-by-line explanation of the above.

awk '                       ##Start of the awk program.
BEGIN{                      ##BEGIN section, run once before any input is read.
  FS=","                    ##Set the input field separator to a comma.
  OFS="|"                   ##Set the output field separator to a pipe.
}
FNR==NR{                    ##True only while Input_file is being read the first time.
  first=$1                  ##Save the 1st field (the key) in first.
  $1=""                     ##Empty the 1st field; awk rebuilds the line using OFS.
  sub(/^\|/,"")             ##Remove the leading | left behind by the emptied 1st field.
  arr[first]=(first in arr?arr[first] OFS:"")$0  ##Append the rest of the line to arr[first], joined with OFS.
  next                      ##Skip the remaining statements for this line.
}
($1 in arr){                ##Second pass: true if the 1st field is still present in arr.
  print $1, arr[$1]         ##Print the key, then OFS, then the collected values.
  delete arr[$1]            ##Delete the entry so each key is printed only once.
}
' Input_file  Input_file    ##Input_file is passed twice so it is read in two passes.
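
One detail is worth seeing in isolation: assigning to any field (here $1="") makes awk rebuild the whole line with OFS, so the line then starts with a pipe rather than a comma. A minimal sketch, where rebuild_demo is only a hypothetical wrapper name used for this illustration:

```shell
# rebuild_demo: hypothetical helper name, used only for this illustration.
# Emptying $1 forces awk to rebuild $0, joining the fields with OFS ("|").
rebuild_demo() {
  printf '%s\n' 'NAME2,NAME2_001,LIC000_1' |
    awk 'BEGIN { FS=","; OFS="|" } { $1 = ""; print }'
}

rebuild_demo
# prints: |NAME2_001|LIC000_1
```

The leading separator in that output is what has to be accounted for when the saved key is joined back on.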

answered Dec 14 '22 11:12

Assuming your input is grouped by the key field, as shown in your example (if it isn't, sort it first), you don't need to store the whole file in memory or read it twice. The following prints the lines in the same order they appear in the input:

$ cat tst.awk
BEGIN { FS=","; OFS="|" }
$1 != prev {
    if (NR>1) {
        print rec
    }
    prev = rec = $1
}
{
    $1 = ""
    rec = rec $0
}
END { print rec }

$ awk -f tst.awk file
NAME1|NAME1_001|NULL|LIC100_1|NULL|LIC300-3|LIC300-6|NAME1_003|LIC000_1|NULL|NULL|NULL|NULL
NAME2|NAME2_001|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME3|NAME3_001|NULL|LIC400_2|NULL|NULL|LIC500_1|NAME3_005|LIC000_1|NULL|LIC400_2|NULL|NULL|NAME3_006|LIC000_1|NULL|LIC400_2|NULL|NULL
NAME4|NAME4_002|NULL|LIC100_1|NULL|LIC300-3|LIC300-6
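
If the input is not grouped, the "sort it first" step might look like the sketch below. It assumes a sort that supports -s (stable sorting, available in GNU and BSD sort but not required by POSIX), so that rows keep their original relative order within each key. merge_grouped is a hypothetical wrapper name, and its awk body is the same program as tst.awk, inlined so the example is self-contained:

```shell
# merge_grouped: hypothetical helper, not part of the answer above.
# Sorts stdin by the first comma-separated field (stable, so rows with the
# same key keep their input order), then merges the rows for each key.
merge_grouped() {
  LC_ALL=C sort -t, -k1,1 -s |
    awk 'BEGIN { FS=","; OFS="|" }
         $1 != prev { if (NR>1) print rec; prev = rec = $1 }
         { $1 = ""; rec = rec $0 }
         END { print rec }'
}

# The NAME3 rows are interleaved with NAME1 here; sorting groups them first.
printf '%s\n' \
  'NAME3,NAME3_001,NULL' \
  'NAME1,NAME1_001,LIC100_1' \
  'NAME3,NAME3_005,LIC000_1' | merge_grouped
# prints:
# NAME1|NAME1_001|LIC100_1
# NAME3|NAME3_001|NULL|NAME3_005|LIC000_1
```

Note that with this approach the keys come out in sorted order rather than in the order they first appeared in the input.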

answered Dec 14 '22 10:12

Ed Morton

wilssssssslam

3 Answers

RavinderSingh13

James Brown

Ed Morton
