Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Add additional fields based on field count

Tags:

sed

awk

I have data in below format in a file

"123","XYZ","M","N","P,Q"
"345",
"987","MNO","A,B,C"

I always want to have 5 entries in the row , so if the count of fields in 2 then 3 extra ("") needs to be added.

"123","XYZ","M","N","P,Q" 
"345","","","",""  
"987","MNO","A,B,C","",""  

I looked upto the solution on the page

Add Extra Strings Based on count of fields- Sed/Awk

which has very similar requirement but when I try it fails as I have comma (,) within the field also.

Thanks.

like image 875
Anand Abhay Avatar asked Jun 17 '21 10:06

Anand Abhay


3 Answers

In GNU awk with your shown samples, please try following code.

awk -v s1="\"" -v FPAT='[^,]*|"[^"]+"' '
BEGIN{ OFS="," }
FNR==NR{
  nof=(NF>nof?NF:nof)
  next
}
NF<nof{
  val=""
  i=($0~/,$/?NF:NF+1)
  for(;i<=nof;i++){
    val=(val?val OFS:"")s1 s1
  }
  sub(/,$/,"")
  $0=$0 OFS val
}
1
'  Input_file  Input_file

Explanation: Adding detailed explanation for above.

awk -v s1="\"" -v FPAT='[^,]*|"[^"]+"' ' ##Starting awk program from here setting FPAT to csv file parsing here.
BEGIN{ OFS="," }                         ##Starting BEGIN section of this program setting OFS to comma here.
FNR==NR{                                 ##Checking condition FNR==NR here, which will be true for first time file reading.
  nof=(NF>nof?NF:nof)                    ##Create nof to get highest NF value here.
  next                                   ##next will skip all further statements from here.
}
NF<nof{                                  ##checking if NF is lesser than nof then do following.
  val=""                                 ##Nullify val here.
  i=($0~/,$/?NF:NF+1)                    ##Setting value of i as per condition here.
  for(;i<=nof;i++){                      ##Running loop till value of nof matches i here.
    val=(val?val OFS:"")s1 s1            ##Creating val which has value of "" in it.
  }
  sub(/,$/,"")                           ##Removing ending , here.
  $0=$0 OFS val                          ##Concatinate val here.
}
1                                        ##Printing current line here.
'  Input_file  Input_file                ##Mentioning Input_file names here.


EDIT: Adding this code here, where keeping a variable named nof where we can give our number of fields value which should be added minimum in all missing lines, in case any line is having more than minimum field values then it will take that value to add those many number of fields in missing field line.

awk -v s1="\"" -v nof="5" -v FPAT='[^,]*|"[^"]+"' '
BEGIN{ OFS="," }
FNR==NR{
  nof=(NF>nof?NF:nof)
  next
}
NF<nof{
  val=""
  i=($0~/,$/?NF:NF+1)
  for(;i<=nof;i++){
    val=(val?val OFS:"")s1 s1
  }
  sub(/,$/,"")
  $0=$0 OFS val
}
1
'  Input_file  Input_file
like image 77
RavinderSingh13 Avatar answered Oct 24 '22 06:10

RavinderSingh13


Here is one for GNU awk using FPAT when [you] always want to have 5 entries in the row :

$ awk '
BEGIN {
    FPAT="([^,]*)|(\"[^\"]+\")"
    OFS=","
}
{
NF=5                              # set NF to limit too long records
for(i=1;i<=NF;i++)                # iterate to NF and set empties to ""
    if($i=="")
        $i="\"\""
}1' file

Output:

"123","XYZ","M","N","P,Q"
"345","","","",""
"987","MNO","A,B,C","",""
like image 6
James Brown Avatar answered Oct 24 '22 08:10

James Brown


Here is a an awk command that would work with any version of awk:

awk -v n=5 -v ef=',""' -F '","' '
{
   sub(/,+$/, "")
   for (i=NF; i<n; ++i)
      $0 = $0 ef
} 1' file

"123","XYZ","M","N","P,Q"
"345","","","",""
"987","MNO","A,B,C","",""
like image 6
anubhava Avatar answered Oct 24 '22 06:10

anubhava