
Trying to read from specific fields of a CSV file

Tags: csv, awk

The code provided below reads a CSV file and prints the count of each string found, in descending order. However, I would like to know how to specify which fields are counted. For example, ./example-awk.awk 1,2 file.csv would read strings from fields 1 and 2 only and print their counts.

#!/bin/awk -f

BEGIN {
    FIELDS = ARGV[1];
    delete ARGV[1];
    FS = ", *"
}

{
    for(i = 1; i <= NF; i++)
        if(FNR != 1)
            data[++data_index] = $i
}

END {
    produce_numbers(data)

    PROCINFO["sorted_in"] = "@val_num_desc"

    for(i in freq)
        printf "%s\t%d\n", i, freq[i]
}

function produce_numbers(sortedarray)
{
    n = asort(sortedarray)

    for(i = 1 ; i <= n; i++)
    {
        freq[sortedarray[i]]++
    }
    return
}

This is the code I am currently working with. ARGV[1] will of course hold the specified fields, but I am unsure how to store this value and use it.

For example, ./example-awk.awk 1,2 simple.csv, with simple.csv containing

A,B,C,A
B,D,C,A
C,D,A,B
D,C,A,A

Should result in

D    3
C    2
B    2
A    1

because it only counts strings in fields 1 and 2.

Just Another Coder asked Dec 23 '22 16:12


1 Answer

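EDIT (at the OP's request): the OP needs a solution that reads the field numbers from ARGV, so here is one. (Note: cat script.awk is shown only to display the contents of the awk script.) Unlike the ./example-awk.awk 1,2 file.csv form in the question, this script takes each field number as a separate argument.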

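cat script.awk
BEGIN{
  FS=","
  OFS="\t"
  # Every argument except the last (the input file) is a field number.
  for(i=1;i<(ARGC-1);i++){
     arr[ARGV[i]]
     delete ARGV[i]    # keep awk from opening the field number as a file
  }
}
{
  # Count the string found in each requested field.
  for(i in arr){ value[$i]++ }
}
END{
  # GNU awk only: traverse by key, descending. Use "@val_num_desc" to order by count.
  PROCINFO["sorted_in"] = "@ind_str_desc"
  for(j in value){
     print j,value[j]
  }
}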

Now when we run it as follows:

awk -f script.awk 1 2 Input_file
D       3
C       2
B       2
A       1
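
If the field list has to arrive as a single argument, as in the ./example-awk.awk 1,2 simple.csv call from the question, one option is to split ARGV[1] on commas in the BEGIN block. The following is only a sketch of that idea: the file name count_fields.awk and the array names wanted and count are made up for illustration, and sort -rk1 is borrowed from my original solution so that no GNU awk extension is needed.

```shell
# Hypothetical script name; the logic mirrors script.awk above,
# but the field list arrives as ONE argument such as "1,2".
cat > count_fields.awk <<'EOF'
BEGIN {
    FS = ","; OFS = "\t"
    n = split(ARGV[1], list, ",")   # "1,2" -> list[1]="1", list[2]="2"
    for (i = 1; i <= n; i++)
        wanted[list[i]]             # remember each requested field number
    delete ARGV[1]                  # so awk does not treat "1,2" as a file name
}
{
    for (f in wanted)
        count[$f]++                 # tally the string found in each requested field
}
END {
    for (s in count)
        print s, count[s]
}
EOF

# Sample data from the question.
printf 'A,B,C,A\nB,D,C,A\nC,D,A,B\nD,C,A,A\n' > simple.csv

awk -f count_fields.awk 1,2 simple.csv | sort -rk1
```

On the sample data this should print the same four lines as above (D 3, C 2, B 2, A 1).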


My original solution: please try the following, written and tested with the samples shown. It is a generic solution in which the awk variable fields holds a comma-separated list of the field numbers you want to count.

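awk -v fields="1,2" '
BEGIN{
  FS=","
  OFS="\t"
  # Split "1,2" into arr[1]="1", arr[2]="2" and keep each field number as a key.
  num=split(fields,arr,",")
  for(i=1;i<=num;i++){
    key[arr[i]]
  }
}
{
  # Count the string found in each requested field.
  for(i in key){
    value[$i]++
  }
}
END{
  for(i in value){
    print i,value[i]
  }
}' Input_file | sort -rk1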

Output will be as follows.

D       3
C       2
B       2
A       1
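
Note that sort -rk1 orders the lines by the string itself, which happens to coincide with descending counts for fields 1 and 2 of the sample. If the output has to be ordered by count, as the question describes, a sketch along these lines may help. The function name count_fields is hypothetical, and the sort keys (count numerically descending, then string descending for ties) are my assumption about the desired tie-break.

```shell
# Hypothetical helper: count_fields FIELDLIST FILE
# Prints "string<TAB>count", highest count first.
count_fields() {
    awk -v fields="$1" '
    BEGIN {
        FS = ","; OFS = "\t"
        n = split(fields, list, ",")
        for (i = 1; i <= n; i++) wanted[list[i]]
    }
    { for (f in wanted) count[$f]++ }
    END { for (s in count) print s, count[s] }
    ' "$2" | sort -t "$(printf '\t')" -k2,2nr -k1,1r
}

# Sample data from the question.
printf 'A,B,C,A\nB,D,C,A\nC,D,A,B\nD,C,A,A\n' > simple.csv

# Fields 3 and 4 make the difference visible: A occurs 5 times, C twice, B once.
count_fields 3,4 simple.csv
```

With fields 3 and 4, sorting by string alone would put C first, while this version lists A (5) first, then C (2), then B (1).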
RavinderSingh13 answered Dec 25 '22 05:12