Picking up CSV fields by name using awk

Question

Suppose I have a CSV file with headers of the following form:

Field1,Field2
3,262000
4,449000
5,650000
6,853000
7,1061000
8,1263000
9,1473000
10,1683000
11,1893000

I would like to write an awk script which will take a comma-separated list of field names target, split it into an array, and then only pick out those columns with the names I specify.

This is what I have tried so far, and I have verified that the head array contains the desired headers, and the targets array contains the desired targets passed in by the given command line.

BEGIN{
    FS=","
    split(target, targets, ",")

}

NR==1 {
    for (i = 1; i <= NF; i++) head[i] = $i
}

NR !=1{
    for (i = 1; i <= NF; i++) {
        if (head[i] in targets){
            print $i
        }
    }
}

When I invoke this script with the command

awk -v target=Field1 -f GetCol.awk Debug.csv

I get nothing printed out.

merlin2011 · Accepted Answer

I figured it out and am posting the answer in case others run into the same problem.

It has to do with the in keyword I am using for testing array membership. This keyword only tests whether the operand on the left is one of the indices in the array on the right, rather than on of the values. The fix is to create a reverse-lookup array, as follows.

BEGIN{
    OFS=FS=","
    split(target, t_targets, ",")
    for (i in t_targets)
        targets[t_targets[i]] = i
}

Chris Seymour · Answer

My two cents:

BEGIN{
    OFS=FS=","
    split(target,fields,FS)            # We just set FS don't hard the comma here
    for (i in fields)                  # Distinct var name to aviod headaches
        field_idx[fields[i]] = i       # Reverse lookup 
}
NR==1 {                                # Process header
    for (i=1;i<=NF;i++)                # For each field header
        head[i] = $i                   # Add to hash for comparision with target
    next                               # Skip to next line
}
{                                      # Don't need invert condition (used next)
    sep=""                             # Set for leading separator
    for (i=1;i<=NF;i++)                # For each field
        if (head[i] in field_idx) {    # Test for current field is a target field
            printf "%s%s",sep,$i       # Print the column if matched 
            sep=OFS                    # Set separator to OFS                  
    }
    printf "
"                        # Print newline character
}

Rhys Ulerich · Answer

An extension of @sudo_O's solution (thank you) that

outputs fields from standard input based on the command line arguments,
outputs the fields in the order requested (possibly multiple times),
outputs a placeholder when a field was requested but not found, and
warns on standard error about duplicate field names in the header.

#!/usr/bin/awk -f
# Process standard input outputting named columns provided as arguments.
#
# For example, given foo.dat containing
#     a b c c
#     1a 1b 1c 1C
#     2a 2b 2c 2C
#     3a 3b 3c 3C
# Running
#   cat foo.dat | ./namedcols c b a a d
# will output
#   1c 1b 1a 1a d
#   2c 2b 2a 2a d
#   3c 3b 3a 3a d
# and will warn on standard error that it
#   Ignored duplicate 'c' in column 4
# Notice that the requested but missing column d contains "d".
#
# Using awk's -F feature it is possible to parse comma-separated data:
#   cat foo.csv | ./namedcols -F, c b a a d
BEGIN {
    for (i=1; i<ARGC; ++i)
        desired[i] = ARGV[i]
    delete ARGV
}
NR==1 {
    for (i=1; i<=NF; i++)
        if ($i in names)
            printf "Ignored duplicate '%s' in column %d
", $i, i | "cat 1>&2"
        else
            names[$i] = i
    next
}
{
    for (i=1; i<ARGC; ++i)
        printf "%s%s",                                          \
               (i==1 ? "" : OFS),                               \
               ((ndx = names[name = desired[i]])>0 ? $ndx: name)
    printf RS
}

Picking up CSV fields by name using awk

Tags:

csv

awk

gawk

merlin2011

3 Answers

merlin2011

Chris Seymour

Rhys Ulerich

Recent Activity

Donate For Us

Picking up CSV fields by name using awk

Tags:

csv

awk

gawk

merlin2011

3 Answers

merlin2011

Chris Seymour

Rhys Ulerich

Related questions

Recent Activity

Donate For Us