
How to combine the data from two CSV files in BASH?

I have two CSV files which use @ to divide each column. The first file (file1.csv) has two columns:

cat @ eats fish
spider @ eats insects

The second file (file2.csv) has four columns:

info @ cat @ info @ info
info @ spider @ info @ info
info @ rabbit @ info @ info

I need to add the information from the second column of the first file as a new column in the second file wherever the first column of the first file matches the second column of the second file. For the example above, the result would be:

info @ cat @ info @ info @ eats fish
info @ spider @ info @ info @ eats insects
info @ rabbit @ info @ info @

As seen above, as the first file contained no information about rabbits, a new empty column is added to the last row of the second file.

Here is what I know how to do so far:

while read line can be used to cycle through the rows in the second file, e.g.:

while read line
do
    (commands)
done < file2.csv

The data from particular columns can be accessed with awk -F ' *@ *' '{print $n}', where n is the column number and the separator pattern also absorbs the spaces around each @.
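A quick way to check the column extraction, as a minimal sketch: the separator regex ' *@ *' is an assumption chosen so that the spaces around each @ are swallowed along with it, the sample row is copied from file2.csv above, and the variable names line and column are only illustrative.

```shell
# Sample row copied from file2.csv in the question.
line='info @ cat @ info @ info'

# Split on "@" plus any surrounding spaces, then print the second column.
column=$(printf '%s\n' "$line" | awk -F ' *@ *' '{ print $2 }')

printf '%s\n' "$column"
```

With a separator that does not absorb the spaces, the extracted value would keep its surrounding blanks and a later string comparison would likely fail.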

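while IFS= read -r row2
do
    # Animal name from column 2 of the current file2.csv row
    columntwo=$(printf '%s\n' "$row2" | awk -F ' *@ *' '{print $2}')
    while IFS= read -r row1
    do
        # Animal name from column 1 of the current file1.csv row
        columnone=$(printf '%s\n' "$row1" | awk -F ' *@ *' '{print $1}')
        if [ "$columnone" = "$columntwo" ]
        then
            (commands)
        fi
    done < file1.csv
done < file2.csv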

My approach seems inefficient, and I am not sure how to add the data from the second column of file1.csv to a new column in file2.csv.

  • Items in column 1 of file1.csv and column 2 of file2.csv are unique to those files. There are no duplicate entries within those files.
  • The resulting file should have exactly 5 columns in every line, even if some columns are empty.
  • The file contains a lot of characters from various languages in UTF-8.
  • There is white space around @, but if this causes problems with the script, I can delete this.

How can the data from the first file be added to the data in the second file?

Village asked Apr 06 '12 01:04


1 Answer

And a nice, clean awk solution:

awk -F" *@ *" 'NR==FNR{lines[$2]=$0} NR!=FNR{if($1 in lines)lines[$1]=lines[$1] " @ " $2} END{for(line in lines)print lines[line]}' file2.csv file1.csv

It is not a short one-liner, but not the longest I've seen. Note that the arguments are switched: file2.csv is read first so that its rows are stored before file1.csv is processed. The same program as a script, with explanation:

#!/usr/bin/awk -f

# Split fields on @ and the whitespace on either side.
BEGIN { FS = " *@ *" }

# First file
NR == FNR {
    # Store the whole line, keyed by the animal name in column 2
    lines[$2] = $0
}

# Second file
NR != FNR {
    # If the appropriate animal was in the first file, append its eating habits.
    # If not, it's discarded; if you want something else, let me know.
    # Testing with "in" avoids creating an empty entry for an unknown animal.
    if($1 in lines) lines[$1] = lines[$1] " @ " $2
}

# After both files have been processed
END {
    # Loop over all lines in the first file and print them, possibly updated with eating habits.
    # No guarantees on order.
    for(line in lines) print lines[line]
}

Call as awk -f join.awk file2.csv file1.csv, or make executable and ./join.awk file2.csv file1.csv.
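If the output must keep the row order of file2.csv and always carry five columns, as the question asks, one option is to load file1.csv into the array instead and stream file2.csv through afterwards. The following is only a minimal sketch under those assumptions, not the script above: the sample files are recreated in a temporary directory, and the names habits, tmpdir and result are purely illustrative.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'cat @ eats fish' 'spider @ eats insects' > "$tmpdir/file1.csv"
printf '%s\n' 'info @ cat @ info @ info' \
              'info @ spider @ info @ info' \
              'info @ rabbit @ info @ info' > "$tmpdir/file2.csv"

# Note the argument order: file1.csv is read first this time.
result=$(awk -F ' *@ *' '
    # First file: remember each eating habit, keyed by animal name.
    NR == FNR { habits[$1] = $2; next }
    # Second file: print the row unchanged plus a fifth column.
    # An unknown animal contributes an empty string, so the column stays empty.
    { print $0 " @ " (($2 in habits) ? habits[$2] : "") }
' "$tmpdir/file1.csv" "$tmpdir/file2.csv")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because each file2.csv row is printed as soon as it is read, the original order is kept, and the rabbit row ends in " @ " followed by an empty fifth column. One caveat of this sketch: if file1.csv is empty, NR == FNR is also true for file2.csv, so that case would need separate handling.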

Kevin answered Sep 27 '22 15:09