How to join 2 csv files with a shell script?

Question

I'm trying to make a shell script that will combine two csv files in the following way:

I have two csv files, f1.csv and f2.csv. The format of f1.csv is:

startId, endId, roomNum

f2.csv has a format like this:

startId, endId, teacherId

I want to combine these two into one csv file with this format:

startId, endId, roomNum, teacherId.

What is the best way to accomplish this with a shell script that runs under Linux?

dogbane · Accepted Answer

Try:

join -t, -1 1 -2 1 -o 1.2 1.3 1.4 2.4 <(awk -F, '{print $1":"$2","$0}' f1.csv | sort) <(awk -F, '{print $1":"$2","$0}' f2.csv | sort)

How it works:

1) I first create a composite key column, by joining the startId and endId into startId:endId for both files.

awk -F, '{print $1":"$2","$0}' f1.csv
awk -F, '{print $1":"$2","$0}' f2.csv

2) I sort both outputs:

awk -F, '{print $1":"$2","$0}' f1.csv | sort 
awk -F, '{print $1":"$2","$0}' f2.csv | sort

3) I then use the join command to join on my composite key (in the first column) and output just the columns I need.

Donate For Us