
BASH: Joining 2 CSV files based on common field name

Tags: grep, bash, join, csv, awk

I have 2 CSV files and I need to JOIN them using BASH:

file_1.csv columns: 

track_id    
title
song_id 
release 
artist_id   
artist_mbid 
artist_name 
duration    
artist_familiarity  
artist_hotttnesss
year

Sample data in file_1.csv

TRZZZZZ12903D05E3A,Infra Stellar,SOZPUEF12AF72A9F2A,Archives Vol. 2,ARBG8621187FB54842,4279aba0-1bde-40a9-8fb2-c63d165dc554,Delerium,495.22893,0.69652442519,0.498471038842,2001

file_2.csv columns: 

track_id    
sales_date  
sales_count

Sample data in file_2.csv

TRZZZZZ12903D05E3A,2014-06-19,79

The relation between the files is that file_1.track_id = file_2.track_id.

I want to create a 3rd file file_3.csv that will have the following columns:

file_2.track_id,file_2.sales_date,file_2.sales_count,file_1.title,file_1.song_id,file_1.release,file_1.artist_id,file_1.artist_mbid,file_1.artist_name,file_1.duration,file_1.artist_familiarity,file_1.artist_hotttnesss,file_1.year

I have tried the following methods:

join -t',' -1 N -1 N file_2.csv file_1.csv >> file_3.csv

and

awk -F, 'NR==FNR{a[$0]=$0;next} ($1 in a){print a[$1]"," > "file_3.csv"}' file_1.csv file_2.csv

Although the file_3.csv gets created, it is an empty file. Any ideas on how to do this?

Thanks!

AngryPanda asked Feb 27 '15 08:02


2 Answers

The following join command should do the trick:

join --header -t',' -j 1 file_2.csv file_1.csv

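Just make sure that both CSV files are sorted on the join field; having track_id as the first field in each file makes this easy. Note that --header is a GNU join option that treats the first line of each file as a header row, so drop it if your files contain no header line.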

Try the command on test data in both files first; once you're satisfied that it does what you want, run it against the actual data and redirect the output to file_3.csv.
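A minimal sketch of such a test run, assuming GNU coreutils join (for --header) and files that do carry a header line. The rows, the trimmed three-column layout of file_1.csv, and the workdir variable are all made up for illustration:

```shell
# Hypothetical test data; header names follow the column lists in the question.
workdir=$(mktemp -d)

# file_1.csv is trimmed to three columns to keep the example short.
cat > "$workdir/file_1.csv" <<'EOF'
track_id,title,artist_name
TRAAAAA000000000001,First Song,Artist A
TRBBBBB000000000002,Second Song,Artist B
EOF

cat > "$workdir/file_2.csv" <<'EOF'
track_id,sales_date,sales_count
TRAAAAA000000000001,2014-06-19,79
TRBBBBB000000000002,2014-06-20,12
EOF

# --header (GNU join) pairs the two header lines and prints them first;
# the remaining lines must already be sorted on field 1.
join --header -t',' -j 1 "$workdir/file_2.csv" "$workdir/file_1.csv" > "$workdir/file_3.csv"

cat "$workdir/file_3.csv"
```

Because file_2.csv is listed first, its columns come right after track_id in the output, followed by the remaining columns of file_1.csv, which matches the column order asked for in the question.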

Anthony Geoghegan answered Nov 20 '22 01:11


join should work as long as both files are sorted on the join field. Try:

join -t, <(sort -t, -k1,1 file_2.csv) <(sort -t, -k1,1 file_1.csv) > file_3.csv
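
If sorting is not convenient, the awk attempt from the question can probably be adapted instead. That attempt stores each line of file_1.csv under the whole line as key (a[$0]=$0), so the lookup ($1 in a) never matches and nothing is printed. The sketch below keys the array on track_id instead. It is not part of the original answer, and it assumes headerless files with no quoted commas inside fields; the rows (apart from the track ID taken from the question), the trimmed columns, and the workdir and rest names are illustrative only:

```shell
workdir=$(mktemp -d)

# Headerless, unsorted sample rows; file_1.csv is trimmed to three columns.
cat > "$workdir/file_1.csv" <<'EOF'
TRZZZZZ12903D05E3A,Infra Stellar,Delerium
TRAAAAA000000000001,First Song,Artist A
EOF

cat > "$workdir/file_2.csv" <<'EOF'
TRAAAAA000000000001,2014-06-20,12
TRZZZZZ12903D05E3A,2014-06-19,79
EOF

awk -F, '
  NR == FNR {            # first file on the command line: file_1.csv
    key = $1
    sub(/^[^,]*,/, "")   # strip the leading "track_id," from the line
    rest[key] = $0       # remember the remaining columns under track_id
    next
  }
  $1 in rest {           # second file: file_2.csv, only rows with a match
    print $0 "," rest[$1]
  }
' "$workdir/file_1.csv" "$workdir/file_2.csv" > "$workdir/file_3.csv"

cat "$workdir/file_3.csv"
```

The output follows the row order of file_2.csv, and rows whose track_id does not appear in file_1.csv are dropped, as with a plain join.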

ccarton answered Nov 19 '22 23:11