I have two CSV files and I need to join them using Bash:
file_1.csv columns:
track_id
title
song_id
release
artist_id
artist_mbid
artist_name
duration
artist_familiarity
artist_hotttnesss
year
Sample data in file_1.csv
TRZZZZZ12903D05E3A,Infra Stellar,SOZPUEF12AF72A9F2A,Archives Vol. 2,ARBG8621187FB54842,4279aba0-1bde-40a9-8fb2-c63d165dc554,Delerium,495.22893,0.69652442519,0.498471038842,2001
file_2.csv columns:
track_id
sales_date
sales_count
Sample data in file_2.csv
TRZZZZZ12903D05E3A,2014-06-19,79
The relation between the files is file_1.track_id = file_2.track_id.
I want to create a third file, file_3.csv, with the following columns:
file_2.track_id,file_2.sales_date,file_2.sales_count,file_1.title,file_1.song_id,file_1.release,file_1.artist_id,file_1.artist_mbid,file_1.artist_name,file_1.duration,file_1.artist_familiarity,file_1.artist_hotttnesss,file_1.year
I have tried the following methods:
join -t',' -1 N -1 N file_2.csv file_1.csv >> file_3.csv
and
awk -F, 'NR==FNR{a[$0]=$0;next} ($1 in a){print a[$1]"," > "file_3.csv"}' file_1.csv file_2.csv
Although file_3.csv gets created, it is empty.
Any ideas on how to do this?
Thanks!
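The awk attempt keys the array on the whole file_1 line (a[$0]), so the later lookup ($1 in a), which uses the track_id from file_2.csv, never matches and nothing is written. Below is a minimal sketch of the same hash-join idea keyed on the first field instead. It assumes plain comma-separated data (no quoted fields containing commas) and no header rows, and it runs on the sample rows from the question in a temporary directory:

```shell
#!/usr/bin/env bash
# Sketch: hash join in awk, keyed on track_id (field 1).
# Assumes plain comma-separated data: no quoted fields containing commas, no header rows.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Sample rows taken from the question.
printf '%s\n' 'TRZZZZZ12903D05E3A,Infra Stellar,SOZPUEF12AF72A9F2A,Archives Vol. 2,ARBG8621187FB54842,4279aba0-1bde-40a9-8fb2-c63d165dc554,Delerium,495.22893,0.69652442519,0.498471038842,2001' > file_1.csv
printf '%s\n' 'TRZZZZZ12903D05E3A,2014-06-19,79' > file_2.csv

awk -F, -v OFS=, '
  # First file (file_1.csv): store everything after track_id, keyed by track_id.
  NR == FNR {
    a[$1] = substr($0, length($1) + 2)
    next
  }
  # Second file (file_2.csv): print the sales row followed by the matching file_1 columns.
  ($1 in a) { print $0, a[$1] }
' file_1.csv file_2.csv > file_3.csv

cat file_3.csv
```

Because file_2.csv is read second and its whole line is printed first, the output already follows the requested column order: track_id, sales_date, sales_count, then the remaining file_1 columns.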
The following join command should do the trick:
join --header -t',' -j 1 file_2.csv file_1.csv
Just make sure that your CSV files are sorted on the join field; having track_id as the first field in each file makes this easy.
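If each file starts with a header row, which is what --header expects, the header has to stay on the first line while only the data rows are sorted. A minimal sketch, assuming GNU coreutils (--header is a GNU join option), exactly one header line per file, and no quoted commas; the three-column files and the helper name sort_keep_header are hypothetical, for illustration only:

```shell
#!/usr/bin/env bash
# Sketch: sort each CSV on track_id (field 1) while keeping its header line first,
# then join them with GNU join's --header option.
# Assumes GNU coreutils, one header line per file, and no quoted commas.
set -eu
export LC_ALL=C   # same collation for sort and join
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample inputs with headers; data rows deliberately unsorted.
cat > file_1.csv <<'EOF'
track_id,title,artist_name
TRZZZ2,Second Song,Artist B
TRZZZ1,First Song,Artist A
EOF
cat > file_2.csv <<'EOF'
track_id,sales_date,sales_count
TRZZZ2,2014-06-20,5
TRZZZ1,2014-06-19,79
EOF

# Hypothetical helper: print the header unchanged, then sort the rest on field 1 only.
sort_keep_header() {
  head -n 1 "$1"
  tail -n +2 "$1" | sort -t, -k1,1
}

sort_keep_header file_1.csv > file_1.sorted.csv
sort_keep_header file_2.csv > file_2.sorted.csv

join --header -t, -j 1 file_2.sorted.csv file_1.sorted.csv > file_3.csv
cat file_3.csv
```

Exporting LC_ALL=C is a design choice: join expects its input sorted in the same collation order it compares with, so pinning both commands to one locale avoids mismatches.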
Try the command on test data in both files first; once you're satisfied that it does what you want, run it against the actual data and redirect its output to file_3.csv.
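When testing, it helps to make unsorted input fail loudly rather than silently drop matches. A minimal sketch, assuming GNU coreutils (for join --check-order) and headerless comma-separated files; the two-row test files are hypothetical, and file_2.csv is deliberately out of order:

```shell
#!/usr/bin/env bash
# Sketch: check sort order on small test files before running the real join.
# Assumes GNU coreutils (join --check-order) and comma-separated data without headers.
set -u
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical test data: file_2.csv is NOT sorted on track_id.
printf '%s\n' 'TRZZZ1,First Song' 'TRZZZ2,Second Song' > file_1.csv
printf '%s\n' 'TRZZZ2,2014-06-20,5' 'TRZZZ1,2014-06-19,79' > file_2.csv

# sort -c exits non-zero if the file is not already sorted on field 1.
for f in file_1.csv file_2.csv; do
  if sort -c -t, -k1,1 "$f" 2>/dev/null; then
    echo "$f: sorted"
  else
    echo "$f: NOT sorted on field 1"
  fi
done

# With --check-order, GNU join exits with an error on unsorted input.
if join --check-order -t, file_2.csv file_1.csv > file_3.csv 2>/dev/null; then
  join_status=ok
else
  join_status=failed
fi
echo "join: $join_status"
```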
Join should work as long as the files are sorted. Try:
join -t, <(sort -t, -k1,1 file_2.csv) <(sort -t, -k1,1 file_1.csv) > file_3.csv
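By default join prints the join field first, then the remaining fields of the first file, then those of the second, so passing file_2.csv first yields the column order requested in the question. A sketch running this sort-then-join approach on the question's sample rows, assuming bash (for the <(...) process substitution) and headerless files; it restricts each sort key to field 1 with -k1,1:

```shell
#!/usr/bin/env bash
# Sketch: sort both files on field 1, then join on field 1 (join's default).
# Assumes bash for <(...) and comma-separated files without header rows.
set -eu
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir"

# Sample rows taken from the question.
printf '%s\n' 'TRZZZZZ12903D05E3A,Infra Stellar,SOZPUEF12AF72A9F2A,Archives Vol. 2,ARBG8621187FB54842,4279aba0-1bde-40a9-8fb2-c63d165dc554,Delerium,495.22893,0.69652442519,0.498471038842,2001' > file_1.csv
printf '%s\n' 'TRZZZZZ12903D05E3A,2014-06-19,79' > file_2.csv

join -t, <(sort -t, -k1,1 file_2.csv) <(sort -t, -k1,1 file_1.csv) > file_3.csv
cat file_3.csv
```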