Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

merge two csv files according to matching rows and add new columns in linux

Tags:

linux

unix

I am developing an application using java, but for that I need a csv file in some order. I dont know linux much, but wondering if some way is there to merge the csv files in the required format.

I have two csv files containing hundreds of thousands of records. a sample is below:

name,Direction,Date
abc,sent,Jan 21 2014 02:06 
xyz,sent,Nov 21 2014 01:09
pqr,sent,Oct 21 2014 03:06  

and

name,Direction,Date
abc,received,Jan 22 2014 02:06
xyz,received,Nov 22 2014 02:06

so, this second csv file would contain some records of file 1. What I need is a new csv like this:

name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06

Need to add the columns(4th and 5th column) according to the matching data in column1. if no matching data is there in the second file, the columns should be empty like above.

so is there a bash command in linux to achieve this?

like image 686
ani0710 Avatar asked Mar 17 '23 18:03

ani0710


2 Answers

awk may work for you:

kent$  awk -F, -v OFS="," 
       'BEGIN{print "name,Direction,Date,currentDirection,receivedDate"}
        NR==FNR&&NR>1{a[$1]=$0;next}
        FNR>1{printf "%s%s\n",$0,($1 in a?FS a[$1]:"")}' 2.csv 1.csv
name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,abc,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,xyz,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06

update

kent$  awk -F, -v OFS="," 'BEGIN{print "name,Direction,Date,currentDirection,receivedDate"}
        NR==FNR&&NR>1{a[$1]=$2 FS $3;next}
        FNR>1{printf "%s%s\n",$0,($1 in a?FS a[$1]:"")}' 2.csv 1.csv 
name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06
like image 191
Kent Avatar answered Mar 26 '23 01:03

Kent


You can use this join command to accomplish this. The first file is 1.csv and the second file is 2.csv

join -1 1 -2 1 -t, -a 1  1.csv 2.csv | sed "s/Direction,Date/currentDirection,receivedDate/2"

Output:

name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06 

Explanation:

You want to join on the first field in both files, therefore -1 1 -2 1

You want to use comma, therefore -t,

You want to display all unmatched records in file 1, therefore -a 1, you could also append -a 2 if needed.

The /2 in the sed command tells sed to replace the 2nd occurrence

like image 40
jftuga Avatar answered Mar 26 '23 02:03

jftuga