merge two csv files according to matching rows and add new columns in linux

Question

I am developing an application using java, but for that I need a csv file in some order. I dont know linux much, but wondering if some way is there to merge the csv files in the required format.

I have two csv files containing hundreds of thousands of records. a sample is below:

name,Direction,Date
abc,sent,Jan 21 2014 02:06 
xyz,sent,Nov 21 2014 01:09
pqr,sent,Oct 21 2014 03:06

and

name,Direction,Date
abc,received,Jan 22 2014 02:06
xyz,received,Nov 22 2014 02:06

so, this second csv file would contain some records of file 1. What I need is a new csv like this:

name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06

Need to add the columns(4th and 5th column) according to the matching data in column1. if no matching data is there in the second file, the columns should be empty like above.

so is there a bash command in linux to achieve this?

Kent · Accepted Answer

awk may work for you:

kent$  awk -F, -v OFS="," 
       'BEGIN{print "name,Direction,Date,currentDirection,receivedDate"}
        NR==FNR&&NR>1{a[$1]=$0;next}
        FNR>1{printf "%s%s
",$0,($1 in a?FS a[$1]:"")}' 2.csv 1.csv
name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,abc,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,xyz,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06

update

kent$  awk -F, -v OFS="," 'BEGIN{print "name,Direction,Date,currentDirection,receivedDate"}
        NR==FNR&&NR>1{a[$1]=$2 FS $3;next}
        FNR>1{printf "%s%s
",$0,($1 in a?FS a[$1]:"")}' 2.csv 1.csv 
name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06

jftuga · Answer

You can use this join command to accomplish this. The first file is 1.csv and the second file is 2.csv

join -1 1 -2 1 -t, -a 1  1.csv 2.csv | sed "s/Direction,Date/currentDirection,receivedDate/2"

Output:

name,Direction,Date,currentDirection,receivedDate
abc,sent,Jan 21 2014 02:06,received,Jan 22 2014 02:06
xyz,sent,Nov 21 2014 01:09,received,Nov 22 2014 02:06
pqr,sent,Oct 21 2014 03:06

Explanation:

You want to join on the first field in both files, therefore -1 1 -2 1

You want to use comma, therefore -t,

You want to display all unmatched records in file 1, therefore -a 1, you could also append -a 2 if needed.

The /2 in the sed command tells sed to replace the 2nd occurrence

merge two csv files according to matching rows and add new columns in linux

Tags:

linux

unix

ani0710

2 Answers

update

Kent

jftuga

Recent Activity

Donate For Us

merge two csv files according to matching rows and add new columns in linux

Tags:

linux

unix

ani0710

2 Answers

update

Kent

jftuga

Related questions

Recent Activity

Donate For Us