
How to compare two CSV files in UNIX and create a delta (modified/new records)

Tags: unix, awk

I have two CSV files, old.csv and new.csv. I need only the new or updated records from new.csv: any record in new.csv that also exists in old.csv should be deleted.

old.csv

"R","abc","london","1234567"
"S","def","london","1234567"
"T","kevin","boston","9876"
"U","krish","canada","1234567"

new.csv

"R","abc","london","5678"
"S","def","london","1234567"
"T","kevin","boston","9876"
"V","Bell","tokyo","2222"

Expected output in new.csv

"R","abc","london","5678"
"V","Bell","tokyo","2222"

Note: if all records in new.csv also exist in old.csv, then new.csv should end up empty.

user6742120 asked Mar 22 '17 16:03





2 Answers

Use for example grep:

$ grep -v -f old.csv new.csv # > the_new_new.csv 
"R","abc","london","5678"
"V","Bell","tokyo","2222"

and:

$ grep -v -f old.csv old.csv
$                            # see, no differences between 2 identical files

man grep:

  -f FILE, --file=FILE
          Obtain  patterns  from  FILE,  one  per  line.   The  empty file
          contains zero patterns, and therefore matches nothing.   (-f  is
          specified by POSIX.)

  -v, --invert-match
          Invert the sense of matching, to select non-matching lines.  (-v
          is specified by POSIX.)
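
One caveat worth keeping in mind: by default grep treats every line of old.csv as a regular expression and accepts a match anywhere in the line, so an old record that is merely a substring of a new one, or that contains characters such as . or *, can suppress lines it should not. A stricter sketch, assuming a POSIX grep, adds -F (fixed strings) and -x (whole-line match). The scratch directory and the delta variable are only for illustration; the data is the sample from the question:

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

cat > old.csv <<'EOF'
"R","abc","london","1234567"
"S","def","london","1234567"
"T","kevin","boston","9876"
"U","krish","canada","1234567"
EOF

cat > new.csv <<'EOF'
"R","abc","london","5678"
"S","def","london","1234567"
"T","kevin","boston","9876"
"V","Bell","tokyo","2222"
EOF

# -F: patterns are fixed strings, not regular expressions
# -x: a pattern must match the whole line
# -v: print the lines that match none of the patterns
# -f: read the patterns from old.csv, one per line
delta=$(grep -F -x -v -f old.csv new.csv)
printf '%s\n' "$delta"
```

With the sample data this selects the same two records as the plain -v -f form; the difference only shows up when records overlap as substrings or contain regex metacharacters.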

Then again, you could use awk for it:

$ awk 'NR==FNR{a[$0];next} !($0 in a)' old.csv new.csv
"R","abc","london","5678"
"V","Bell","tokyo","2222"

Explained:

awk '
NR==FNR{            # the records in the first file are hashed to memory
    a[$0]
    next
} 
!($0 in a)          # the records which are not found in the hash are printed
' old.csv new.csv   # > the_new_new.csv 
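
The question asks for the delta to end up in new.csv itself. Redirecting the output straight to new.csv would truncate the file before awk reads it, so one possible sketch writes to a temporary file first and then moves it into place. The name new.csv.tmp and the scratch directory are placeholders, not part of the answer above:

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

cat > old.csv <<'EOF'
"R","abc","london","1234567"
"S","def","london","1234567"
"T","kevin","boston","9876"
"U","krish","canada","1234567"
EOF

cat > new.csv <<'EOF'
"R","abc","london","5678"
"S","def","london","1234567"
"T","kevin","boston","9876"
"V","Bell","tokyo","2222"
EOF

# Store every old record as an array key, then print only the
# new records that are not among those keys. The result goes to a
# temporary file, which replaces new.csv only if awk succeeded.
awk 'NR==FNR{a[$0];next} !($0 in a)' old.csv new.csv > new.csv.tmp &&
    mv new.csv.tmp new.csv

cat new.csv
```

If old.csv can be empty, note that NR==FNR then also holds while new.csv is being read, so nothing is printed; that case would need separate handling.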
James Brown answered Oct 13 '22 15:10



When the files are sorted:

comm -13 old.csv new.csv

When they are not sorted, and sorting is allowed:

comm -13 <(sort old.csv) <(sort new.csv)
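
Here -1 suppresses the lines found only in old.csv and -3 suppresses the lines common to both files, which leaves the lines unique to new.csv. comm expects both inputs to be sorted in the same collation order, so the sketch below pins the locale with LC_ALL=C (an assumption, not part of the commands above) and uses temporary files instead of the <( ) process substitution so that it also runs under a plain sh. The names old.sorted and new.sorted are placeholders:

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

cat > old.csv <<'EOF'
"R","abc","london","1234567"
"S","def","london","1234567"
"T","kevin","boston","9876"
"U","krish","canada","1234567"
EOF

cat > new.csv <<'EOF'
"R","abc","london","5678"
"S","def","london","1234567"
"T","kevin","boston","9876"
"V","Bell","tokyo","2222"
EOF

# Sort both files with the same byte-wise collation that comm will use.
LC_ALL=C sort old.csv > old.sorted
LC_ALL=C sort new.csv > new.sorted

# -1 drops lines only in the first file, -3 drops lines in both,
# so only lines unique to the second file remain.
delta=$(LC_ALL=C comm -13 old.sorted new.sorted)
printf '%s\n' "$delta"
```

The records come out in sorted order rather than in their original order in new.csv, which matters only if the order of the delta is significant.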
Walter A answered Oct 13 '22 14:10
