How to compare two CSV files in UNIX and create a delta (modified/new records)

Tags: unix, awk

I have two CSV files, old.csv and new.csv. I need only the new or updated records from new.csv, so any record in new.csv that also exists in old.csv should be deleted.

old.csv

"R","abc","london","1234567"
"S","def","london","1234567"
"T","kevin","boston","9876"
"U","krish","canada","1234567"

new.csv

"R","abc","london","5678"
"S","def","london","1234567"
"T","kevin","boston","9876"
"V","Bell","tokyo","2222"

Expected output in new.csv

"R","abc","london","5678"
"V","Bell","tokyo","2222"

Note: if every record in new.csv also exists in old.csv, new.csv should end up empty.

asked Mar 22 '17 16:03

user6742120

2 Answers

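Use grep, for example. Adding -F (fixed strings) and -x (whole-line match) keeps the lines of old.csv from being interpreted as regular expressions or matched as substrings:

$ grep -v -x -F -f old.csv new.csv # > the_new_new.csv
"R","abc","london","5678"
"V","Bell","tokyo","2222"

and:

$ grep -v -x -F -f old.csv old.csv
$                                  # see, no differences between 2 identical files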

man grep:

  -f FILE, --file=FILE
          Obtain  patterns  from  FILE,  one  per  line.   The  empty file
          contains zero patterns, and therefore matches nothing.   (-f  is
          specified by POSIX.)

  -v, --invert-match
          Invert the sense of matching, to select non-matching lines.  (-v
          is specified by POSIX.)
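
The grep command prints the delta to standard output, while the question wants the result to end up in new.csv itself. Redirecting straight to new.csv would truncate the file before grep reads it, so here is a minimal sketch that goes through a temporary file. The scratch directory and the new.csv.tmp name are assumptions for illustration, not part of the question.

```shell
#!/bin/sh
# Hypothetical scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)
old="$workdir/old.csv"
new="$workdir/new.csv"

cat > "$old" <<'EOF'
"R","abc","london","1234567"
"S","def","london","1234567"
"T","kevin","boston","9876"
"U","krish","canada","1234567"
EOF

cat > "$new" <<'EOF'
"R","abc","london","5678"
"S","def","london","1234567"
"T","kevin","boston","9876"
"V","Bell","tokyo","2222"
EOF

# -F: each line of old.csv is a fixed string, not a regular expression
# -x: only whole-line matches count
# -v: keep the lines of new.csv that did NOT match
grep -v -x -F -f "$old" "$new" > "$new.tmp"
status=$?

# grep exits with 1 when it selects no lines; that just means "no delta".
# Only a status of 2 or more signals a real error.
if [ "$status" -le 1 ]; then
    mv "$new.tmp" "$new"
else
    rm -f "$new.tmp"
    echo "grep failed with status $status" >&2
fi

cat "$new"
```

When every record of new.csv is already in old.csv, grep selects nothing, the temporary file is empty, and new.csv ends up empty as the question's note requires.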

Then again, you could use awk for it:

$ awk 'NR==FNR{a[$0];next} !($0 in a)' old.csv new.csv
"R","abc","london","5678"
"V","Bell","tokyo","2222"

Explained:

awk '
NR==FNR{            # the records in the first file are hashed to memory
    a[$0]
    next
} 
!($0 in a)          # the records which are not found in the hash are printed
' old.csv new.csv   # > the_new_new.csv
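
The awk script above treats a changed record and a brand-new record the same way. If the two cases need to be told apart, one possible variation is to hash the old records by a key instead of by the whole line. This is only a sketch under stated assumptions: the first field is taken to be a unique key, no field contains an embedded comma or newline, and the NEW/UPDATED labels are made up for illustration.

```shell
#!/bin/sh
# Hypothetical scratch directory holding the sample data from the question.
workdir=$(mktemp -d)
old="$workdir/old.csv"
new="$workdir/new.csv"

cat > "$old" <<'EOF'
"R","abc","london","1234567"
"S","def","london","1234567"
"T","kevin","boston","9876"
"U","krish","canada","1234567"
EOF

cat > "$new" <<'EOF'
"R","abc","london","5678"
"S","def","london","1234567"
"T","kevin","boston","9876"
"V","Bell","tokyo","2222"
EOF

# Assumption: field 1 is a unique key and fields hold no embedded commas.
delta=$(awk -F',' '
NR==FNR { seen[$1] = $0; next }            # first file: store each old record under its key
!($1 in seen)  { print "NEW," $0; next }   # key absent from old.csv
seen[$1] != $0 { print "UPDATED," $0 }     # key present but the record changed
' "$old" "$new")

printf '%s\n' "$delta"
```

Dropping the label prefix from the two print statements gives the same records as the whole-line version for this sample data.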

answered Oct 13 '22 15:10

James Brown

When the files are sorted:

comm -13 old.csv new.csv

When they are not sorted, and sorting is allowed:

comm -13 <(sort old.csv) <(sort new.csv)
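
The `<( )` process substitution needs bash, ksh, or zsh. For a plain POSIX sh, a rough equivalent is to sort into temporary files first. The sketch below assumes a scratch directory and file names (old.sorted, new.sorted, delta.csv) chosen for illustration, and it shuffles the question's new.csv records to show the effect of sorting. It also pins LC_ALL=C so that sort and comm agree on the collation order.

```shell
#!/bin/sh
# Hypothetical scratch directory; new.csv is deliberately not in sorted order.
workdir=$(mktemp -d)
old="$workdir/old.csv"
new="$workdir/new.csv"

cat > "$old" <<'EOF'
"R","abc","london","1234567"
"S","def","london","1234567"
"T","kevin","boston","9876"
"U","krish","canada","1234567"
EOF

cat > "$new" <<'EOF'
"V","Bell","tokyo","2222"
"S","def","london","1234567"
"R","abc","london","5678"
"T","kevin","boston","9876"
EOF

# Sort both inputs with the same collation that comm will use.
LC_ALL=C sort "$old" > "$workdir/old.sorted"
LC_ALL=C sort "$new" > "$workdir/new.sorted"

# -1 drops lines found only in old.csv, -3 drops lines found in both,
# so only the lines unique to new.csv remain.
LC_ALL=C comm -13 "$workdir/old.sorted" "$workdir/new.sorted" > "$workdir/delta.csv"

# Replace new.csv with the delta.
mv "$workdir/delta.csv" "$new"
cat "$new"
```

Note that the delta comes out in sorted order rather than in the original order of new.csv. If the original order matters, the grep or awk approach is the better fit.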

answered Oct 13 '22 14:10

Walter A
