
Comparing two files line by line and removing duplicates from the first file

Tags:

grep

bash

unix

Problem:

  1. Compare two files.
  2. Remove from the first file every line that also appears in the second file.
  3. Append the remaining lines of file1 to file2.

Illustration by example

Suppose the two files are test1 and test2.

$ cat test2
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
www.xyz.com/abc-6

And test1 is

$ cat test1
www.xyz.com/abc-1
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5

Comparing test1 to test2 and removing the duplicates from test1

Result Required:

$ cat test1
www.xyz.com/abc-1

and then appending the remaining test1 data to test2:

$ cat test2
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
www.xyz.com/abc-6
www.xyz.com/abc-1

Solutions Tried:

join -v1 -v2 <(sort test1) <(sort test2)

which produced this (the wrong output):

$ join -v1 -v2 <(sort test1) <(sort test2)
www.xyz.com/abc-1
www.xyz.com/abc-6

Another solution I tried was:

fgrep -vf test1 test2

which printed nothing.

Ankit Jain, asked May 28 '16 19:05


3 Answers

Print only the lines of test1 that do not appear in test2:

$ grep -vxFf test2 test1
www.xyz.com/abc-1

To overwrite test1:

grep -vxFf test2 test1 >test1.tmp && mv test1.tmp test1

To append the new test1 to the end of test2:

cat test1 >>test2

The grep options

grep normally prints matching lines. -v tells grep to do the reverse: it prints only the lines that do not match.

-x tells grep to do whole-line matches.

-F tells grep that we are using fixed strings, not regular expressions.

-f test2 tells grep to read those fixed strings, one per line, from file test2.
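To see why -x and -F matter here, the following is a minimal sketch that runs the same filter with each option left out in turn. The file names known and candidates and their contents are made up for illustration; they are not the asker's data.

```shell
#!/usr/bin/env bash
# Hypothetical sample data, created in a throwaway directory.
dir=$(mktemp -d)
printf '%s\n' 'www.xyz.com/abc-1' > "$dir/known"
printf '%s\n' 'www.xyz.com/abc-10' 'wwwxxyz.com/abc-1' > "$dir/candidates"

# Neither candidate line is identical to the known line,
# so with -x and -F both lines survive.
strict=$(grep -vxFf "$dir/known" "$dir/candidates")

# Without -x the pattern also matches as a substring of "...abc-10",
# so that line is dropped by mistake.
no_x=$(grep -vFf "$dir/known" "$dir/candidates")

# Without -F each "." is a regex wildcard, so "wwwxxyz.com/abc-1"
# is treated as a duplicate by mistake.
no_F=$(grep -vxf "$dir/known" "$dir/candidates")

printf 'with -x -F:\n%s\n\nwithout -x:\n%s\n\nwithout -F:\n%s\n' \
  "$strict" "$no_x" "$no_F"
rm -rf "$dir"
```

Only the first form keeps both candidate lines, which is the behaviour wanted when "duplicate" means "exactly the same line".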

John1024, answered Sep 24 '22 10:09


With awk:

% awk 'NR == FNR{ a[$0] = 1;next } !a[$0]' test2 test1
www.xyz.com/abc-1

Breakdown:

NR == FNR { # True only while reading the first file (test2)
  a[$0] = 1 # Store the whole line as a key in an associative array
  next      # Skip the remaining rule and read the next line
}
!a[$0]      # Print lines from test1 that are not keys in a
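A possible variation on the one-liner above, offered as a sketch rather than a replacement: testing membership with `in` avoids creating an empty array entry for every line looked up, and recording each printed line also drops repeats inside test1 itself. The array name seen and the sample lines are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: "a" appears twice in test1, "b" is already in test2.
dir=$(mktemp -d)
printf '%s\n' b c > "$dir/test2"
printf '%s\n' a b a > "$dir/test1"

new_lines=$(awk '
  NR == FNR { seen[$0]; next }        # first file (test2): remember every line
  !($0 in seen) { seen[$0]; print }   # second file (test1): print each unseen line once
' "$dir/test2" "$dir/test1")

printf '%s\n' "$new_lines"
rm -rf "$dir"
```

With this sample data only a single "a" is printed.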
Andreas Louv, answered Sep 23 '22 10:09


Solution to problems 1 and 2:

diff test1 test2 | grep '^<' | sed 's/^< //' > test1.tmp && mv test1.tmp test1

Here is the output:

$ cat test1
www.xyz.com/abc-1

Solution to problem 3:

cat test1 >> test2

Here is the output:

$ cat test2
www.xyz.com/abc-2
www.xyz.com/abc-3
www.xyz.com/abc-4
www.xyz.com/abc-5
www.xyz.com/abc-6
www.xyz.com/abc-1
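One caveat worth checking before relying on a diff-based pipeline like the one above: diff compares lines by position, so it works on the sample data because both files list the shared lines in the same order. The sketch below uses made-up two-line files that hold the same lines in a different order and compares the diff pipeline with a plain set-style filter using grep.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: identical lines, different order.
dir=$(mktemp -d)
printf '%s\n' a b > "$dir/test2"
printf '%s\n' b a > "$dir/test1"

# diff marks at least one line with "<" even though every line
# of test1 is present somewhere in test2.
via_diff=$(diff "$dir/test1" "$dir/test2" | grep '^<' | sed 's/^< //')

# A whole-line, fixed-string filter ignores order and reports nothing new.
via_grep=$(grep -vxFf "$dir/test2" "$dir/test1")

printf 'diff pipeline reports: [%s]\n' "$via_diff"
printf 'grep filter reports:   [%s]\n' "$via_grep"
rm -rf "$dir"
```

If the two files are not guaranteed to share the same ordering, an order-independent filter is the safer choice.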
sumitya, answered Sep 22 '22 10:09