
How to diff two file lists, ignoring position in the list

Tags: unix, diff

I have two lists of files that I want to diff. The second list has more files in it, and because both lists are in alphabetical order, a plain diff also reports files (lines) that exist in both lists but at different positions.

I want to diff these two lists while ignoring each line's position in the list. That way I would get only the new or missing lines.

Thank you.

Nir asked Sep 16 '10 08:09




3 Answers

You can try this approach which involves "subtracting" the two lists as follows:

$ cat file1
a.txt
b.txt
c.txt

$ cat file2
a.txt
a1.txt
b.txt
b2.txt

1) print everything in file2 that is not in file1 i.e. file2 - file1

$ grep -vxFf file1 file2
a1.txt
b2.txt

2) print everything in file1 that is not in file2 i.e. file1 - file2

$ grep -vxFf file2 file1
c.txt

(You can then do what you want with these diffs e.g. write to file, sort etc)

grep options descriptions:

  -v, --invert-match        select non-matching lines
  -x, --line-regexp         force PATTERN to match only whole lines
  -F, --fixed-strings       PATTERN is a set of newline-separated strings
  -f, --file=FILE           obtain PATTERN from FILE
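
To see why -x and -F matter for file names in particular, here is a minimal sketch. The sample lists and the names workdir, regex_only, substring_only and exact are made up for illustration; only the grep options come from the answer above.

```shell
# Hypothetical sample lists, created in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' 'a.txt' > "$workdir/file1"
printf '%s\n' 'a.txt' 'aXtxt' 'a.txt.bak' > "$workdir/file2"

# Without -F, "a.txt" is a regular expression whose "." matches any
# character, so "aXtxt" is wrongly treated as already present in file1.
regex_only=$(grep -vxf "$workdir/file1" "$workdir/file2")

# Without -x, "a.txt" also matches as a substring of "a.txt.bak",
# so that line is wrongly dropped as well.
substring_only=$(grep -vFf "$workdir/file1" "$workdir/file2")

# With both options, only exact whole-line matches are removed.
exact=$(grep -vxFf "$workdir/file1" "$workdir/file2")

printf 'without -F:\n%s\n\n' "$regex_only"
printf 'without -x:\n%s\n\n' "$substring_only"
printf 'with -x and -F:\n%s\n' "$exact"

rm -rf "$workdir"
```

Only the last form reports both aXtxt and a.txt.bak as missing from file1.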

dogbane answered Nov 07 '22 19:11



Do the following:

cat file1 file2 | sort | uniq -u

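This will give you the lines that appear in only one of the two lists (i.e., not duplicated), assuming neither list contains the same line twice. It does not tell you which list each line came from.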

Explanation:
1) cat file1 file2 will put all of the entries into one list
2) sort will sort the combined list
3) uniq -u will only output the entries which don't have duplicates
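
The three steps above can be tried on the sample lists from the first answer. This is only a sketch: the workdir directory and the extra new.txt lines are invented for illustration, and the sort -u variant is one possible guard for lists that repeat a line, not part of the original answer.

```shell
# Hypothetical sample lists in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' a.txt b.txt c.txt > "$workdir/file1"
printf '%s\n' a.txt a1.txt b.txt b2.txt > "$workdir/file2"

# Lines that occur in exactly one of the two lists.
only_in_one=$(cat "$workdir/file1" "$workdir/file2" | sort | uniq -u)
printf 'symmetric difference:\n%s\n\n' "$only_in_one"

# Caveat: a line repeated inside one list looks like a duplicate too.
# Add "new.txt" twice to file2 and the plain pipeline silently drops it.
printf '%s\n' new.txt new.txt >> "$workdir/file2"
naive=$(cat "$workdir/file1" "$workdir/file2" | sort | uniq -u)

# De-duplicating each list first (sort -u) keeps such lines in the result.
deduped=$( { sort -u "$workdir/file1"; sort -u "$workdir/file2"; } | sort | uniq -u )

printf 'plain pipeline:\n%s\n\n' "$naive"
printf 'with per-list sort -u:\n%s\n' "$deduped"

rm -rf "$workdir"
```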

No One in Particular answered Nov 07 '22 18:11



A good fit here is the humble comm command, which compares two sorted files line by line.

To demonstrate, let's create two input files:

$ cat <<EOF >a
> a.txt
> b.txt
> c.txt
> EOF

$ cat <<EOF >b
> a.txt
> a1.txt
> b.txt
> b2.txt
> EOF

Now, using the comm command to get what the question wanted:

$ comm -3 a b
        a1.txt
        b2.txt
c.txt

This shows a columnar output with missing files (lines in a but not in b) in the first column and extra files (lines in b but not in a) in the second column.

What exactly does comm do?

Here's the output if the command is typed without any switches:

$ comm a b
                a.txt
        a1.txt
                b.txt
        b2.txt
c.txt

This shows three columns:

  1. Lines in a but not in b
  2. Lines in b but not in a
  3. Lines in both a and b

The numbered switches -1, -2 and -3 each hide the corresponding column from the output.

So for example:

  • Specifying -12 results in common lines only
  • Specifying -13 results in lines only in b
  • Specifying -23 results in lines only in a
  • Specifying -3 results in the symmetric difference
  • Specifying -123 results in no output
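
Because comm expects both inputs to be sorted, a script would normally sort the lists first. The following is a rough sketch under a few assumptions: the unsorted sample lists, the workdir directory and the variable names missing, extra and common are all invented here, and LC_ALL=C is set so that sort and comm use the same byte-wise ordering.

```shell
# Use one collation order for both sort and comm.
export LC_ALL=C

# Hypothetical, deliberately unsorted input lists.
workdir=$(mktemp -d)
printf '%s\n' c.txt a.txt b.txt > "$workdir/a"
printf '%s\n' b2.txt a.txt b.txt a1.txt > "$workdir/b"

# comm needs sorted input, so sort each list into its own file.
sort "$workdir/a" > "$workdir/a.sorted"
sort "$workdir/b" > "$workdir/b.sorted"

# Hide two columns at a time to get one plain list per category.
missing=$(comm -23 "$workdir/a.sorted" "$workdir/b.sorted")  # only in a
extra=$(comm -13 "$workdir/a.sorted" "$workdir/b.sorted")    # only in b
common=$(comm -12 "$workdir/a.sorted" "$workdir/b.sorted")   # in both

printf 'missing from b:\n%s\n\n' "$missing"
printf 'extra in b:\n%s\n\n' "$extra"
printf 'in both:\n%s\n' "$common"

rm -rf "$workdir"
```

Running comm twice with two columns hidden avoids having to parse the tab-indented layout of comm -3.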

antak answered Nov 07 '22 18:11
