using comm to diff two files

Tags:

bash

shell

I am trying to use comm to compute the difference between two sorted files, but the result doesn't make sense. What's wrong? I want to show the strings that exist in test2 but not in test1, and then the strings that exist in test1 but not in test2.

>test1
a
b
d
g

>test2
e
g 
k
p

>comm test1 test2
a
b
d
    e
g
    g 
    k
    p
user121196 asked Dec 22 '11 00:12


2 Answers

To show the lines that exist in test2 but not in test1, write either of these:

comm -13 test1 test2
comm -23 test2 test1

(-1 hides the column with lines that exist only in the first file; -2 hides the column with lines that exist only in the second file; -3 hides the column with lines that exist in both files.)

Conversely, to show the lines that exist in test1 but not in test2, swap the options: comm -23 test1 test2 or comm -13 test2 test1.
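As a rough sketch of both directions, the following recreates files like the ones in the question and captures each one-sided difference. The scratch directory and the variable names are assumptions made for illustration, and the trailing space after g in test2 is deliberately left out here so that it does not distract from the options.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)

# Sorted sample files modelled on the question (no trailing spaces here).
printf '%s\n' a b d g > "$workdir/test1"
printf '%s\n' e g k p > "$workdir/test2"

# Lines only in test2: hide column 1 (only in test1) and column 3 (common).
only_in_test2=$(comm -13 "$workdir/test1" "$workdir/test2")

# Lines only in test1: hide column 2 (only in test2) and column 3 (common).
only_in_test1=$(comm -23 "$workdir/test1" "$workdir/test2")

printf 'only in test2:\n%s\n' "$only_in_test2"
printf 'only in test1:\n%s\n' "$only_in_test1"
```

With these inputs the first list should contain e, k and p, and the second a, b and d; g is common to both files and is hidden by -3.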

Note that g on a line by itself is considered distinct from g with a space after it, which is why you get

g
    g 

instead of

        g
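One possible way to deal with the stray space, sketched here rather than taken from the question, is to strip trailing whitespace and re-sort before comparing. The file name test2.clean and the variable names are made up for this example.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)

# test2 has a space after "g", as in the question.
printf '%s\n' a b d g > "$workdir/test1"
printf '%s\n' e 'g ' k p > "$workdir/test2"

# Make the invisible space visible: sed's "l" command marks each line end with $.
sed -n l "$workdir/test2"

# Common lines before cleaning: "g" and "g " do not match.
before=$(comm -12 "$workdir/test1" "$workdir/test2")

# Strip trailing whitespace, then re-sort because comm expects sorted input.
sed 's/[[:space:]]*$//' "$workdir/test2" | sort > "$workdir/test2.clean"

# Common lines after cleaning.
after=$(comm -12 "$workdir/test1" "$workdir/test2.clean")

printf 'common before: [%s]\ncommon after: [%s]\n' "$before" "$after"
```

Before cleaning, the list of common lines is empty; afterwards it contains g.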
ruakh answered Oct 04 '22 07:10


Add a line that is common to both files, say 'z' at the end of each. You'll see that a third column appears, indicating that the value is common to both.

The output is meant to show that data in column 1 is unique to file1, while data in column 2 is unique to file2.

Finally, the options -1, -2 and -3 each suppress the correspondingly numbered column; for example, -1 hides column 1.
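The experiment can be tried roughly like this, assuming the files from the question (including the space after g in test2) with z appended to each. The scratch directory and variable names are illustrative only.

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)

# The question's files, each with a shared "z" line appended.
printf '%s\n' a b d g z > "$workdir/test1"
printf '%s\n' e 'g ' k p z > "$workdir/test2"

# Full three-column output: column 2 is indented by one tab, column 3 by two.
full=$(comm "$workdir/test1" "$workdir/test2")
printf '%s\n' "$full"

# Suppress columns 1 and 2 to keep only the lines common to both files.
common=$(comm -12 "$workdir/test1" "$workdir/test2")
printf 'common: %s\n' "$common"
```

The z line should appear on its own in the third column, behind two tabs, while g and "g " still land in columns 1 and 2 because of the trailing space.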

I hope this helps.

shellter answered Oct 04 '22 08:10