How to diff two file lists, ignoring position in the list

Tags: unix, diff

I have two lists of files that I want to diff. The second list has more files in it, and because both lists are in alphabetical order, diffing them reports files (lines) that exist in both lists but at different positions.

I want to diff these two lists while ignoring where each line sits in the list, so that I get only the new or missing lines.

Thank you.

991

asked Sep 16 '10 08:09

Nir

3 Answers

You can try this approach which involves "subtracting" the two lists as follows:

$ cat file1
a.txt
b.txt
c.txt

$ cat file2
a.txt
a1.txt
b.txt
b2.txt

1) print everything in file2 that is not in file1 i.e. file2 - file1

$ grep -vxFf file1 file2
a1.txt
b2.txt

2) print everything in file1 that is not in file2 i.e. file1 - file2

$ grep -vxFf file2 file1
c.txt

(You can then do what you want with these diffs e.g. write to file, sort etc)

grep options descriptions:

  -v, --invert-match        select non-matching lines
  -x, --line-regexp         force PATTERN to match only whole lines
  -F, --fixed-strings       PATTERN is a set of newline-separated strings
  -f, --file=FILE           obtain PATTERN from FILE
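
The two subtractions above can also be combined into a single diff-style report. This is only a minimal sketch: it recreates the sample file1 and file2 in a scratch directory, and the helper name list_diff and the "-" / "+" prefixes are made up for illustration, not something grep provides.

```shell
# Recreate the sample lists from above in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' a.txt b.txt c.txt > "$workdir/file1"
printf '%s\n' a.txt a1.txt b.txt b2.txt > "$workdir/file2"

# list_diff OLD NEW (hypothetical helper):
#   prints "- line" for lines found only in OLD
#   and    "+ line" for lines found only in NEW.
list_diff() {
    grep -vxFf "$2" "$1" | sed 's/^/- /'
    grep -vxFf "$1" "$2" | sed 's/^/+ /'
}

report=$(list_diff "$workdir/file1" "$workdir/file2")
printf '%s\n' "$report"

rm -rf "$workdir"
```

With the sample lists this should print "- c.txt" followed by "+ a1.txt" and "+ b2.txt".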

153

answered Nov 07 '22 19:11

dogbane

Do the following:

cat file1 file2 | sort | uniq -u

This gives you the lines that appear only once across both files (i.e., not duplicated). It does not tell you which file each line came from.

Explanation:
1) cat file1 file2 will put all of the entries into one list
2) sort will sort the combined list
3) uniq -u will only output the entries which don't have duplicates
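
One caveat with this pipeline: uniq -u drops every line that occurs more than once in the combined input, so a line that is repeated inside one file but absent from the other goes unreported. A possible guard, sketched here with made-up sample data in a scratch directory, is to deduplicate each list on its own with sort -u before combining them.

```shell
workdir=$(mktemp -d)
# file1 lists c.txt twice; file2 does not contain it at all.
printf '%s\n' a.txt b.txt c.txt c.txt > "$workdir/file1"
printf '%s\n' a.txt a1.txt b.txt b2.txt > "$workdir/file2"

# Plain pipeline: c.txt occurs twice in the combined list, so uniq -u hides it.
naive=$(cat "$workdir/file1" "$workdir/file2" | sort | uniq -u)

# Deduplicate each list separately first, then combine as before.
deduped=$( { sort -u "$workdir/file1"; sort -u "$workdir/file2"; } | sort | uniq -u )

printf 'naive:\n%s\n\ndeduped:\n%s\n' "$naive" "$deduped"

rm -rf "$workdir"
```

Here the naive result lists only a1.txt and b2.txt, while the deduplicated one also reports c.txt.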

answered Nov 07 '22 18:11

No One in Particular

The deft command to use here is the humble `comm` command.

To demonstrate, let's create two input files:

$ cat <<EOF >a
> a.txt
> b.txt
> c.txt
> EOF

$ cat <<EOF >b
> a.txt
> a1.txt
> b.txt
> b2.txt
> EOF

Now, using the comm command to get what the question wanted:

$ comm -3 a b
        a1.txt
        b2.txt
c.txt

This shows a columnar output with missing files (lines in a but not in b) in the first column and extra files (lines in b but not in a) in the second column.

What exactly does `comm` do?

Here's the output if the command is typed without any switches:

$ comm a b
                a.txt
        a1.txt
                b.txt
        b2.txt
c.txt

This shows three columns thus:

Lines in a but not in b
Lines in b but not in a
Lines in both a and b

The numbered switches -1, -2 and -3 each hide the corresponding column from the output, and they can be combined.

So for example:

Specifying -12 results in common lines only
Specifying -13 results in lines only in b
Specifying -23 results in lines only in a
Specifying -3 results in the symmetric difference
Specifying -123 results in no output
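
Note that `comm` expects both inputs to be sorted in the same collation order. The lists in the question are already alphabetical, but when that is not guaranteed, something like the following sketch may help. The scratch directory, the .sorted file names and the choice of LC_ALL=C (so that sort and comm agree on ordering) are assumptions made for illustration.

```shell
workdir=$(mktemp -d)
# Same entries as the a and b files above, deliberately out of order.
printf '%s\n' c.txt a.txt b.txt > "$workdir/a"
printf '%s\n' b2.txt a.txt b.txt a1.txt > "$workdir/b"

# Sort both lists under one locale so sort and comm use the same ordering.
LC_ALL=C sort "$workdir/a" > "$workdir/a.sorted"
LC_ALL=C sort "$workdir/b" > "$workdir/b.sorted"

# Hide two columns at a time to extract a single set of lines.
only_in_a=$(LC_ALL=C comm -23 "$workdir/a.sorted" "$workdir/b.sorted")
only_in_b=$(LC_ALL=C comm -13 "$workdir/a.sorted" "$workdir/b.sorted")
in_both=$(LC_ALL=C comm -12 "$workdir/a.sorted" "$workdir/b.sorted")

printf 'only in a:\n%s\n\n' "$only_in_a"
printf 'only in b:\n%s\n\n' "$only_in_b"
printf 'in both:\n%s\n' "$in_both"

rm -rf "$workdir"
```

When only one column remains, `comm` prints it without leading tabs, which makes the result easy to feed into other commands.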

answered Nov 07 '22 18:11

antak

