 

Shell command to find lines common in two files

I'm sure I once found a shell command which could print the common lines from two or more files. What is its name?

It was much simpler than diff.

asked Dec 17 '08 06:12 by too much php



People also ask

How do I find the common line of two files?

Use comm -12 file1 file2 to print the lines common to both files. Both files must be sorted for comm to work as expected. Alternatively, use grep: the -x option makes each pattern match only a whole line, and the -F option makes grep treat the patterns as fixed strings rather than regular expressions, so grep -Fxf file1 file2 prints the lines of file2 that also appear in file1.
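A minimal sketch of the grep approach, assuming two small hypothetical sample files created in a temporary directory. Unlike comm, it needs no sorting, and it prints matches in the order they appear in the second file.

```bash
#!/usr/bin/env bash
# Hypothetical sample files, created in a temporary directory for illustration.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 123 567 132 > "$tmp/file1"
printf '%s\n' 132 777 321 > "$tmp/file2"

# -F: treat patterns as fixed strings, not regular expressions
# -x: a pattern must match an entire line
# -f FILE: read one pattern per line from FILE
# Result: the lines of file2 that also occur in file1, in file2's order.
common=$(grep -Fxf "$tmp/file1" "$tmp/file2")
printf '%s\n' "$common"
```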

Which command is used to display common and uncommon records from two files?

comm is the command for this. It compares two sorted files line by line, so it takes the two filenames as arguments: comm file1 file2.
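A short sketch showing both the common and the uncommon records with comm, assuming hypothetical sample data. Both sort and comm run under LC_ALL=C so that comm sees input in the collating order it expects.

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical sample data; comm requires sorted input, so sort it first.
printf '%s\n' apple banana cherry | LC_ALL=C sort > "$tmp/a"
printf '%s\n' banana date cherry | LC_ALL=C sort > "$tmp/b"

# Common records: suppress column 1 (only in a) and column 2 (only in b).
common=$(LC_ALL=C comm -12 "$tmp/a" "$tmp/b")

# Uncommon records: suppress column 3 (lines in both files).
# Lines that occur only in b are indented by one tab.
uncommon=$(LC_ALL=C comm -3 "$tmp/a" "$tmp/b")

echo "common:"
printf '%s\n' "$common"
echo "uncommon:"
printf '%s\n' "$uncommon"
```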

How can you tell if two files are identical in Shell?

Probably the easiest way to compare two files is the diff command. Its output shows the differences between the two files: the < and > signs indicate whether a line comes from the first (<) or the second (>) file given as an argument. If the files are identical, diff prints nothing and exits with status 0.
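In a script, the exit status of diff is usually more convenient than its output. A minimal sketch, assuming hypothetical sample files; the helper name same is made up for illustration.

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical sample files: x and y are identical, z differs.
printf '%s\n' one two > "$tmp/x"
printf '%s\n' one two > "$tmp/y"
printf '%s\n' one three > "$tmp/z"

# Hypothetical helper. diff exits 0 if the files are identical, 1 if they
# differ, and >1 on trouble (e.g. a missing file), which is treated as "not same".
# -q only reports whether they differ; the output is discarded anyway.
same() {
  diff -q "$1" "$2" > /dev/null
}

if same "$tmp/x" "$tmp/y"; then xy=identical; else xy=different; fi
if same "$tmp/x" "$tmp/z"; then xz=identical; else xz=different; fi
echo "x vs y: $xy"
echo "x vs z: $xz"
```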


4 Answers

The command you are looking for is comm. For example:

comm -12 1.sorted.txt 2.sorted.txt

Here:

-1 : suppress column 1 (lines unique to 1.sorted.txt)

-2 : suppress column 2 (lines unique to 2.sorted.txt)
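With both columns 1 and 2 suppressed, only column 3 (the common lines) remains. The same flags give the one-sided differences too. A sketch with hypothetical contents for the two sorted files named above:

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical contents for the two sorted files.
printf '%s\n' a b c | LC_ALL=C sort > 1.sorted.txt
printf '%s\n' b c d | LC_ALL=C sort > 2.sorted.txt

both=$(LC_ALL=C comm -12 1.sorted.txt 2.sorted.txt)   # column 3 only: in both files
only1=$(LC_ALL=C comm -23 1.sorted.txt 2.sorted.txt)  # column 1 only: unique to 1.sorted.txt
only2=$(LC_ALL=C comm -13 1.sorted.txt 2.sorted.txt)  # column 2 only: unique to 2.sorted.txt

printf 'both:\n%s\nonly in 1:\n%s\nonly in 2:\n%s\n' "$both" "$only1" "$only2"
```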

answered Oct 17 '22 09:10 by Jonathan Leffler



To easily apply the comm command to unsorted files, use Bash's process substitution:

$ bash --version
GNU bash, version 3.2.51(1)-release
Copyright (C) 2007 Free Software Foundation, Inc.
$ cat > abc
123
567
132
$ cat > def
132
777
321

So the files abc and def have one line in common, the one with "132". Using comm on unsorted files:

$ comm abc def
123
    132
567
132
    777
    321
$ comm -12 abc def # No output! The common line is not found
$

The last command produced no output: the common line was not found.

Now use comm on sorted files, sorting the files with process substitution:

$ comm <( sort abc ) <( sort def )
123
            132
    321
567
    777
$ comm -12 <( sort abc ) <( sort def )
132

Now we got the 132 line!
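The same idea can be wrapped in a small reusable function. This is a sketch: the name common_lines is hypothetical, it requires bash (process substitution is not available in plain sh), and it runs sort and comm under the same locale (LC_ALL=C) so that comm sees input in the collating order it expects.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the lines common to two (possibly unsorted) files.
common_lines() {
  LC_ALL=C comm -12 <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2")
}

# Recreate the abc and def files from the session above in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 123 567 132 > "$tmp/abc"
printf '%s\n' 132 777 321 > "$tmp/def"

result=$(common_lines "$tmp/abc" "$tmp/def")
printf '%s\n' "$result"
```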

answered Oct 17 '22 10:10 by Stephan Wehner



To complement the Perl one-liner, here's its awk equivalent:

awk 'NR==FNR{arr[$0];next} $0 in arr' file1 file2

While awk reads file1, NR==FNR is true (the overall record number equals the per-file record number), so each line is stored as a key in the array arr and next skips to the following line. For each line of file2, the pattern $0 in arr prints the line if it is a key of arr, i.e. if it also occurs in file1. Matching lines are printed in the order in which they appear in file2. Because the entire line is used as the array index, only exact matches on entire lines are reported.
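The question mentions two or more files; comm only takes two. A sketch that extends the awk array idea to any number of files, under the assumptions that every argument is a distinct file name (no awk var=value assignments among them) and that output order does not matter (it is sorted afterwards). The sample files are hypothetical.

```bash
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical sample files; 132 and 999 occur in all three.
printf '%s\n' 123 567 132 999 > "$tmp/f1"
printf '%s\n' 132 777 321 999 > "$tmp/f2"
printf '%s\n' 999 132 132 > "$tmp/f3"

# For each line, count how many distinct files contain it. The pair
# (line, FILENAME) is recorded so a line repeated within one file counts
# once for that file. A line is common if its count equals the number of
# file arguments (ARGC - 1). for-in order is unspecified, hence the sort.
common=$(awk '
  !(($0, FILENAME) in seen) { seen[$0, FILENAME]; count[$0]++ }
  END { for (line in count) if (count[line] == ARGC - 1) print line }
' "$tmp/f1" "$tmp/f2" "$tmp/f3" | LC_ALL=C sort)

printf '%s\n' "$common"
```

Unlike the two-file one-liner, this keeps every distinct line of every file in memory and does not preserve the original line order.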

answered Oct 17 '22 10:10 by Tatjana Heuser



Maybe you mean comm?

Compare sorted files FILE1 and FILE2 line by line.

With no options, produce three-column output. Column one contains lines unique to FILE1, column two contains lines unique to FILE2, and column three contains lines common to both files.

The secret to finding this kind of information is the info pages. For GNU programs, they are much more detailed than the man pages. Try info coreutils, which lists all the small, useful utilities.

answered Oct 17 '22 11:10 by Johannes Schaub - litb
