Find lines from a file which are not present in another file [duplicate]

People also ask

How can I tell if two files have the same content?

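We can check whether two files have the same content by calculating their hash values and comparing them, because identical files always produce identical hashes. For example, suppose file1 and file3 produce the same hash while file2 produces a different one. Then file1 and file3 have the same content and file2 is different. The only exception is a hash collision, which is extremely unlikely with a cryptographic hash such as SHA-256.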
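Here is a minimal sketch of the hash check in a POSIX shell, using throwaway example files file1, file2 and file3. It uses cksum, which prints a CRC checksum and the byte count. A CRC is not a cryptographic hash, so treat it as a quick check and swap in a tool such as sha256sum where one is available. cmp -s does an exact byte-by-byte comparison and reports the result only through its exit status.

```shell
# Work in a scratch directory so the example files don't clobber anything.
cd "$(mktemp -d)"
printf 'hello\nworld\n' > file1
printf 'hello\nthere\n' > file2
printf 'hello\nworld\n' > file3

# cksum prints "<CRC> <size> <name>"; reading from stdin drops the name,
# so the outputs can be compared as plain strings.
h1=$(cksum < file1)
h2=$(cksum < file2)
h3=$(cksum < file3)

[ "$h1" = "$h3" ] && echo "file1 and file3 have the same checksum"
[ "$h1" = "$h2" ] || echo "file1 and file2 have different checksums"

# cmp -s compares byte by byte and only sets the exit status (0 = identical).
cmp -s file1 file3 && echo "file1 and file3 are identical"
```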

How do I find the lines common to two files?

Use comm -12 file1 file2 to get the lines common to both files. comm expects both files to be sorted, so sort them first if they are not. Alternatively, use grep -Fxf file1 file2. Here -f file1 reads the patterns from file1, the -x option makes each pattern match only a whole line, and the -F option tells grep to treat each pattern as a fixed string rather than a regular expression.
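The following sketch runs both approaches on two small made-up files. The LC_ALL=C setting is an assumption added here so that sort and comm use the same byte-wise ordering. If the two commands run under different collation rules, comm can misreport lines.

```shell
# Work in a scratch directory with two small example files.
cd "$(mktemp -d)"
printf '%s\n' apple banana cherry > file1
printf '%s\n' banana cherry date > file2

# comm needs sorted input; LC_ALL=C makes sort and comm agree on the order.
LC_ALL=C sort file1 > file1.sorted
LC_ALL=C sort file2 > file2.sorted
common_comm=$(LC_ALL=C comm -12 file1.sorted file2.sorted)

# grep: -F fixed strings, -x whole-line matches, -f read patterns from file1.
# This prints the lines of file2 that also appear in file1, in file2's order,
# and needs no sorting.
common_grep=$(grep -Fx -f file1 file2)

printf 'comm:\n%s\ngrep:\n%s\n' "$common_comm" "$common_grep"
```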

What are grep options?

The grep utility searches the input files, selecting lines matching one or more patterns; the types of patterns are controlled by the options specified. The patterns are specified by the -e option, -f option, or the pattern_list operand.
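The following small sketch illustrates the three ways of supplying patterns. log.txt and patterns.txt are hypothetical example files.

```shell
# Work in a scratch directory with a hypothetical log and pattern file.
cd "$(mktemp -d)"
printf '%s\n' 'error: disk full' 'warning: low memory' 'info: started' > log.txt
printf '%s\n' error warning > patterns.txt

# 1. pattern_list operand: the first non-option argument is the pattern.
by_operand=$(grep error log.txt)

# 2. -e: each -e adds one pattern; a line matching any of them is selected.
by_e=$(grep -e error -e warning log.txt)

# 3. -f: patterns are read from a file, one per line.
by_f=$(grep -f patterns.txt log.txt)

printf '%s\n' "$by_f"
```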

Which command is used to find the number of lines in a file?

Use the wc command to count the number of lines, words, and bytes in the files given as arguments. wc -l prints only the number of lines.
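The following short sketch runs wc -l on a hypothetical notes.txt. When the file is redirected into wc, wc prints just the number, which is easier to use in a script.

```shell
# Work in a scratch directory with a hypothetical three-line file.
cd "$(mktemp -d)"
printf 'one\ntwo\nthree\n' > notes.txt

# With a file argument, wc -l prints the count followed by the file name.
wc -l notes.txt

# Reading from stdin prints only the count.
lines=$(wc -l < notes.txt)

# wc -l counts newline characters, so a last line without a trailing
# newline is not counted.
printf 'one\ntwo' > no_trailing_newline.txt
short=$(wc -l < no_trailing_newline.txt)

echo "$lines $short"
```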

The command you need is not diff but comm (it expects both files to be sorted):

comm -23 a.txt b.txt

By default, comm outputs 3 columns: left-only, right-only, both. The -1, -2 and -3 switches suppress these columns.

So, -23 hides the right-only and both columns, showing the lines that appear only in the first (left) file.

If you want to find lines that appear in both, you can use -12, which hides the left-only and right-only columns, leaving you with just the both column.
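The column layout is easiest to see on two small sorted files (a.txt and b.txt here are made-up examples). With no options, comm prints all three columns, and each later column is indented by one more tab. Each pair of switches picks out a single column.

```shell
# Work in a scratch directory; both inputs are already sorted, as comm requires.
cd "$(mktemp -d)"
printf '%s\n' apple banana cherry > a.txt
printf '%s\n' banana cherry date > b.txt

# No options: column 1 = only in a.txt, column 2 (one tab) = only in b.txt,
# column 3 (two tabs) = in both.
comm a.txt b.txt

only_a=$(comm -23 a.txt b.txt)   # suppress columns 2 and 3
only_b=$(comm -13 a.txt b.txt)   # suppress columns 1 and 3
both=$(comm -12 a.txt b.txt)     # suppress columns 1 and 2
```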

The simple answer did not work for me because I didn't realize comm matches line for line, so duplicate lines in one file will be printed as not-existing in the other. For example, if file1 contained:

Alex
Bill
Fred

And file2 contained:

Alex
Bill
Bill
Bill
Fred

Then comm -13 file1 file2 would output:

Bill
Bill

In my case, I wanted to know only that every string in file2 existed in file1, regardless of how many times that line occurred in each file.

Solution 1: use the -u (unique) flag to sort:

comm -13 <(sort -u file1) <(sort -u file2)
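Both the duplicate problem and the sort -u fix can be reproduced with the file1 and file2 contents from this answer. This sketch writes the de-duplicated copies to temporary files instead of using <(...). That choice is an assumption made here so the script also runs in a plain POSIX sh.

```shell
# Work in a scratch directory with the example files from this answer.
cd "$(mktemp -d)"
printf '%s\n' Alex Bill Fred > file1
printf '%s\n' Alex Bill Bill Bill Fred > file2

# comm pairs lines one-for-one, so the two extra "Bill" lines in file2
# have no partner in file1 and are reported as unique to file2.
dupes=$(comm -13 file1 file2)

# sort -u drops repeated lines first, so only distinct values are compared.
sort -u file1 > file1.uniq
sort -u file2 > file2.uniq
missing=$(comm -13 file1.uniq file2.uniq)

printf 'before: [%s]\nafter: [%s]\n' "$dupes" "$missing"
```

The first comm call reports Bill twice, matching the output quoted above. The de-duplicated comparison reports nothing, because every distinct line of file2 exists in file1.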

Solution 2: (the first "working" answer I found) from unix.stackexchange:

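grep -Fxv -f file1 file2

Here -F treats each line of file1 as a fixed string rather than a regular expression (fgrep is an older, now-deprecated spelling of grep -F). The -x option requires a whole-line match, and -v prints the lines of file2 that match none of the patterns. Without -x, a line such as Bill in file1 would also filter out Billy in file2. Note that if file2 contains duplicate lines that don't exist at all in file1, grep will output each of the duplicate lines. Also note that my totally non-scientific tests on a single laptop for a single (fairly large) dataset showed Solution 1 (using comm) to be almost 5 times faster than Solution 2 (using fgrep).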
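The following sketch runs the grep variant on made-up data. The extra lines Billy and Zoe (twice) in file2 are invented to show two things. Duplicates are printed once per occurrence, and -x keeps Bill from matching inside Billy.

```shell
# Work in a scratch directory with hypothetical example files.
cd "$(mktemp -d)"
printf '%s\n' Alex Bill Fred > file1
printf '%s\n' Alex Bill Billy Bill Zoe Zoe Fred > file2

# Lines of file2 that are not whole lines of file1, in file2's order.
# -F fixed strings, -x whole-line match, -v invert, -f patterns from file1.
missing=$(grep -Fxv -f file1 file2)

# Without -x, "Bill" also matches inside "Billy", so Billy is dropped too.
substring=$(grep -Fv -f file1 file2)

printf 'with -x: [%s]\nwithout -x: [%s]\n' "$missing" "$substring"
```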

I am not sure why it has been said that diff should not be used. I would use it to compare the two files and output only the lines that are in the left file but not in the right one. diff flags such lines with <, so it suffices to grep for that symbol at the beginning of the line:

diff a.txt b.txt | grep '^<'
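The following sketch applies the diff approach to two small example files and uses sed to strip the "< " marker. It also shows a likely reason diff is discouraged for this task. diff aligns the files by position, so a line that merely moved is still reported, even though it exists in both files.

```shell
# Work in a scratch directory with two small example files.
cd "$(mktemp -d)"
printf '%s\n' apple banana cherry > a.txt
printf '%s\n' banana cherry date > b.txt

# In diff's default output, lines only in the left file start with "< ".
diff a.txt b.txt | grep '^<'

# sed strips the "< " marker to recover the original lines.
only_a=$(diff a.txt b.txt | sed -n 's/^< //p')

# Caveat: diff compares positions, not sets. Both files contain x and y,
# but because the order differs, diff still reports a line as removed.
printf '%s\n' x y > a2.txt
printf '%s\n' y x > b2.txt
reordered=$(diff a2.txt b2.txt | sed -n 's/^< //p')

printf 'only in a.txt: %s\nreported despite reordering: %s\n' "$only_a" "$reordered"
```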

If the files are not sorted yet, you can sort them on the fly (the <(...) process substitution syntax requires bash, zsh or ksh):

comm -23 <(sort a.txt) <(sort b.txt)
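Some shells lack process substitution, for example a plain POSIX sh such as dash. For those, the following sketch applies the same idea with temporary sorted files. It also includes a variant that feeds one sorted file to comm through standard input by naming it -. a.txt and b.txt are unsorted example files. LC_ALL=C is an added assumption that keeps sort and comm on the same ordering.

```shell
# Work in a scratch directory with two unsorted example files.
cd "$(mktemp -d)"
printf '%s\n' cherry apple banana > a.txt
printf '%s\n' date banana cherry > b.txt

# comm assumes sorted input, so sort both files first, using the same
# collation (LC_ALL=C) for sort and comm so they agree on the order.
LC_ALL=C sort a.txt > a.sorted
LC_ALL=C sort b.txt > b.sorted
only_a=$(LC_ALL=C comm -23 a.sorted b.sorted)

# Variant: stream the sorted b.txt through stdin, which comm reads for "-".
only_a_piped=$(LC_ALL=C sort b.txt | LC_ALL=C comm -23 a.sorted -)

echo "$only_a"
```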

Related questions
How to send a simple string between two programs using pipes?
Convert decimal to hexadecimal in UNIX shell script
What does 'stale file handle' in Linux mean?
Grep for literal strings
Is there a way to ignore header lines in a UNIX sort?
Track the time a command takes in UNIX/LINUX?
How/When does Execute Shell mark a build as failure in Jenkins?
What's the difference between --general-numeric-sort and --numeric-sort options in gnu sort
How do you set your pythonpath in an already-created virtualenv?
Why should eval be avoided in Bash, and what should I use instead?
Multiplication on command line terminal
An efficient way to transpose a file in Bash
find -exec cmd {} + vs | xargs
How does this bash fork bomb work? [duplicate]
mkdir's "-p" option
How does grep run so fast?
How to get a list of file names in different lines
How does SIGINT relate to the other termination signals such as SIGTERM, SIGQUIT and SIGKILL?
Why always ./configure; make; make install; as 3 separate steps?
What's the magic of "-" (a dash) in command-line parameters?

Tags:

unix

text-files
