Display duplicate lines in two different files

Tags: linux, bash

I have two files and I would like to display the lines that appear in both of them. I tried this, but it doesn't work:

cat id1.txt | while read id; do grep "$id" id2.txt; done

Is there another way to display the lines that the two files have in common? Both files contain a list of IDs. Thank you.

Chad D asked Mar 26 '13 19:03



2 Answers

Using awk will save you time.

awk 'FNR==NR{lines[$0]=1;next} $0 in lines' id1.txt id2.txt

# Explanation
FNR==NR      # FNR (line number within the current file) equals NR (line number
             # across all input) only while awk is reading the first file
lines[$0]=1  # store each line of the first file as a key in an array;
             # the value 1 is just a placeholder
next         # skip the remaining rules and move on to the next input line
$0 in lines  # for lines of the second file: is this line a key in the array?
             # if so, awk prints it
             # the {print} action is omitted because printing the line
             # is awk's default action when a pattern is true
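
To see the two-pass behaviour described above on concrete input, here is a minimal sketch. The sample IDs and the scratch directory created with mktemp are assumptions made for illustration; they are not part of the original question.

```shell
# Hypothetical sample data, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 101 102 103 > "$workdir/id1.txt"
printf '%s\n' 103 999 101 1010 > "$workdir/id2.txt"

# While reading id1.txt (FNR==NR), remember every line as an array key.
# While reading id2.txt, print each line that is already a key.
common=$(awk 'FNR==NR { lines[$0] = 1; next } $0 in lines' \
    "$workdir/id1.txt" "$workdir/id2.txt")

printf '%s\n' "$common"

# Clean up the scratch directory.
rm -rf "$workdir"
```

With this data the command prints 103 and then 101, in the order they occur in id2.txt. The line 1010 is not printed because the lookup compares whole lines, and neither file has to be sorted.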
Sandy answered Oct 28 '22 18:10


Are the files sorted? Can they be sorted?

If sorted:

comm -12 id1.txt id2.txt

If not sorted, but your shell supports process substitution (bash, ksh, or zsh):

comm -12 <(sort id1.txt) <(sort id2.txt)

If your shell lacks process substitution, you can sort each file into a temporary file and run comm on those instead.
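
The temporary-file route might look like the sketch below. The sample IDs, the .sorted file names, and the use of mktemp -d (widely available, but not required by POSIX) are assumptions made for illustration.

```shell
# Hypothetical unsorted sample data in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 30 10 20 > "$workdir/id1.txt"
printf '%s\n' 20 40 30 > "$workdir/id2.txt"

# comm expects sorted input, so sort each file into a temporary copy first.
sort "$workdir/id1.txt" > "$workdir/id1.sorted"
sort "$workdir/id2.txt" > "$workdir/id2.sorted"

# -1 and -2 suppress the lines unique to each file, leaving the common lines.
common=$(comm -12 "$workdir/id1.sorted" "$workdir/id2.sorted")

printf '%s\n' "$common"

# Remove the temporary files.
rm -rf "$workdir"
```

Here the common lines are 20 and 30. Both files should be sorted with the same locale settings that comm runs under, otherwise comm may miss matches.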

You could also use grep -F:

grep -F -f id1.txt id2.txt

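This prints the lines of id2.txt that contain any of the strings listed in id1.txt. The one problem is that matching is by substring, so an ID of 1 would match every ID containing a 1 somewhere. Adding -x (match whole lines only) or -w (match whole words only) avoids that; -x is specified by POSIX, while -w is an extension found in GNU and BSD grep.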
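
The substring problem is easy to reproduce. The following sketch uses made-up IDs in a scratch directory (both are assumptions for illustration) and compares plain grep -F with grep -F -x.

```shell
# Hypothetical sample data: id1.txt holds the IDs to look for.
workdir=$(mktemp -d)
printf '%s\n' 1 42 > "$workdir/id1.txt"
printf '%s\n' 1 10 21 42 142 > "$workdir/id2.txt"

# Substring matching: every line of id2.txt containing "1" or "42" is printed.
loose=$(grep -F -f "$workdir/id1.txt" "$workdir/id2.txt")

# Whole-line matching: only lines identical to an entry in id1.txt are printed.
exact=$(grep -F -x -f "$workdir/id1.txt" "$workdir/id2.txt")

printf 'without -x:\n%s\n' "$loose"
printf 'with -x:\n%s\n' "$exact"

rm -rf "$workdir"
```

Without -x all five lines of id2.txt are reported, because each of them contains a 1. With -x only 1 and 42 remain.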

Jonathan Leffler answered Oct 28 '22 19:10