awk to compare two files [duplicate]

Tags: bash, shell, awk

I am trying to compare two files and print the lines that match. The lines within each file are unique.

File1.txt

GERMANY
FRANCE
UK
POLLAND

File2.txt

POLLAND 
GERMANY

I tried the command below:

awk 'BEGIN { FS="\n" } ; NR==FNR{A[$1]++;NEXT}A[$1]' File1.txt File2.txt

but it prints each matching record twice. I want each one to be printed only once.

UPDATE

expected output

POLLAND 
GERMANY

Current Output

POLLAND 
GERMANY
POLLAND 
GERMANY
upog asked Feb 28 '14 16:02

3 Answers

grep with -f (read the patterns from a file) is best for this:

$ grep -f f1 f2
POLLAND 
GERMANY

In fact, to match whole words only and to treat the patterns as fixed strings rather than regular expressions, use -w and -F respectively:

$ grep -wFf f1 f2
POLLAND 
GERMANY
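
To see why -w and -F matter, here is a small sketch. The file names and their contents are made up for illustration and are not the files from the question; one pattern is a prefix of another word and one contains a regex metacharacter.

```shell
# Hypothetical sample data in a temporary directory (not the asker's files).
tmp=$(mktemp -d)
printf '%s\n' 'UK' 'A.B' > "$tmp/patterns"
printf '%s\n' 'UK' 'UKRAINE' 'AxB' 'A.B' > "$tmp/data"

# Plain -f: every pattern is a regex and may match anywhere in a line,
# so "UK" also hits UKRAINE and "A.B" also hits AxB.
loose=$(grep -f "$tmp/patterns" "$tmp/data")

# -w keeps only whole-word matches, -F takes the patterns literally.
strict=$(grep -wFf "$tmp/patterns" "$tmp/data")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$tmp"
```

With this data the loose form returns all four lines, while the strict form returns only UK and A.B.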

If you really have to do it with awk, then you can use:

$ awk 'FNR==NR {a[$1]; next} $1 in a' f1 f2
POLLAND 
GERMANY
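  • FNR==NR is true only while reading the first file, because FNR restarts at 1 for each file and NR keeps counting.
  • {a[$1]; next} stores the first field of each line of the first file as a key in a[] and skips to the next line.
  • $1 in a is evaluated when looping through the second file. It checks whether the first field of the current line is a key in the a[] array and, if so, prints the line.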
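
A quick way to convince yourself of the FNR==NR test is to print both counters for every line. This is only an illustrative sketch; the two small files are created on the fly and are not the ones from the question.

```shell
# Hypothetical two-line files, created in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' GERMANY FRANCE > "$tmp/f1"
printf '%s\n' POLLAND GERMANY > "$tmp/f2"

# NR counts records across all files, FNR restarts at 1 for each file,
# so the two are equal only while the first file is being read.
counters=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR, (FNR==NR ? "first" : "later") }' f1 f2)

printf '%s\n' "$counters"
rm -rf "$tmp"
```

The lines from f1 are reported with equal NR and FNR (1 1, 2 2), while those from f2 show NR running on (3 1, 4 2).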

Why wasn't your script working?

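  • Because you used NEXT instead of next. awk is case-sensitive, so NEXT was treated as an uninitialized variable instead of the next statement, and nothing skipped to the next line.
  • Also, the BEGIN { FS="\n" } was not needed: the default FS is a single space, which splits fields on runs of whitespace, and that is fine here. With FS set to a newline, $1 becomes the whole line, including any trailing whitespace, so a line such as "POLLAND " no longer matches "POLLAND".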
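
The effect of NEXT versus next can be reproduced in isolation. The sketch below recreates the country lists in temporary files (without the trailing space, and with the default FS) so that only the spelling of next differs between the two runs.

```shell
# Sample files modelled on the question, created in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' GERMANY FRANCE UK POLLAND > "$tmp/f1"
printf '%s\n' POLLAND GERMANY > "$tmp/f2"

# NEXT is just an empty variable: execution falls through to the A[$1]
# pattern, which is already non-zero, so every line of f1 is printed too.
with_upper=$(awk 'NR==FNR {A[$1]++; NEXT} A[$1]' "$tmp/f1" "$tmp/f2")

# next skips the remaining rules while the first file is being read.
with_lower=$(awk 'NR==FNR {A[$1]++; next} A[$1]' "$tmp/f1" "$tmp/f2")

printf 'NEXT:\n%s\n\nnext:\n%s\n' "$with_upper" "$with_lower"
rm -rf "$tmp"
```

With these files the NEXT variant prints all four lines of f1 followed by the two matches from f2, so the matching names show up twice; the next variant prints only the two matches.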
fedorqui 'SO stop harming' answered Oct 26 '22 22:10

Your command should maybe be:

awk 'NR==FNR{A[$1]++;next}A[$1]' file1 file2

You have a stray semicolon after the closing brace of BEGIN{}, you have "NEXT" in capital letters, and you have misspelled your filename.

Mark Setchell answered Oct 26 '22 21:10

Try this one-liner:

awk 'NR==FNR{name[$1]++;next}$1 in name' file1.txt file2.txt
  • You iterate through the first file (NR==FNR), storing the names in an array called name.
  • You use next to prevent the second action from happening until the first file is completely stored in the array.
  • Once the first file is complete, you move on to the next file and check whether each name is present in the array. The name is printed if it exists.
  • FS is the field separator. You don't need to set it to a newline. It is RS, the record separator, that needs to be a newline, but we don't set it here because that is the default value.
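
The difference FS makes can be checked with a line that carries a trailing space, as the POLLAND line in File2.txt of the question appears to. This is a sketch using temporary copies of the data, not the asker's actual files.

```shell
# Temporary copies of the sample data; note the trailing space after POLLAND.
tmp=$(mktemp -d)
printf '%s\n' GERMANY FRANCE UK POLLAND > "$tmp/file1.txt"
printf '%s\n' 'POLLAND ' GERMANY > "$tmp/file2.txt"

# Default FS: fields are split on whitespace, so $1 is "POLLAND" without the space.
default_fs=$(awk 'NR==FNR {name[$1]; next} $1 in name' "$tmp/file1.txt" "$tmp/file2.txt")

# FS="\n": $1 is the entire line, trailing space included, so the lookup fails.
newline_fs=$(awk 'BEGIN {FS="\n"} NR==FNR {name[$1]; next} $1 in name' "$tmp/file1.txt" "$tmp/file2.txt")

printf 'default FS:\n%s\n\nFS=newline:\n%s\n' "$default_fs" "$newline_fs"
rm -rf "$tmp"
```

With the default FS both POLLAND and GERMANY are printed; with FS set to a newline only GERMANY is.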
jaypal singh answered Oct 26 '22 21:10