awk to compare two files [duplicate]

Question

I am trying to compare two files and want to print the matching lines... The lines present in the files will be unique

File1.txt

GERMANY
FRANCE
UK
POLLAND

File2.txt

POLLAND 
GERMANY

I tried the command below

awk 'BEGIN { FS="\n" } ; NR==FNR{A[$1]++;NEXT}A[$1]' File1.txt File2.txt

but it is printing the matching record twice, I want them to be printed once...

UPDATE

expected output

POLLAND 
GERMANY

Current Output

POLLAND 
GERMANY
POLLAND 
GERMANY

fedorqui 'SO stop harming' · Accepted Answer

grep together with -f (for file) is best for this:

$ grep -f f1 f2
POLLAND 
GERMANY

To match whole words only and treat the patterns as fixed strings rather than regexes, use -w and -F respectively:

$ grep -wFf f1 f2
POLLAND 
GERMANY
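
To see why the extra flags matter, here is a minimal sketch. The file contents (UK and UKRAINE) are hypothetical and not taken from the question; they are chosen so that one pattern is a substring of a longer word.

```shell
# Hypothetical sample data: f2 contains a longer word that embeds "UK".
tmp=$(mktemp -d)
printf 'UK\n' > "$tmp/f1"
printf 'UKRAINE\nUK\n' > "$tmp/f2"

# Plain -f: each pattern is a regex matched anywhere in the line,
# so both UKRAINE and UK are printed.
loose=$(grep -f "$tmp/f1" "$tmp/f2")

# -F treats patterns as fixed strings and -w requires whole-word matches,
# so UKRAINE is skipped.
strict=$(grep -wFf "$tmp/f1" "$tmp/f2")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -r "$tmp"
```

If the whole line has to be identical, and not just a word in it, grep's -x option is probably closer to what is wanted than -w.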

If you really have to do it with awk, then you can use:

$ awk 'FNR==NR {a[$1]; next} $1 in a' f1 f2
POLLAND 
GERMANY

FNR==NR is true only while reading the first file.
{a[$1]; next} stores the first field of each line of the first file as a key in a[] and skips to the next line.
$1 in a is evaluated while looping through the second file. It checks whether the first field of the current line is a key in the a[] array; if so, the line is printed.
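
The same one-liner, spelled out with comments, as a sketch that recreates the sample files from the question in a temporary directory (without the trailing spaces):

```shell
tmp=$(mktemp -d)
printf 'GERMANY\nFRANCE\nUK\nPOLLAND\n' > "$tmp/File1.txt"
printf 'POLLAND\nGERMANY\n' > "$tmp/File2.txt"

matches=$(awk '
    # NR counts records across all files, FNR restarts at 1 for each file,
    # so FNR == NR holds only while the first file is being read.
    FNR == NR { a[$1]; next }   # referencing a[$1] creates the key; next skips the rule below
    $1 in a                     # second file: a pattern with no action prints the line
' "$tmp/File1.txt" "$tmp/File2.txt")

printf '%s\n' "$matches"
rm -r "$tmp"
```

The output follows the order of the second file, because that is the file whose lines are being printed.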

Your script did not work because you used NEXT instead of next. awk is case-sensitive, so NEXT was treated as an (unset) variable instead of a statement.
Also, the BEGIN { FS="\n" } was unnecessary: the default FS splits on whitespace, which is fine here. Setting it to a newline was making it misbehave.
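
A small sketch of the NEXT versus next difference, using the sample files as listed in the question (the asker's real inputs may have differed, so the counts here need not match the output shown there):

```shell
tmp=$(mktemp -d)
printf 'GERMANY\nFRANCE\nUK\nPOLLAND\n' > "$tmp/File1.txt"
printf 'POLLAND\nGERMANY\n' > "$tmp/File2.txt"

# NEXT is only an unset variable, so control falls through to the A[$1]
# pattern and every File1.txt line is printed as well.
upper=$(awk 'NR==FNR{A[$1]++;NEXT}A[$1]' "$tmp/File1.txt" "$tmp/File2.txt" | wc -l)

# next really skips to the next record, so only File2.txt lines can be printed.
lower=$(awk 'NR==FNR{A[$1]++;next}A[$1]' "$tmp/File1.txt" "$tmp/File2.txt" | wc -l)

# $(( )) strips the padding some wc implementations add.
echo "lines printed with NEXT: $((upper))"
echo "lines printed with next: $((lower))"
rm -r "$tmp"
```

With these inputs the first command prints the four lines of File1.txt plus the two matches, while the second prints only the two matches.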

Mark Setchell · Answer

Your command should maybe be:

awk 'NR==FNR{A[$1]++;next}A[$1]' file1 file2

You have a stray semicolon after the closing brace of BEGIN{}, "NEXT" is in capital letters, and you have misspelled your filename.

jaypal singh · Answer

Try this one-liner:

awk 'NR==FNR{name[$1]++;next}$1 in name' file1.txt file2.txt

You iterate through the first file (NR==FNR), storing the names in an array called name.
You use next to prevent the second action from happening until the first file is completely stored in the array.
Once the first file is complete, you move on to the second file and check whether each name is present in the array. The name is printed if it exists.
FS is the field separator. You don't need to set it to a newline. What needs to be a newline is RS, the record separator, but we don't set it here because that is the default value.
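
To make the FS point concrete, here is a minimal sketch with a hypothetical two-word record (not from the question), showing what $1 holds under each setting:

```shell
line='POLLAND EUROPE'   # hypothetical two-field record

# Default FS: fields are split on runs of blanks, so the first field is the first word.
default_first=$(printf '%s\n' "$line" | awk '{ print $1 }')

# FS set to a newline: a record never contains one (RS already split on it),
# so the whole line ends up in the first field.
newline_first=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "\n" } { print $1 }')

printf 'default FS: [%s]\nFS=\\n:     [%s]\n' "$default_first" "$newline_first"
```

With FS set to a newline, any trailing space on a line becomes part of $1, which is one way a lookup against the array can silently miss.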
