
Program that compares two files and prints the lines they have in common

Tags:

awk

I have a problem. I want to write a program that prints all lines that appear in both the first file and the second file.

awk 'NR==FNR {include[$0];next} $0 in include' eq3_dgdg_1.ndx eq3_dgdg_2.ndx | tee eq4_dgdg_2.ndx

eq3_dgdg_1.ndx input

DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL1598
DGD2 SOL63

eq3_dgdg_2.ndx

DGD1 SOL3605
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL1176
DGD2 SOL1945
DGD2 SOL63

Output. Here is the error: DGD1 SOL3605 should appear only once, because the first file contains that line only once, not twice. Could you help me fix this?

DGD1 SOL3605
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL63

Expected output

DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL63
Jakub asked Nov 24 '25 15:11


1 Answer

If duplicate lines within a file are allowed, a simple membership test is not enough; you need a counter. Give this a try:

awk 'NR==FNR{a[$0]++;next}a[$0]-->0' f1 f2

Let's have a test with your data:

kent$  head f*
==> f1 <==
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL1598
DGD2 SOL63

==> f2 <==
DGD1 SOL3605
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL1945
DGD2 SOL63

kent$  awk 'NR==FNR{a[$0]++;next}a[$0]-->0' f1 f2
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL63
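
For readers who find `a[$0]-->0` hard to parse: it is `a[$0]-- > 0`, a post-decrement followed by a comparison. The sketch below spells the same idea out in longer form. It is only an illustration, not a replacement for the one-liner: the array name `count`, the temporary directory, and the file names `f1` and `f2` inside it are assumptions made for the example, and the data is copied from the question.

```shell
# Hypothetical sample files in a temporary directory, using the question's data
tmpdir=$(mktemp -d)
printf '%s\n' 'DGD1 SOL3605' 'DGD2 SOL1176' 'DGD2 SOL1598' 'DGD2 SOL63' > "$tmpdir/f1"
printf '%s\n' 'DGD1 SOL3605' 'DGD1 SOL3605' 'DGD2 SOL1176' 'DGD2 SOL1176' \
              'DGD2 SOL1945' 'DGD2 SOL63' > "$tmpdir/f2"

result=$(awk '
    # First file only (NR == FNR): count how many times each line occurs
    NR == FNR { count[$0]++; next }
    # Second file: print a line only while the first file still has
    # an unused occurrence of it, and use that occurrence up
    count[$0] > 0 { count[$0]--; print }
' "$tmpdir/f1" "$tmpdir/f2")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With this data, each line of the second file is printed at most as many times as it occurs in the first file, so the repeated DGD1 SOL3605 and DGD2 SOL1176 lines are each printed once.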
Kent answered Nov 27 '25 13:11


