I have a problem. I want to write a program that prints all lines that appear in both the first file and the second file.
awk 'NR==FNR {include[$0];next} $0 in include' eq3_dgdg_1.ndx eq3_dgdg_2.ndx | tee eq4_dgdg_2.ndx
eq3_dgdg_1.ndx input
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL1598
DGD2 SOL63
eq3_dgdg_2.ndx
DGD1 SOL3605
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL1176
DGD2 SOL1945
DGD2 SOL63
Output. Here is the error: DGD1 SOL3605 should appear only once, because the first file contains that line only once, not twice. Could you help me with that error?
DGD1 SOL3605
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL63
Expected output
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL63
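If duplicate lines are allowed in the files, you need a counter: count how many times each line occurs in the first file, then print a line from the second file only while that count is still positive. Give this a try: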
awk 'NR==FNR{a[$0]++;next}a[$0]-->0' f1 f2
Let's have a test with your data:
kent$ head f*
==> f1 <==
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL1598
DGD2 SOL63
==> f2 <==
DGD1 SOL3605
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL1945
DGD2 SOL63
kent$ awk 'NR==FNR{a[$0]++;next}a[$0]-->0' f1 f2
DGD1 SOL3605
DGD2 SOL1176
DGD2 SOL63
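For readers unfamiliar with the idiom, the one-liner can be written out in a longer, commented form. This is only a sketch of the same idea: the function name `common_lines` and the array name `count` are illustrative choices, not part of the answer above. Unlike `a[$0]-->0`, it decrements the counter only when a line is actually printed, which should give the same output.

```shell
# common_lines is a hypothetical helper name used for illustration.
common_lines() {
    awk '
        # While reading the first file (NR == FNR), count each line.
        NR == FNR { count[$0]++; next }
        # In the second file, print a line only if an unused copy
        # remains from the first file, then use that copy up.
        count[$0] > 0 { print; count[$0]-- }
    ' "$1" "$2"
}

# Recreate the sample files from the test above and run the helper.
printf '%s\n' 'DGD1 SOL3605' 'DGD2 SOL1176' 'DGD2 SOL1598' 'DGD2 SOL63' > f1
printf '%s\n' 'DGD1 SOL3605' 'DGD1 SOL3605' 'DGD2 SOL1176' 'DGD2 SOL1945' 'DGD2 SOL63' > f2
common_lines f1 f2
```

Because each match consumes one count, a line that occurs twice in the first file can be printed at most twice, however often it occurs in the second file.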