
How to use awk to test if a column value is in another file?

Tags: linux, awk

I want to do something like

if ($2 in another file) { print $0 }

So say I have file A.txt which contains

aa
bb
cc

I have B.txt like

00,aa
11,bb
00,dd

I want to print

00,aa
11,bb

How do I test that in awk? I am not familiar with the tricks of processing two files at a time.

CuriousMind asked Mar 01 '16 21:03


3 Answers

You could use something like this:

awk -F, 'NR == FNR { a[$0]; next } $2 in a' A.txt B.txt

This saves each line from A.txt as a key in the array a and then prints any lines from B.txt whose second field is in the array.

NR == FNR is the standard way to target the first file passed to awk: NR (the total number of records read so far) equals FNR (the record number within the current file) only while the first file is being read. next skips to the next record, so the $2 in a test is never reached until awk starts reading the second file.
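To make the two-file flow easier to follow, here is the same one-liner spelled out with comments and run against the sample data from the question. It is only a sketch: the scratch directory and the array name seen are choices made for this illustration, not part of the original answer.

```shell
# Build the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' aa bb cc > "$tmpdir/A.txt"
printf '%s\n' 00,aa 11,bb 00,dd > "$tmpdir/B.txt"

# Same logic as the one-liner, one step per line.
out=$(awk -F, '
    NR == FNR {      # true only while the first file (A.txt) is being read
        seen[$0]     # referencing the element is enough to create the key
        next         # skip the membership test below for A.txt lines
    }
    $2 in seen       # second file: no action given, so matching lines are printed
' "$tmpdir/A.txt" "$tmpdir/B.txt")

printf '%s\n' "$out"
rm -rf "$tmpdir"
```

With the question's files this prints 00,aa and 11,bb, the two lines of B.txt whose second field appears in A.txt.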

Tom Fenech answered Nov 08 '22 21:11


An alternative with join, if both files are sorted on the join field:

$ join -t, -1 1 -2 2 -o2.1,2.2 file1 file2

00,aa
11,bb

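This sets the delimiter to a comma, joins the first field of the first file (A.txt here) with the second field of the second file (B.txt), and uses -o to print the second file's fields in their original order; by default join prints the join field first. If the files are not sorted you need to sort them first, but then awk might be a better choice.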
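If the inputs are not already sorted, one way to prepare them might look like the sketch below. The deliberately shuffled sample data, the scratch directory and the .sorted file names are assumptions made for the illustration. Pinning LC_ALL=C for both sort and join keeps their collation order consistent.

```shell
# Sample data in shuffled order, to show that sorting is required.
tmpdir=$(mktemp -d)
printf '%s\n' cc aa bb > "$tmpdir/A.txt"
printf '%s\n' 00,dd 11,bb 00,aa > "$tmpdir/B.txt"

# Sort each file on the field that join will compare.
LC_ALL=C sort "$tmpdir/A.txt" > "$tmpdir/A.sorted"
LC_ALL=C sort -t, -k2,2 "$tmpdir/B.txt" > "$tmpdir/B.sorted"

# Join field 1 of the first file with field 2 of the second,
# printing only the second file's two fields.
out=$(LC_ALL=C join -t, -1 1 -2 2 -o 2.1,2.2 "$tmpdir/A.sorted" "$tmpdir/B.sorted")

printf '%s\n' "$out"
rm -rf "$tmpdir"
```

Note that the result comes out in join-key order rather than in the original line order of B.txt, which is another reason the awk approach may be preferable when order matters.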

karakfa answered Nov 08 '22 23:11


There seem to be two schools of thought on the matter. Some prefer to use the BEGIN-based idiom, and others the FNR-based idiom.

Here's the essence of the former:

awk -v infile=INFILE '
  BEGIN { while( (getline < infile)>0 ) { .... } }
  ... '
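Applied to the files from the question, the BEGIN-based idiom could be filled in roughly as follows. This is one possible reading of the template above, not necessarily what the answer's author had in mind; the variable names line and seen, and the scratch directory, are hypothetical.

```shell
# Build the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' aa bb cc > "$tmpdir/A.txt"
printf '%s\n' 00,aa 11,bb 00,dd > "$tmpdir/B.txt"

out=$(awk -F, -v infile="$tmpdir/A.txt" '
    BEGIN {
        # Load every line of the lookup file before any main input is read.
        while ((getline line < infile) > 0)
            seen[line]
        close(infile)
    }
    $2 in seen       # only B.txt is read as regular input
' "$tmpdir/B.txt")

printf '%s\n' "$out"
rm -rf "$tmpdir"
```

Here only B.txt is passed as an input file, so no NR == FNR test is needed; the trade-off is that the lookup file has to be handed over through a variable instead of as a regular argument.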

For the latter, just search for:

awk 'FNR==NR'

peak answered Nov 08 '22 22:11