I'd essentially like to combine the power of
grep -f
with
awk '{ if($2=="this is where I'd like to input a file of fixed string patterns") print $0}'
Which is to say, I'd like to search a specific column of a file (File 1) using an input file of patterns (File 2). If a match is found, simply write the matching line to an output file:
> outputfile.txt
From a previous post, this awk line is really close:
awk 'NR==FNR{a[$0]=1;next} {n=0;for(i in a){if($0~i){n=1}}} n' file1 file2
Taken from Obtain patterns in one file from another using ack or awk or better way than grep?
But it doesn't search a specific column of file 1. I'm open to other tools as well.
The example you found is indeed very close to what you want; the only difference is that you don't want to match the whole line ($0). Modify it to something like this:
awk 'NR==FNR { pats[$0]=1; next } { for(p in pats) if($2 ~ p) { print $0; break } }' patterns file
If you only need a fixed-string match, use the index() function instead, i.e. replace $2 ~ p with index($2, p).
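To make the difference between the two matching modes concrete, here is a minimal sketch. The sample data and the temporary file names are made up for illustration; the pattern a.c is chosen because it contains a regex metacharacter.

```shell
# Hypothetical sample data: the pattern contains a regex metacharacter (.)
tmp=$(mktemp -d)
printf '%s\n' 'a.c' > "$tmp/patterns"
printf '%s\n' 'row1 abc x' 'row2 a.c y' 'row3 xa.cx z' > "$tmp/file"

# Regex match: "." matches any character, so column value "abc" matches too
regex_out=$(awk 'NR==FNR { pats[$0]=1; next } { for (p in pats) if ($2 ~ p) { print $0; break } }' "$tmp/patterns" "$tmp/file")

# Fixed-string match: index() returns 0 unless p occurs literally inside $2
fixed_out=$(awk 'NR==FNR { pats[$0]=1; next } { for (p in pats) if (index($2, p)) { print $0; break } }' "$tmp/patterns" "$tmp/file")

printf 'regex:\n%s\nfixed:\n%s\n' "$regex_out" "$fixed_out"
rm -rf "$tmp"
```

With this data the regex version prints all three rows, while the index() version skips row1. Note that index() is still a substring test, so row3 matches in both cases.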
You could also provide the column number as an argument to awk, e.g.:
awk -v col=$col 'NR==FNR { pats[$0]=1; next } { for(p in pats) if($col ~ p) { print $0; break } }' patterns file
If the column must equal a pattern exactly, rather than merely contain it, compare with the == operator:
awk -v col=$col 'NR==FNR { pats[$0]=1; next } { for(p in pats) if($col == p) { print $0; break } }' patterns file
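Since comparing with == against every array key amounts to a key lookup, the loop can probably be dropped in favour of awk's in operator. The following is only a sketch of that idea: the sample data and the match_col helper are hypothetical, not part of the original answer.

```shell
# Hypothetical sample data, made up for this illustration
tmp=$(mktemp -d)
printf '%s\n' 'foo' > "$tmp/patterns"
printf '%s\n' 'a foo' 'foo b' 'a foobar' > "$tmp/file"

# match_col is a hypothetical helper: print the lines whose column number $1
# is exactly equal to one of the patterns (array key lookup, no loop)
match_col() {
    awk -v col="$1" 'NR==FNR { pats[$0]; next } $col in pats' "$tmp/patterns" "$tmp/file"
}

col1_out=$(match_col 1)   # lines whose 1st column equals a pattern
col2_out=$(match_col 2)   # lines whose 2nd column equals a pattern
printf 'col 1: %s\ncol 2: %s\n' "$col1_out" "$col2_out"
rm -rf "$tmp"
```

Quoting the shell variable (col="$1") keeps the command safe if the value is empty or contains spaces.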
Here is a way to do it using awk:
awk 'BEGIN { while ((getline l < "patterns.txt") > 0) PATS[l] } $2 in PATS' file2
Where file2 is the file you are searching, and patterns.txt is a file with one exact pattern per line. The implicit {print} has been omitted, but you can add it and do anything you like there.
The condition $2 in PATS is true if the second column is exactly one of the patterns. Testing the result of getline against 0 stops the loop cleanly if patterns.txt cannot be read.
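A quick way to convince yourself of the exact-match behaviour is to run it on a tiny input. This is a sketch with made-up data; pf is a hypothetical awk variable used here only so the pattern file can live in a temporary directory.

```shell
# Hypothetical sample data, made up for this illustration
tmp=$(mktemp -d)
printf '%s\n' 'foo' 'bar' > "$tmp/patterns.txt"
printf '%s\n' '1 foo a' '2 foobar b' '3 bar c' > "$tmp/file2"

# pf (hypothetical name) holds the path of the pattern file
exact_out=$(awk -v pf="$tmp/patterns.txt" '
    BEGIN { while ((getline l < pf) > 0) PATS[l] }   # load patterns as array keys
    $2 in PATS' "$tmp/file2")                        # print lines whose 2nd column is a key
printf '%s\n' "$exact_out"
rm -rf "$tmp"
```

Lines 1 and 3 are printed; line 2 is not, because foobar is not exactly equal to any pattern.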
If the entries in patterns.txt are to be treated as regexps, replace the $2 in PATS condition with:
{ ok=0; for (p in PATS) if ($2 ~ p) ok=1 } ok
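So, for example, to test $2 against all the regexps in patterns.txt, and print the third column if the 2nd column matched:
awk 'BEGIN { while ((getline l < "patterns.txt") > 0) PATS[l] }
{ ok=0; for (p in PATS) if ($2 ~ p) ok=1 }
ok { print $3 }' < file2
Note that the ok pattern and its { print $3 } action must be on the same line; on separate lines awk treats them as two independent rules.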
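As a rough check of the regexp variant, the sketch below runs it on invented data with two anchored patterns. As before, pf is a hypothetical variable for the pattern file path, and the sample rows are assumptions made for the demonstration.

```shell
# Hypothetical sample data: two anchored regexps
tmp=$(mktemp -d)
printf '%s\n' '^ab' 'z$' > "$tmp/patterns.txt"
printf '%s\n' '1 abc X' '2 cab Y' '3 fizz Z' > "$tmp/file2"

third_out=$(awk -v pf="$tmp/patterns.txt" '
    BEGIN { while ((getline l < pf) > 0) PATS[l] }
    # reset the flag for every line, then stop at the first matching regexp
    { ok = 0; for (p in PATS) if ($2 ~ p) { ok = 1; break } }
    ok { print $3 }' "$tmp/file2")
printf '%s\n' "$third_out"
rm -rf "$tmp"
```

Here abc matches ^ab and fizz matches z$, so the third columns X and Z are printed; cab matches neither pattern.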
And here's a version in perl. It is similar to the awk version, except that it extracts the fields with a regexp instead of relying on automatic field splitting.
perl -ne 'BEGIN{open $pf, "<patterns.txt"; %P=map{chomp;$_=>1}<$pf>}
/^\s*([^\s]+)\s+([^\s]+).*$/ and exists $P{$2} and print' < file2
Taking that apart:
BEGIN{
open $pf, "<patterns.txt";
%P = map {chomp;$_=>1} <$pf>;
}
Reads your patterns file into a hash %P for fast lookup.
/^\s*([^\s]+)\s+([^\s]+).*$/ and # extract your fields into $1, $2, etc
exists $P{$2} and # See if your field is in the patterns hash
print; # just print the line (you could also
# print anything else; print "$1\n"; etc)
It gets slightly shorter if your input file is tab-separated (and when you know that there's exactly one tab between fields). Here's an example that matches the patterns against the 5th column:
perl -F"\t" -ane '
BEGIN{open $pf, "<patterns.txt"; %P=map{chomp;$_=>1}<$pf>}
exists $P{$F[4]} and print ' file2
This is thanks to perl's -F option, which (together with -a) tells perl to auto-split each line into the @F array based on the separator (\t in this case). Note that since arrays in perl start from 0, $F[4] is the 5th field.