I have a file (A.txt) with 4 columns of numbers and another file with 3 columns of numbers (B.txt). I need to solve the following problems:
Find all lines in A.txt whose 3rd column contains a number that appears anywhere in the 3rd column of B.txt.
Assume that I have many files like A.txt in a directory and I need to run this for every file in that directory.
How do I do this?
Re: "You should never see someone using grep and awk together..." I've got a series of syslog files in /var/log (some compressed). I need to match against the string "voltage" as a flag that further processing is required, but this string isn't always in the same field.
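One place that mix still earns its keep: zgrep reads both the plain and the gzip-compressed logs, and awk can then do whatever per-line processing follows. A minimal sketch, assuming the /var/log/syslog* naming and a placeholder awk action that just prints the syslog timestamp fields:
# zgrep handles plain and .gz files alike; "voltage" may sit in any field,
# so the grep side does the filtering and awk does the follow-up work.
zgrep -h 'voltage' /var/log/syslog* | awk '{print $1, $2, $3}'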
awk and sed are both incredibly powerful when combined, which you can do with Unix pipes; those are the "|" bits between commands. The grep command is used to find particular patterns in files and prints every line containing the search pattern. awk, on the other hand, also searches a file for certain patterns, but it goes further and performs an action on each match.
You should never see someone using grep and awk together, because whatever grep can do, you can also do in awk:
grep "foo" file.txt | awk '{print $1}'
awk '/foo/ {print $1}' file.txt
I had to get that off my chest. Now to your problem...
Awk is a programming language that assumes a single loop through all the lines in a set of files, and that is not what you want here. Instead, you want to treat B.txt as a special file and loop through your other files. That normally calls for something like Python or Perl. (Older versions of Bash didn't handle hashed-key arrays, so those versions of Bash won't work.) However, it looks like slitvinov has found an answer.
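As an aside to the Bash remark above: a newer Bash (version 4 or later) can do the same hashed-key lookup itself with an associative array. This is only a sketch, and it assumes the A*.txt naming used further down:
declare -A valid
# Remember every value that appears in the 3rd column of B.txt.
while read -r _ _ col3 _; do
    [[ -n $col3 ]] && valid[$col3]=1
done < B.txt
# Scan the remaining files and print each line whose 3rd column was seen above.
for f in A*.txt; do
    while IFS= read -r line; do
        read -r _ _ col3 _ <<< "$line"
        [[ -n $col3 && -n ${valid[$col3]+set} ]] && printf '%s\n' "$line"
    done < "$f"
done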
Here's a Perl solution anyway:
use strict;
use warnings;
use feature qw(say);
use autodie;

my $b_file = shift;
open my $b_fh, "<", $b_file;

#
# This tracks the values in "B"
#
my %valid_lines;
while ( my $line = <$b_fh> ) {
    chomp $line;
    my @array = split /\s+/, $line;
    $valid_lines{$array[2]} = 1;    # Third column
}
close $b_fh;

#
# This handles the rest of the files
#
while ( my $line = <> ) {           # The rest of the files
    chomp $line;
    my @array = split /\s+/, $line;
    next unless exists $valid_lines{$array[2]};    # Skip unless field #3 was in B.txt too
    say $line;
}
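Assuming the script above is saved under a hypothetical name such as filter.pl, B.txt has to be the first argument (it is consumed by the shift) and the remaining arguments are the files to filter:
# filter.pl is an assumed filename; B.txt must come first because the
# script shifts it off before reading the remaining files via <>.
perl filter.pl B.txt A*.txt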
Here is an example. Create the following files and run
awk -f c.awk B.txt A*.txt
c.awk
# While reading the first file on the command line (B.txt), FNR==NR holds;
# record every value in its 3rd column as a key of the array s.
FNR==NR {
    s[$3]
    next
}
# For every later file, print the filename and the whole line whenever its
# 3rd column was recorded from B.txt.
$3 in s {
    print FILENAME, $0
}
A1.txt
1 2 3
1 2 6
1 2 5
A2.txt
1 2 3
1 2 6
1 2 5
B.txt
1 2 3
1 2 5
2 1 8
The output should be:
A1.txt 1 2 3
A1.txt 1 2 5
A2.txt 1 2 3
A2.txt 1 2 5
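Since the question asks to run this against every file in a directory, the same invocation works with any glob; B.txt still has to come first so the FNR==NR block loads it before the data files are read (the /path/to/dir below is just a placeholder):
# B.txt first, then whatever files live in the directory of interest.
awk -f c.awk B.txt /path/to/dir/*.txt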