Grep to multiple output files

Tags: grep, bash, awk

I have one huge file (over 6 GB) and about 1000 patterns. I want to extract the lines matching each pattern into a separate file, one file per pattern. For example, my patterns are:

1
2

my file:

a|1
b|2
c|3
d|123

As output I would like to have 2 files:

1:

a|1
d|123

2:

b|2
d|123

I can do it by grepping the file once per pattern, but that is inefficient for 1000 patterns and a huge file. I also tried something like this:

grep -f pattern_file huge_file

but that produces only a single output. I can't sort my huge file because it takes too much time. Maybe awk can do it?

mefju asked Dec 22 '22 05:12


1 Answer

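awk -F\| 'NR == FNR {
  patt[$0]; next          # first file: remember each pattern
}
{
  for (p in patt)         # second file: test field 2 against every pattern
    if ($2 ~ p) print > p # write the line to a file named after the pattern
}' patterns huge_file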

Because this keeps one output file open per pattern, some awk implementations may hit the limit on the number of open files. Let me know if that's the case so I can post an alternative solution.
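A rough way to judge whether that limit matters is to compare the number of patterns with the shell's open-file limit. This is only a sketch: the scratch directory and the two-line patterns file are stand-ins built from the question's example, and awk itself needs a few descriptors on top of the output files.

```shell
# Scratch directory and sample patterns file are assumptions for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1\n2\n' > patterns

# Number of patterns (tr strips the padding some wc versions add).
n_patterns=$(wc -l < patterns | tr -d ' ')
# Soft limit on open file descriptors for this shell and its children.
fd_limit=$(ulimit -n)

echo "patterns: $n_patterns, open-file limit: $fd_limit"
if [ "$fd_limit" != "unlimited" ] && [ "$n_patterns" -ge "$fd_limit" ]; then
  echo "more patterns than descriptors: close each file after writing"
fi
```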

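P.S.: This version keeps only one file open at a time. It appends to the output files, so remove any left over from an earlier run first:

awk -F\| 'NR == FNR {
  patt[$0]; next
}
{
  for (p in patt)
    if ($2 ~ p) {
      print >> p   # append, because the file is reopened for every match
      close(p)     # release the descriptor right away
    }
}' patterns huge_file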
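To try the approach on the sample data from the question, something like the following should work. It is a minimal sketch: the scratch directory is an assumption, the file names patterns and huge_file simply follow the commands above, and close() is called only after a matching write.

```shell
# Work in a scratch directory so the per-pattern output files are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample inputs copied from the question.
printf '1\n2\n' > patterns
printf 'a|1\nb|2\nc|3\nd|123\n' > huge_file

awk -F\| 'NR == FNR { patt[$0]; next }   # load the patterns from the first file
{
  for (p in patt)
    if ($2 ~ p) {        # regex match of the pattern against field 2
      print >> p         # append the whole line to the file named after the pattern
      close(p)           # keep at most one output file open
    }
}' patterns huge_file

echo "1:"; cat 1
echo "2:"; cat 2
```

File 1 ends up with a|1 and d|123, and file 2 with b|2 and d|123, as requested in the question. Note that each pattern is used as a regular expression, which is why 123 lands in both files.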

Dimitre Radoulov answered Jan 01 '23 11:01