How to grep multiple terms and output in the order of the search?

I have tried a few things but don't seem to be making any progress. I have a text file with some lines of data, and I want to extract some of those lines. Each line has a unique identifier that I can grep for.

If I use

grep 'name1\|name2\|name3\|name4' file.txt > newfile.txt

it does the job and extracts the lines I want. However, I want the lines in the order in which I specified the terms: in this example, the name1 lines first, then the name2 lines, then the name3 lines, and finally the name4 lines.

If, for example, the lines in my original file are in the order name2, name4, name3, name1, the output file has them in that same order.

Is there a way to order the grep easily?

The ids are block-sorted, so all lines with name1 for example occur next to each other.

Thanks for any advice!

user1637359 asked Mar 23 '23 13:03


2 Answers

Use a Loop to Read Words from a File

Given a file of words to grep for such as the following:

root
lp
syslog
nobody

you can use a read loop to repeatedly grep for fixed strings in another file. For example, using Bash shell's default REPLY variable and a word file stored in the /tmp directory, this will work:

# -r keeps backslashes literal; "--" protects words that begin with "-"
while IFS= read -r; do
    grep --fixed-strings -- "$REPLY" /etc/passwd
done < /tmp/words

Notes

  1. The posted example won't prevent multiple matches: a line that contains more than one of the words is printed once for each word. It does ensure that the matches appear in the order defined in /tmp/words.
  2. The example uses GNU grep's --fixed-strings option (the long form of -F) for performance reasons. Your mileage may vary with other greps and with regular expressions.
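
Applied to the question's case, the same loop can take the identifiers as arguments instead of reading them from a file. This is only a minimal sketch: the function name grep_in_order and the sample data are made up for illustration, and it has the same multiple-match behaviour described in note 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of grep): print the matching lines of a file,
# grouped in the order the search terms are given as arguments.
grep_in_order() {
    local file term
    file=$1
    shift
    for term in "$@"; do
        # -F matches the term as a fixed string; -e protects terms starting with "-"
        grep -F -e "$term" "$file"
    done
}

# Sample data, invented for this sketch, in a different order than the search terms.
sample=$(mktemp)
printf '%s\n' 'name2 b' 'name4 d' 'name3 c' 'name1 a' > "$sample"

grep_in_order "$sample" name1 name2 name3 name4
rm -f "$sample"
```

The file is scanned once per term, so for many terms or a very large file a single-pass approach may be preferable.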
Todd A. Jacobs answered Apr 05 '23 16:04


You can use an Awk array.

awk 'BEGIN { k[1]="name1"; k[2]="name2"; k[3]="name3"; k[4]="name4" }
{ for (i=1; i<=4; ++i) if ($0 ~ k[i]) m[i]=(m[i]?m[i] RS:"") $0 }
END { for(i=1; i<=4; ++i) if (m[i]) print m[i] }' file

This will produce duplicates if a line matches multiple expressions. It could be optimized somewhat if you need it to be fast; just ask.

Or in Perl:

perl -ne 'BEGIN { @k = qw( name1 name2 name3 name4 );
    $k = join("", "(", join("|", @k), ")");
    $r = qr($k); }
  if(m/$r/) { push @{$m{$1}}, $_ }
  END { for $i (@k) { if ($m{$i}) {
    print join("", @{$m{$i}}); } } }' file

This is probably somewhat more efficient than the equivalent Awk script. It will only find one match per line, so it is not exactly equivalent.

tripleee answered Apr 05 '23 15:04