grep -B emulation with ring buffer / awk

Question

I need to extract a line above my search string (say, 19 lines above). Normally, I would just go with

grep -B 19 "$search_string" "$file" | ...further processing

However, the script should also work on Solaris, where grep doesn't provide the -B option. Often I can get away with awk '/begin/,/end/' to print a block of lines, if I know a pattern that matches the first of them. In this particular situation, that is not possible. I tried the following:

1) Ring buffer solution.

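#!/bin/bash
# Ring buffer of 20 lines: slot (index % 20) holds the current line
g_a_buffer=( 0 )
g_i_buffer_index=1
while IFS= read -r line
do
        g_a_buffer[$((g_i_buffer_index % 20))]=$line
        # Starts one grep process per input line
        if printf '%s\n' "$line" | grep "$search_string" > /dev/null; then
                # Line (index - 19) sits in slot (index + 1) % 20
                printf '%s\n' "${g_a_buffer[$(( (g_i_buffer_index + 1) % 20 ))]}"
        fi
        (( g_i_buffer_index += 1 ))
done < "$file_name"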

This is extremely slow: for about 40k lines it takes 1m37s (against 0.005s for grep).
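Most of that time is probably spent starting a grep process (plus a pipeline) for every input line. A minimal bash sketch that keeps the same 20-slot ring buffer but tests each line with bash's built-in [[ =~ ]] avoids those forks. It assumes the search string can be treated as an extended regular expression (grep uses basic regexes by default), and print_19_above is a hypothetical name. It should be noticeably faster than the loop above, although a bash read loop will still be much slower than grep or awk.

```shell
#!/bin/bash
# Hypothetical helper: print the line 19 above every line matching an
# extended regex, testing each line with bash's built-in [[ =~ ]] instead
# of starting one grep process per line.
print_19_above() {
    local pattern=$1 file=$2 line i=1
    local -a buf
    # "|| [[ -n $line ]]" also handles a last line without a trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        buf[i%20]=$line                       # slot i % 20 holds line i
        if [[ $line =~ $pattern ]] && (( i > 19 )); then
            # slot (i + 1) % 20 still holds line i - 19
            printf '%s\n' "${buf[(i + 1) % 20]}"
        fi
        (( i += 1 ))
    done < "$file"
}
```

Usage: print_19_above 'mySearchString' "$file_name". Matches on the first 19 lines are skipped, since there is no line 19 above them.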

2) Awk solution. I have to say outright that I am an extreme beginner in awk, rarely going beyond awk '{print $1}'. The line below doesn't work, but it gives you an idea of what I am trying to achieve:

awk '/mySearchString/ {print NR-19}' filename.txt

It runs in 0.118s, so the speed is good! But all I get is a number: the match's line number minus 19. What I need is a printout of the line located at (line - 19). After some googling I still couldn't find an answer. I admit this must be an extremely basic problem, but I seem to have hit a wall here.

So far I have only found how to print the previous line with awk (which is a sort of one-line buffer), or massive ring-buffer implementations in awk. Is there a more elegant way to do this?
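An awk ring buffer does not have to be massive. A minimal sketch, assuming the pattern is an awk (extended) regular expression and using a hypothetical wrapper name print_19_above: each line is stored in slot NR % 20 of an array, so when a match is seen, the line 19 above it is still in slot (NR + 1) % 20. Matches on the first 19 lines are skipped because there is no line 19 above them. On Solaris, /usr/bin/awk is the old awk, so nawk or /usr/xpg4/bin/awk may be needed for -v.

```shell
#!/bin/sh
# Hypothetical wrapper: print the line 19 above each line matching an awk
# regular expression, keeping only the last 20 lines in memory.
print_19_above() {
    awk -v pat="$1" '
        { buf[NR % 20] = $0 }                              # slot NR % 20 holds the current line
        $0 ~ pat && NR > 19 { print buf[(NR + 1) % 20] }   # slot (NR + 1) % 20 holds line NR - 19
    ' "$2"
}
```

With the literal pattern from the question, this reduces to awk '{ buf[NR % 20] = $0 } /mySearchString/ && NR > 19 { print buf[(NR + 1) % 20] }' filename.txt, which reads the file only once.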

Thanks for any help!

Theo Spears · Accepted Answer

Here is a solution that requires two passes through the file, so it is not optimal, but it may well perform reasonably in practice. (Tested on GNU awk, but there is no obvious reason why it would not work on Solaris.)

awk "$(awk '/mySearchString/ { print "NR==" NR-19 }' myInputFile.txt)" myInputFile.txt

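The inner awk prints one rule such as NR==6 for each match, and the outer awk then prints exactly those lines. As this requires two passes, if you are piping the input from elsewhere you will need to store it in a temporary file somewhere.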
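A minimal sketch of the temporary-file variant, assuming mktemp is available and passing the search pattern to awk with -v instead of hard-coding it (two_pass_19_above is a hypothetical name). The guard around the second pass avoids running awk with an empty program when nothing matches.

```shell
#!/bin/sh
# Hypothetical wrapper around the two-pass command for piped input.
two_pass_19_above() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"                      # save stdin so it can be read twice
    # Pass 1: emit one rule such as NR==6 for each match at line 20 or later
    prog=$(awk -v pat="$1" '$0 ~ pat && NR > 19 { print "NR==" NR - 19 }' "$tmp")
    # Pass 2: run the generated rules; a pattern without an action prints the line
    if [ -n "$prog" ]; then
        awk "$prog" "$tmp"
    fi
    rm -f "$tmp"
}
```

Usage: some_command | two_pass_19_above 'mySearchString', where some_command stands for whatever produces the input.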

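Alternatively, if you know that your search string will appear at most once in the file (or you only care about the first occurrence), you could combine awk with head and tail to extract the line:

awk 'NR==1,/mySearchString/' myInputFile.txt | tail -n 20 | head -n 1

The awk range prints everything up to and including the first match, tail -n 20 keeps the match plus the 19 lines before it, and head -n 1 picks the first of those. (If the match is on one of the first 19 lines, this prints line 1 instead.)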

I don't have a suitable text file handy to benchmark this, but I would expect it to be a fair amount better than your ring buffer solution.

derobert · Answer

You can probably use grep -n (which should be there, since -n is specified by POSIX) to get the line number of each match.

file="foo"
for line in $(grep -n "pattern" "$file" | cut -d: -f1); do
  # the matching line and the one before it (-B 1)
  head -n "$line" "$file" | tail -n 2
done

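That emulates -B 1, but it sounds like you just want line n-19, so inside the loop you could do this instead:

  target=`expr $line - 19`
  if [ "$target" -ge 1 ]; then   # lines 1-19 have no line 19 above them
    head -n "$target" "$file" | tail -n 1
  fi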
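The loop above reads the file once per match. A minimal sketch of an alternative, assuming a POSIX awk (print_19_above is a hypothetical name): the line numbers from grep -n are fed to awk on standard input, and a single pass over the file prints the lines 19 above them. NR == FNR is only true while awk reads its first input (here -, standard input), so the first rule collects the target line numbers and the second prints the matching lines of the file.

```shell
#!/bin/sh
# Hypothetical helper: match line numbers come from POSIX grep -n, then one
# awk pass prints every line lying 19 lines above a match.
print_19_above() {
    grep -n "$1" "$2" | cut -d: -f1 |
        awk 'NR == FNR { want[$1 - 19]; next }   # stdin: record target line numbers
             (FNR in want)' - "$2"               # file: print the wanted lines
}
```

If grep finds nothing, standard input is empty and the first rule also consumes the file's lines, so nothing is printed, which is the desired result.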

It won't be as fast as grep, and I didn't handle possible overlaps in the -B 1 case (those lines will be output twice), but it should work. If your grep supports -b (byte offset), that could be used to optimize this.
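For full grep -B style output with the overlap problem handled, a minimal sketch using an awk ring buffer of n + 1 lines (grep_before is a hypothetical name). It assumes the pattern is an awk extended regex; because awk processes backslash escapes in -v values, patterns containing backslashes may need them doubled. Lines already printed are not printed again, and "--" is printed between groups that are not adjacent, mimicking the group separator GNU grep uses.

```shell
#!/bin/sh
# Hypothetical grep -B emulation: usage grep_before N PATTERN FILE
grep_before() {
    awk -v n="$1" -v pat="$2" '
        { buf[NR % (n + 1)] = $0 }                  # keep only the last n + 1 lines
        $0 ~ pat {
            start = NR - n
            if (start < 1) start = 1
            if (start <= last) start = last + 1     # skip lines already printed
            if (last && start > last + 1) print "--"
            for (i = start; i <= NR; i++) print buf[i % (n + 1)]
            last = NR                               # last line number printed
        }
    ' "$3"
}
```

Usage: grep_before 19 'mySearchString' filename.txt prints each match together with up to 19 lines before it.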

Tags: arrays, grep, bash, awk

DSec
