I need to extract the line that lies a fixed distance above each match of my search string (say, 19 lines above). Normally, I would just go with
grep -B 19 "$search_string" "$file" | ...further processing
However, the script should also work on Solaris, where grep doesn't provide the -B option. Often I can get away with awk '/begin/,/end/' to print a block of lines, if I know what the preceding lines look like. In this particular situation that is not possible. I tried the following:
1) Ring buffer solution.
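#!/bin/bash
# Ring buffer holding the last 20 lines: line n is stored in slot n % 20.
g_a_buffer=()
g_i_buffer_index=1
while IFS= read -r line
do
    g_a_buffer[$((g_i_buffer_index % 20))]=$line
    # Spawning a grep for every input line is what makes this loop so slow.
    if printf '%s\n' "$line" | grep "$search_string" > /dev/null; then
        # Line n-19 sits in slot (n - 19) % 20 == (n + 1) % 20; skip matches
        # in the first 19 lines, which have no line 19 above them.
        if [ "$g_i_buffer_index" -gt 19 ]; then
            printf '%s\n' "${g_a_buffer[$(( (g_i_buffer_index + 1) % 20 ))]}"
        fi
    fi
    let "g_i_buffer_index += 1"
done < "$file_name"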
This is extremely slow: for ~40k lines it takes 1m37s (versus 0.005s for grep).
2) Awk solution. I should say up front that I am an extreme beginner in awk, rarely going beyond awk '{print $1}'. The line below doesn't do what I want, but it gives you an idea of what I am trying to achieve:
awk '/mySearchString/ {print NR-19}' filename.txt
It runs in 0.118s, so the speed is good! But all it prints is the line number minus 19, whereas what I need is the content of the line located at (line number - 19). After some googling I still couldn't find an answer. I admit this must be an extremely basic problem, but I seem to have hit a wall here.
All I have found so far is how to print the previous line with awk (which is a sort of one-line buffer), or large ring-buffer implementations in awk. Is there a more elegant way to do this?
Thanks for the help!
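A ring buffer in awk does not have to be large. Here is a minimal sketch of a single-pass version, wrapped in a shell function whose name, lines_above, is hypothetical. It assumes a POSIX awk; on Solaris that probably means nawk or /usr/xpg4/bin/awk, since the old /usr/bin/awk may not accept -v.

```shell
# lines_above PATTERN N [FILE...]   (hypothetical helper name)
# Print the line lying N lines (N >= 1) above each line that matches the
# awk extended regular expression PATTERN. Reads stdin if no FILE is given.
lines_above() {
    _pat=$1
    _n=$2
    shift 2
    awk -v pat="$_pat" -v n="$_n" '
        # Slot FNR % n holds line FNR - n until the current line overwrites it,
        # so print it first, and only once at least n lines have been seen.
        $0 ~ pat && FNR > n { print buf[FNR % n] }
        { buf[FNR % n] = $0 }
    ' "$@"
}
```

For example, lines_above 'mySearchString' 19 filename.txt. Two caveats: awk treats the pattern as an extended regular expression (grep defaults to basic ones), and backslash escapes in -v values are processed, so any backslash in the pattern has to be doubled.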
Here is a solution that requires two passes through the file, so it is not optimal, but it may well perform reasonably in practice. (Tested on GNU awk, but there is no obvious reason why it would not work on Solaris.)
awk "$(awk '/mySearchString/ { print "NR==" NR-19 }' myInputFile.txt)" myInputFile.txt
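The inner awk prints an NR==k rule for each match, where k is the match's line number minus 19; the outer awk then runs those rules, and since a rule with no action prints the current line, each one prints line k. Because this requires two passes, if you are piping the input from elsewhere you will need to store it in a temporary file first.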
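A minimal sketch of that temporary-file variant, assuming a mktemp command is available (it may be missing on older Solaris releases). The function name two_pass_above is hypothetical, and unlike the one-liner it takes the pattern and distance as arguments instead of hard-coding them.

```shell
# two_pass_above PATTERN N   (hypothetical helper name)
# Buffer stdin in a temporary file, then apply the two-pass awk approach:
# pass 1 turns each match into an awk rule "NR==k" (k = match line - N),
# pass 2 runs those generated rules, whose default action prints the line.
two_pass_above() {
    _tmp=$(mktemp) || return 1
    cat > "$_tmp"
    _prog=$(awk -v pat="$1" -v n="$2" '$0 ~ pat { print "NR==" (NR - n) }' "$_tmp")
    # With no matches there is no program to run, so skip the second pass.
    if [ -n "$_prog" ]; then
        awk "$_prog" "$_tmp"
    fi
    rm -f "$_tmp"
}
```

For example, some_command | two_pass_above 'mySearchString' 19, where some_command stands for whatever produces the input. A line targeted by two matches is printed twice, because each match generates its own NR==k rule.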
Alternatively, if you know that your search string appears at most once in the file (or you only care about the first occurrence), you could combine awk with head and tail to extract the line:
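awk 'NR==1,/mySearchString/' myInputFile.txt | tail -n 20 | head -n 1
tail -n 20 keeps the matching line plus the 19 lines before it, and head -n 1 picks the first of those; if the match falls within the first 19 lines, this prints line 1 instead. I don't have a suitable text file handy to benchmark this, but I would expect it to be considerably faster than your ring buffer solution.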
You can probably use grep -n (which should be available, since -n is specified by POSIX) to get the line number of each match:
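file="foo"
# grep -n prefixes each match with "NUM:"; cut keeps only the line number.
for line in $(grep -n "pattern" "$file" | cut -d: -f1); do
    # The match plus the line before it, like grep -B 1.
    head -n "$line" "$file" | tail -n 2
done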
That's the equivalent of -B 1, but it sounds like you just want line n-19, so inside the loop you could do:
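target=`expr "$line" - 19`
# Matches in the first 19 lines have no line 19 above them, so skip those.
[ "$target" -ge 1 ] && head -n "$target" "$file" | tail -n 1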
It won't be as fast as grep, since head rereads the file from the start for every match, and it doesn't handle overlapping matches in the -B 1 case (shared lines are printed twice), but it should work. If your grep supports -b (byte offset), that could be used to optimize it.
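If rereading the file once per match becomes a problem, the line numbers from grep -n could instead be fed into a single awk pass over the file. This is a sketch, not tested on Solaris; it assumes an awk that accepts - as a name for standard input (POSIX requires this), and the function name above_via_grep_n is hypothetical.

```shell
# above_via_grep_n PATTERN N FILE   (hypothetical helper name)
# grep -n supplies the matching line numbers; one awk pass over FILE then
# prints the line N above each match, so FILE is read twice in total
# (once by grep, once by awk) no matter how many matches there are.
above_via_grep_n() {
    grep -n "$1" "$3" | cut -d: -f1 |
        awk -v n="$2" '
            # First input ("-", the line numbers from the pipe):
            # remember each target line number as an array key.
            # If grep found nothing, NR == FNR also holds while reading FILE,
            # so every line is consumed here and nothing is printed.
            NR == FNR { want[$1 - n]; next }
            # Second input (FILE): print each targeted line once, in file order.
            FNR in want
        ' - "$3"
}
```

Because the targets are stored as array keys, a line wanted by several matches is printed only once, which also sidesteps the overlap issue.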