I have scripts that process big log files. I can read everything from a given line onward and do something with it, using either tail or awk.
Tail:
tail -n +"$startline" "$LOG"
Awk:
awk 'NR>='"$startline"' {print}' "$LOG"
Timing them, tail takes 6 minutes 39 seconds and awk takes 6 minutes 42 seconds, so the two commands do the same thing in about the same time.
I don't know how to do this with sed. Could sed be faster than tail and awk? Or maybe some other command?
Second question: I use $startline so that each run continues from where the last one stopped. For example:
10:00AM -> ./script -> $startline=1, do something, write the last line number to a save file (e.g. 25),
10:05AM -> ./script -> $startline=26 (save file + 1), do something, write the line number to the save file (55),
10:10AM -> ./script -> $startline=56 (save file + 1), do something ...
But while the script runs, it still reads every line from the beginning and only starts doing the work once it reaches $startline, which is a little slow because the files are huge.
Any suggestions to make this faster?
Script example:
lastline=$(tail -1 line.save)          # last line number processed by the previous run
startline=$((lastline + 1))
tail -n +"$startline" "$LOG" | while read -r line
do
....
done
linecount=$(wc -l < "$LOG")            # total lines now in the log, without needing awk to strip the filename
echo "$linecount" >> line.save
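A possible tightening (my sketch, not from the post; it assumes bash): count lines inside the loop, so the save file records exactly what was processed even if the log grows while the script runs. Process substitution is used instead of a pipe so the counter survives the loop, and the save file is overwritten rather than appended, since only the latest number is needed:
count=$(tail -1 line.save)                  # last line number processed by the previous run
startline=$((count + 1))
while read -r line
do
    count=$((count + 1))                    # track lines as they are actually processed
    # ... do something with "$line" ...
done < <(tail -n +"$startline" "$LOG")      # bash process substitution: count keeps its value
echo "$count" > line.save                   # overwrite: only the latest number matters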
I find awk much faster than sed. You can also speed up grep if you don't need real regular expressions but only simple fixed strings (option -F). If you want to chain grep, sed, and awk together in a pipe, I would place the grep command first if possible, as in the example below.
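For instance (a hypothetical pipeline; ERROR is just an assumed fixed string), letting grep -F discard non-matching lines first means awk only has to process the survivors:
grep -F 'ERROR' "$LOG" | awk '{print $1, $2}'    # print the first two fields of the matching lines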
Conclusion: use sed for very simple text parsing. For anything beyond that, awk is better. In fact, since their functions overlap and awk can do more, you can ditch sed altogether and just use awk.
Where either sed or tr can do the job, prefer the tr command, because tr is faster. Of course, in many practical cases the speed difference is too small to notice.
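As an illustration (my example, not from the original answer): stripping carriage returns can be done either way, and the tr version avoids the regex machinery entirely:
tr -d '\r' < "$LOG"       # delete every CR byte in the stream
sed 's/\r$//' "$LOG"      # roughly equivalent for CRs at line ends (GNU sed understands \r)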
Awk compiles your script once and then applies it to every line of your file at C-like speeds; it is much faster than doing the same loop in Python. If you learn to use awk well, you will start doing things with data that you wouldn't have had the patience to do in an interpreted language.
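As an illustration of the kind of one-pass job this makes cheap (my example, not from the answer), summing the second column of a large file:
awk '{ sum += $2 } END { print sum }' "$LOG"     # one pass over the file, total printed at the end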
tail and head are tools created especially for this purpose, so the intuitive expectation is that they are well optimized for it. On the other hand, awk and sed can perfectly well do it, because they are like a Swiss Army knife; but this is not supposed to be their best skill among the many others that they have.
In Efficient way to print lines from a massive file using awk, sed, or something else? there is a nice comparison of methods, and head / tail comes out as the best approach.
Hence, I would go for tail + head.
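Concretely, to print the range from $startline up to a hypothetical upper bound $endline, tail skips straight to the start and head stops reading as soon as the range ends:
tail -n +"$startline" "$LOG" | head -n $((endline - startline + 1))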
Note also that if you want not just the last lines but a set of lines somewhere within the file, in awk (or in sed) you have the option to exit after the last line you wanted. This way, you avoid running through the file to its last line.
So this:
awk '{if (NR>=10 && NR<20) print} NR==20 {print; exit}'
is faster than
awk 'NR>=10 && NR<=20'
if your input happens to contain more than 20 lines.
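The sed counterpart of that early exit (my equivalent, using sed's q command to quit) would be:
sed -n '10,20p; 20q' "$LOG"     # print lines 10-20, then quit instead of reading the rest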
Regarding your expression:
awk 'NR>='"$startline"' {print}' "$LOG"
note that it is more straightforward to write:
awk -v start="$startline" 'NR>=start' "$LOG"
There is no need to say print, because it is implicit.
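Combining that -v style with the early exit shown above (start and end being the assumed bounds), a range print could look like:
awk -v start="$startline" -v end="$endline" 'NR>=start; NR==end {exit}' "$LOG"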