I have a file with about 100 million rows of 6 space-separated fields; each field is a seven-digit number.
I'd like to delete the 2nd field, and I can do it with either of the following:
1. awk '{print $1,$3,$4,$5,$6}' input.txt
2. cut --delimiter=' ' --fields=1,3-6 input.txt
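As a quick sanity check, both commands give the same result on one sample line shaped like the input. This sketch assumes bash and GNU coreutils (the --delimiter and --fields long options are GNU spellings; the portable form is cut -d ' ' -f 1,3-6), and the numbers in the sample line are made up for illustration.

```shell
#!/usr/bin/env bash
# One made-up sample line in the question's format:
# six space-separated seven-digit numbers.
line='1234567 2345678 3456789 4567890 5678901 6789012'

# Drop the 2nd field with each tool and capture the result.
awk_out=$(printf '%s\n' "$line" | awk '{print $1,$3,$4,$5,$6}')
cut_out=$(printf '%s\n' "$line" | cut --delimiter=' ' --fields=1,3-6)

echo "awk: $awk_out"   # awk: 1234567 3456789 4567890 5678901 6789012
echo "cut: $cut_out"   # cut: 1234567 3456789 4567890 5678901 6789012
```

The two stay equivalent only while every separator is exactly one space: awk's default field splitting treats runs of blanks (and leading blanks) as a single separator, whereas cut treats each individual space as a delimiter, so a doubled space would shift cut's field numbering.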
Which one produces the desired output faster? Is there a way to time the process?
Thank you for your help.
Is there a way to time the process?
Yes. Just prepend the time command to your code and it will report how long it took. Do it for each one:
time awk '{print $1,$3,$4,$5,$6}' input.txt
time cut --delimiter=' ' --fields=1,3-6 input.txt
With a quick bit of profiling, it looks like cut just barely wins out in this scenario. That is still quite an impressive time for awk, considering how much more capable it is than cut.
$ time for i in {1..1000}; do cut --delimiter=' ' --fields=1,3-6 >/dev/null <<<"one two three four five six seven"; done
real 0m4.074s
user 0m0.496s
sys 0m2.799s
$ time for i in {1..1000}; do awk '{print $1,$3,$4,$5,$6}' >/dev/null <<<"one two three four five six seven"; done
real 0m4.511s
user 0m0.728s
sys 0m3.165s
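The loop above launches each tool 1000 times on a single short line, so it mostly measures process start-up cost rather than how quickly each tool works through data, which is what matters for 100 million rows. Below is a rough sketch of a throughput-oriented comparison, assuming bash and GNU coreutils. The file name sample.txt, the output file names, and the 1,000,000-row size are arbitrary illustrative choices (scale rows up toward the real size); no results are claimed here, since timings depend on the machine, the awk implementation (gawk, mawk, etc.), and whether the file is already in the page cache, which is why each command is run a few times.

```shell
#!/usr/bin/env bash
# Benchmark sketch: build a synthetic file shaped like the one in the question
# (six space-separated seven-digit numbers per line), then time awk and cut on it.
# rows, sample.txt, out_awk.txt and out_cut.txt are illustrative choices.
set -euo pipefail

rows=1000000
sample=sample.txt

# Generate the test data with a fixed seed so runs are repeatable.
# 1000000 + int(rand() * 9000000) always yields a seven-digit number.
awk -v n="$rows" 'BEGIN {
  srand(1)
  for (i = 0; i < n; i++)
    for (j = 1; j <= 6; j++)
      printf "%d%s", 1000000 + int(rand() * 9000000), (j < 6 ? " " : "\n")
}' > "$sample"

# Repeat each timing: the first run may include reading the file from disk.
# Output goes to /dev/null so terminal rendering does not dominate the timing.
for run in 1 2 3; do
  echo "awk, run $run:"
  time awk '{print $1,$3,$4,$5,$6}' "$sample" > /dev/null
  echo "cut, run $run:"
  time cut --delimiter=' ' --fields=1,3-6 "$sample" > /dev/null
done

# A timing comparison only means something if both tools give the same result.
awk '{print $1,$3,$4,$5,$6}' "$sample" > out_awk.txt
cut --delimiter=' ' --fields=1,3-6 "$sample" > out_cut.txt
cmp out_awk.txt out_cut.txt && echo "outputs identical"
```

Running one process over many lines spreads the start-up cost across the whole file, so the difference between the two tools on real data may be larger or smaller than the 1000-launch loop suggests.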