My job involves a lot of sorting fields from very large files. I usually do this with the sort command in bash. Unfortunately, when I start a sort I am never really sure how long it is going to take. Should I wait a second for the results to appear, or should I start working on something else while it runs?
Is there any possible way to get an idea of how far along a sort has progressed or how fast it is working?
$ cut -d , -f 3 VERY_BIG_FILE | sort -du > output
No, GNU sort does not do progress reporting.
However, if you are using sort just to remove duplicates, and you don't actually care about the ordering, then there's a more scalable way of doing that:
awk '! a[$0]++'
This writes out the first occurrence of a line as soon as it's been seen, which can give you an idea of the progress.
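As a rough sketch, the awk filter can be dropped into the pipeline from the question in place of sort. The file names sample.csv and unique_values.txt are made up for illustration, and the array is named seen rather than a purely for readability:

```shell
# Hypothetical sample data standing in for VERY_BIG_FILE.
printf '%s\n' 'a,1,x' 'b,2,y' 'c,3,x' 'd,4,z' 'e,5,y' > sample.csv

# Extract field 3, then print each value only the first time it appears.
# seen[$0]++ evaluates to 0 (false) on first sight, so !seen[$0]++ is true
# exactly once per distinct line.
cut -d , -f 3 sample.csv | awk '!seen[$0]++' > unique_values.txt

cat unique_values.txt
```

On a real input, running wc -l on the output file from another terminal should give a rough idea of how many distinct values have been written so far, although awk typically buffers output to a file, so the count may lag. Note that the result is in first-seen order rather than sorted, and that awk keeps every distinct line in memory.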