cat testfile | awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2-
Or, to do your original (perhaps unintentional) sub-sorting of any equal-length lines:
cat testfile | awk '{ print length, $0 }' | sort -n | cut -d" " -f2-
In both cases, we have solved your stated problem by moving away from awk for your final cut.
The question did not specify whether or not further sorting was wanted for lines of matching length. I've assumed that this is unwanted and suggested the use of -s
(--stable
) to prevent such lines being sorted against each other, and keep them in the relative order in which they occur in the input.
(Those who want more control of sorting these ties might look at sort's --key
option.)
It is interesting to note the difference between:
echo "hello awk world" | awk '{print}'
echo "hello awk world" | awk '{$1="hello"; print}'
They yield respectively
hello awk world
hello awk world
The relevant section of (gawk's) manual only mentions as an aside that awk is going to rebuild the whole of $0 (based on the separator, etc) when you change one field. I guess it's not crazy behaviour. It has this:
"Finally, there are times when it is convenient to force awk to rebuild the entire record, using the current value of the fields and OFS. To do this, use the seemingly innocuous assignment:"
$1 = $1 # force record to be reconstituted
print $0 # or whatever else with $0
"This forces awk to rebuild the record."
aa A line with MORE spaces
bb The very longest line in the file
ccb
9 dd equal len. Orig pos = 1
500 dd equal len. Orig pos = 2
ccz
cca
ee A line with some spaces
1 dd equal len. Orig pos = 3
ff
5 dd equal len. Orig pos = 4
g
The AWK solution from neillb is great if you really want to use awk
and it explains why it's a hassle there, but if what you want is to get the job done quickly and don't care what you do it in, one solution is to use Perl's sort()
function with a custom caparison routine to iterate over the input lines. Here is a one liner:
perl -e 'print sort { length($a) <=> length($b) } <>'
You can put this in your pipeline wherever you need it, either receiving STDIN (from cat
or a shell redirect) or just give the filename to perl as another argument and let it open the file.
In my case I needed the longest lines first, so I swapped out $a
and $b
in the comparison.
Try this command instead:
awk '{print length, $0}' your-file | sort -n | cut -d " " -f2-
Below are the results of a benchmark across solutions from other answers to this question.
perl
solution took 11.2 secondsperl
solution took 11.6 secondsawk
solution #1 took 20 secondsawk
solution #2 took 23 secondsawk
solution took 24 secondsawk
solution took 25 secondsbash
solution takes 400x longer than the awk
solutions (using a truncated test case of 100000 lines). It works fine, just takes forever.perl
solutionperl -ne 'push @a, $_; END{ print sort { length $a <=> length $b } @a }' file
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With