How to sort by line length, then reverse alphabetically

Question

I have a large set (600-odd) of search and replace terms that I need to run as a sed script over some files. The problem is that the search terms are NOT orthogonal... but I think I can get away with it by sorting by line length (i.e. pulling out the longest matches first), and then reverse alphabetically within each length. So given an unsorted set of:

aaba
aa
ab
abba
bab
aba

what I want is a sorted set such as:

abba
aaba
bab
aba
ab
aa

Is there a way of doing it by, say, prepending the line length and sorting by a field?

For bonus marks :-) The search and replace is actually just a case of replacing term with _term_, and the sed code I was going to use was s/term/_term_/g. How would I write the regex to avoid replacing terms that are already within _ pairs?

mob · Accepted Answer

You can do this in a one-line Perl script:

perl -e 'print sort { length $b<=>length $a || $b cmp $a } <>' input
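The "prepend the line length and sort by a field" idea from the question can also be sketched in plain shell. This is only a rough sketch: sort_by_length is a hypothetical helper name, and it assumes one term per line on standard input and byte-wise (LC_ALL=C) ordering.

```shell
# Hypothetical helper: reads terms on stdin, one per line, and prints them
# longest first, reverse alphabetical within each length.
sort_by_length() {
    # Decorate: prefix each line with its length and a single space.
    awk '{ print length($0), $0 }' |
    # Sort: field 1 numerically descending, then the rest of the line in
    # reverse byte order (LC_ALL=C is an assumption, for stable ordering).
    LC_ALL=C sort -t ' ' -k1,1nr -k2r |
    # Undecorate: drop the length prefix again.
    cut -d ' ' -f 2-
}
```

It would be used as sort_by_length < terms.txt, where terms.txt is a placeholder file name.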

Johannes Hoff · Answer

You could compact it all into one regexp:

$ sed -e 's/\(aaba\|aa\|abba\)/_\1_/g'
testing words aa, aaba, abba.
testing words _aa_, _aaba_, _abba_.

If I understand your question correctly, this solves both of your problems: there is no "double replacement", because everything is substituted in a single pass, and the longest word is always matched.
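With 600-odd terms, the alternation could be built from the term list instead of typed by hand. The following is a minimal sketch, not a tested production script: wrap_terms is a hypothetical helper name, it uses sed -E (extended regular expressions) in place of the \| syntax above, and it assumes the terms contain no regex metacharacters, slashes, or blank lines.

```shell
# Hypothetical helper: wrap_terms TERMS_FILE
# Reads text on stdin and wraps every listed term in underscores.
wrap_terms() {
    terms_file=$1
    # Join the terms into one alternation, e.g. aaba|aa|abba.
    pattern=$(paste -sd'|' "$terms_file")
    # One substitution over the whole alternation, so text that has
    # already been wrapped is never scanned a second time.
    sed -E "s/($pattern)/_\1_/g"
}
```

It would be used as wrap_terms terms.txt < input.txt, with both file names being placeholders. Text that already contains _term_ before the run would still be wrapped again, so this only avoids double replacement within a single pass.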

Tags: regex, bash, sorting, sed

Dycey
