 

How to sort by line length, then reverse alphabetically

I have a large set (600-odd) of search-and-replace terms that I need to run as a sed script over some files. The problem is that the search terms are NOT orthogonal, but I think I can get away with it by sorting by line length (i.e. pulling out the longest matches first) and then reverse alphabetically within each length. So given an unsorted set of:

aaba
aa
ab
abba
bab
aba

what I want is a sorted set such as:

abba
aaba
bab
aba
ab
aa

Is there a way of doing it by, say, prepending the line length and sorting by a field?

For bonus marks :-) The search and replace is simply a case of replacing term with _term_, and the sed code I was going to use was s/term/_term_/g. How would I write the regex to avoid replacing terms that are already within _ pairs?

Dycey asked Dec 08 '22 05:12


2 Answers

You can do this in a one-line Perl script:

perl -e 'print sort { length($b) <=> length($a) || $b cmp $a } <>' input
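
The "prepend the line length and sort by a field" idea from the question can also be sketched with standard tools. This is only a sketch, not part of the original answer: the function name `sort_by_length` is made up for illustration, and it assumes plain byte-wise ordering (`LC_ALL=C`) is acceptable for the terms.

```shell
# Hypothetical helper: sort stdin by line length (longest first),
# then reverse-alphabetically within each length.
sort_by_length() {
  # Decorate: prefix each line with its length and a tab.
  awk '{ print length($0) "\t" $0 }' |
    # Sort: field 1 numerically descending, then the rest of the line in reverse.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2r |
    # Undecorate: drop the length field again.
    cut -f2-
}

printf '%s\n' aaba aa ab abba bab aba | sort_by_length
# prints: abba aaba bab aba ab aa (one per line)
```

The key `-k2r` runs from the second field to the end of the line, so a term that itself contains a tab is still compared as a whole.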
mob answered Dec 11 '22 12:12


You could compact it all into one regexp:

$ sed -e 's/\(aaba\|aa\|abba\)/_\1_/g'
testing words aa, aaba, abba.
testing words _aa_, _aaba_, _abba_.   

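If I understand your question correctly, this avoids both problems. Because all terms are replaced in a single pass, text that has already been wrapped is not scanned again (no "double replacement"), and the longest matching term wins at each position.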
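
With 600-odd terms, the alternation would probably be built from the term list rather than typed by hand. The following is a rough sketch under stated assumptions: the function name `wrap_terms` is hypothetical, the terms are assumed to sit one per line in a file, and they are assumed to contain no regex metacharacters or slashes. It uses `sed -E` (extended regex) so that a plain `|` can be used; the `\|` in the command above is a GNU extension to basic regex.

```shell
# Hypothetical helper: wrap every occurrence of any listed term in underscores.
# $1 is a file with one term per line (assumed free of regex metacharacters and "/").
wrap_terms() {
  terms_file=$1
  # Join all lines of the file with "|" to form one alternation.
  pattern=$(paste -sd'|' "$terms_file")
  # Single pass over stdin: replaced text is not matched a second time.
  sed -E "s/($pattern)/_\1_/g"
}

# Demo with a throwaway term file, listed longest first.
terms=$(mktemp)
printf '%s\n' abba aaba aa > "$terms"
printf '%s\n' 'testing words aa, aaba, abba.' | wrap_terms "$terms"
# prints: testing words _aa_, _aaba_, _abba_.
rm -f "$terms"
```

Listing the terms longest first, as the question proposes, keeps the result the same even with a regex engine that takes the first matching alternative instead of the longest one.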

Johannes Hoff answered Dec 11 '22 11:12