
Sort strings with a specific letter order with Perl

Tags: sorting, perl, cpan

I'm trying to sort lists of names in Perl using a specific letter order.
The sorting should work the same way as sort { $a cmp $b }, but with a different succession of letters.
For example, ordering by the arbitrary character order "abdrtwsuiopqe987654" ...

I tried to use sort { $a myFunction $b }, but I'm new to Perl and I don't see how to write myFunction correctly to get what I want.

  • Is there a specific function (or package) that provides this functionality?
  • Do you have an example of a custom sorting function that deals with strings?
  • Do you know how (or in which source file) the cmp operator is implemented in Perl, so I can see how it works?

JeanJouX asked Apr 13 '15 18:04



2 Answers

The following is probably the fastest[1]:

sub my_compare($$) {
    $_[0] =~ tr{abdrtwsuiopqe987654}{abcdefghijklmnopqrs}r
       cmp
    $_[1] =~ tr{abdrtwsuiopqe987654}{abcdefghijklmnopqrs}r
}

my @sorted = sort my_compare @unsorted;
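
As a quick check of the tr-based comparator above, here is a minimal, self-contained sketch. The words in @unsorted are made up for illustration and use only characters from the custom alphabet, and the script assumes Perl 5.14 or later for the /r flag.

```perl
use strict;
use warnings;

# Translate each character of the custom alphabet to a letter whose
# ordinary order matches its rank, then compare the translated copies.
sub my_compare($$) {
    $_[0] =~ tr{abdrtwsuiopqe987654}{abcdefghijklmnopqrs}r
       cmp
    $_[1] =~ tr{abdrtwsuiopqe987654}{abcdefghijklmnopqrs}r
}

# Hypothetical sample data (not from the question).
my @unsorted = qw( web 9pad tab bad quit bat );
my @sorted   = sort my_compare @unsorted;

print "@sorted\n";   # bad bat tab web quit 9pad
```

Note that characters outside the custom alphabet are left untranslated by tr, so they would be compared by their ordinary character codes; whether that is acceptable depends on the data.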

Or if you want something more dynamic, the following might be the fastest[2]:

my @syms = split //, 'abdrtwsuiopqe987654';
my @map; $map[ord($syms[$_])] = $_ for 0..$#syms;

sub my_compare($$) {
    # unpack 'C*' already returns character codes, so index @map directly.
    (pack 'C*', map $map[$_], unpack 'C*', $_[0])
       cmp
    (pack 'C*', map $map[$_], unpack 'C*', $_[1])
}

my @sorted = sort my_compare @unsorted;

We could compare character by character, but that will be far slower.

use List::Util qw( min );

my @syms = split //, 'abdrtwsuiopqe987654';
my @map; $map[ord($syms[$_])] = $_ for 0..$#syms;

sub my_compare($$) {
    my $l0 = length($_[0]);
    my $l1 = length($_[1]);
    # Compare only the positions both strings share (indexes 0 .. min-1).
    for (0..min($l0, $l1) - 1) {
       my $ch0 = $map[ord(substr($_[0], $_, 1))];
       my $ch1 = $map[ord(substr($_[1], $_, 1))];
       return -1 if $ch0 < $ch1;
       return +1 if $ch0 > $ch1;
    }

    return -1 if $l0 < $l1;
    return +1 if $l0 > $l1;
    return 0;
}

my @sorted = sort my_compare @unsorted;

  1. Technically, it can be made faster using the Guttman-Rosler Transform (GRT).

     my @sorted =
        map /\0(.*)/s,
        sort
        map { tr{abdrtwsuiopqe987654}{abcdefghijklmnopqrs}r . "\0" . $_ }
        @unsorted;
    
  2. Technically, it can be made faster using GRT. The ranks are shifted by one here so that the key never contains the "\0" separator.

     my @sorted =
        map /\0(.*)/s,
        sort
        map { ( pack 'C*', map $map[$_] + 1, unpack 'C*', $_ ) . "\0" . $_ }
        @unsorted;
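
To make the GRT footnotes more concrete, here is a minimal self-contained sketch of the dynamic variant. It assumes every input character appears in the custom alphabet and that no input string contains a NUL byte. The sample data, the @rank table and the sort_key helper are names made up for this illustration.

```perl
use strict;
use warnings;

my @syms = split //, 'abdrtwsuiopqe987654';

# Rank table indexed by character code. Ranks start at 1, not 0,
# so a packed key can never contain the "\0" used as separator below.
my @rank;
$rank[ord($syms[$_])] = $_ + 1 for 0..$#syms;

# Hypothetical helper: turn a string into a byte string of ranks.
# unpack 'C*' already yields character codes, so no ord() is needed.
sub sort_key {
    my ($s) = @_;
    return pack 'C*', map $rank[$_], unpack 'C*', $s;
}

# Hypothetical sample data (not from the question).
my @unsorted = qw( web 9pad tab bad quit bat ba b );

my @sorted =
    map  { /\0(.*)/s }                   # 3. strip the key and separator
    sort                                 # 2. plain string sort, no callback
    map  { sort_key($_) . "\0" . $_ }    # 1. prepend key and separator
    @unsorted;

print "@sorted\n";   # b ba bad bat tab web quit 9pad
```

The point of the transform is that the sort itself runs without a Perl-level comparison sub; the cost is one key computation per element plus the extra memory for the prefixed copies.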
    

cmp is implemented by the scmp operator.

$ perl -MO=Concise,-exec -e'$x cmp $y'
1  <0> enter
2  <;> nextstate(main 1 -e:1) v:{
3  <#> gvsv[*x] s
4  <#> gvsv[*y] s
5  <2> scmp[t3] vK/2
6  <@> leave[1 ref] vKP/REFC

The scmp operator is implemented by the pp_scmp function in pp.c, which is really just a wrapper for sv_cmp_flags in sv.c when use locale; isn't in effect. sv_cmp_flags uses either the C library function memcmp or a UTF-8-aware version (depending on the type of scalar).
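
In practice this means that, without use locale, cmp orders strings by character code, which is why a custom letter order needs the strings to be translated first. A small sketch with made-up sample words:

```perl
use strict;
use warnings;

# Without "use locale", cmp compares character codes, so every
# uppercase ASCII letter sorts before every lowercase one.
my @words  = qw( banana Cherry apple );
my @sorted = sort { $a cmp $b } @words;

print "@sorted\n";          # Cherry apple banana

my $result = "B" cmp "a";   # ord("B") is 66, ord("a") is 97
print "$result\n";          # -1
```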

ikegami answered Oct 08 '22 14:10


use Sort::Key qw(keysort);
my @sorted = keysort { tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/r } @data;

Or, in older perls (before 5.14) that don't support the r flag in tr/.../.../r:

my @sorted = keysort { my $key = $_;
                       $key =~ tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/;
                       $key } @data;

You can also create a specialized sort subroutine for that kind of data as follows:

use Sort::Key::Maker 'my_special_sort',
                     sub { tr/abdrtwsuiopqe987654/abcdefghijklmnopqrs/r },
                     qw(string);

my @sorted = my_special_sort @data;
my @sorted2 = my_special_sort @data2;

salva answered Oct 08 '22 14:10