
How to sort strings using two substring equality conditions?

Tags:

sorting

perl

I have a list of strings with the following format:

('group1-1', 'group1-2','group1-9', 'group2-1','group2-2', 'group2-9','group1-10', 'group2-10' )

I need them sorted by group first and then by number:

('group1-1', 'group1-2','group1-9','group1-10', 'group2-1','group2-2', 'group2-9', 'group2-10' )

I've written the following code, a comparator that sorts by group and, if the groups match, by number, but it's not working as expected:

my @list = ('group1-1', 'group1-2','group1-9', 
'group2-1','group2-2', 'group2-9','group1-10', 'group2-10' );
@list = sort compare @list;
for (@list){
    print($_."\n");
}

sub compare{
    my $first_group, $first_num = get_details($a);
    my $second_group, $second_num = get_details($b);
    if($first_group < $second_group){
      return -1;
   } elsif($first_group == $second_group){
      if ( $first_num < $second_num) {
         return -1;
      } elsif ( $first_num == $second_num ) {
         return 0;
      } else {
         return 1;
      }
   } else{
      return 1;                       
   }
}

sub get_details($){
   my $str= shift;
   my $group = (split /-/, $str)[0];
   $group =~ s/\D//g;
   my $num = (split /-/, $str)[1];
   $num =~ s/\D//g;
   return $group, $num;
}
asked Dec 07 '22 09:12 by tourist


2 Answers

You could use a Schwartzian transform:

use warnings;
use strict;

my @list = ('group1-1', 'group1-2','group1-9', 
    'group2-1','group2-2', 'group2-9','group1-10', 'group2-10' );

@list = map  { $_->[0] }
        sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] }
        map  { [$_, split /-/] }
        @list;

for (@list) {
    print($_."\n");
}

Prints:

group1-1
group1-2
group1-9
group1-10
group2-1
group2-2
group2-9
group2-10
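
As an alternative to the transform, the question's comparator can also be repaired directly. In the original, `my $first_group, $first_num = get_details($a);` has no parentheses around the `my` list, so only `$first_group` is declared (and stays undefined), and the groups never take part in the comparison; with `use strict` this should be caught at compile time. The following is a minimal sketch of one possible fix, assuming every string has the form groupN-M; the simplified body of `get_details` and the use of `<=>` with `||` are my choices, not part of the original post.

```perl
use strict;
use warnings;

my @list = ('group1-1', 'group1-2', 'group1-9',
    'group2-1', 'group2-2', 'group2-9', 'group1-10', 'group2-10');

# Return the group number and item number, e.g. 'group1-10' gives (1, 10).
# Assumes the input always contains exactly one hyphen.
sub get_details {
    my ($str) = @_;
    my ($group, $num) = split /-/, $str;
    s/\D//g for $group, $num;    # keep only the digits of each part
    return ($group, $num);
}

# Parentheses around the "my" list are required for list assignment.
sub compare {
    my ($first_group,  $first_num)  = get_details($a);
    my ($second_group, $second_num) = get_details($b);
    # <=> returns 0 for equal groups, so || falls through to the numbers.
    return $first_group <=> $second_group
        || $first_num   <=> $second_num;
}

my @sorted = sort compare @list;
print "$_\n" for @sorted;
```

This prints the same order as listed above. Unlike the transform, it re-parses both strings on every comparison, which is fine for short lists but does more work on long ones.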
answered Dec 11 '22 09:12 by toolic


There's a detail in the data that can lead to a quiet bug. The substring before the hyphen (group1 etc.) contains both letters and digits, so sorting it lexicographically gives the wrong order once the group number has more than one digit. For example,

group1, group2, group10

is sorted (by the default cmp) into

group1
group10
group2

which is presumably wrong.

So inside the sort we need to break groupN into group and N, and sort numerically by N.

use warnings;
use strict;
use feature 'say';

my @list = ('group1-1', 'group1-2','group1-9',
    'group2-1','group2-2', 'group2-9',
    'group1-10', 'group2-10',
    'group10-2', 'group10-1'                    # Added some 'group10' data
);


# Break input string into:  group N - N   (and sort by first then second number)

@list = 
    map  { $_->[0] }
    sort { $a->[2] <=> $b->[2] or $a->[4] <=> $b->[4] }
    map  { [ $_, /[0-9]+|[a-zA-Z]+|\-/g ] } 
    @list;

say for @list;

The regex extracts the words, the numbers, and the hyphen from the string, so each arrayref holds the original string at index 0 followed by those tokens, with the two numbers at indices 2 and 4. If the word is always the same (group), then we only ever sort by the numbers and can use /[0-9]+/g instead, comparing the arrayref elements at indices 1 and 2 numerically.

Prints

group1-1
group1-2
group1-9
group1-10
group2-1
group2-2
group2-9
group2-10
group10-1
group10-2
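
The digits-only variant mentioned above could look like the sketch below. It assumes every string contains exactly two runs of digits (the group number, then the item number) and that the word part never needs to be compared; the `@sorted` name is only for illustration.

```perl
use strict;
use warnings;
use feature 'say';

my @list = ('group1-1', 'group1-2', 'group1-9',
    'group2-1', 'group2-2', 'group2-9',
    'group1-10', 'group2-10',
    'group10-2', 'group10-1');

# Each element becomes [ original, group number, item number ],
# e.g. 'group10-2' becomes [ 'group10-2', 10, 2 ].
my @sorted =
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] or $a->[2] <=> $b->[2] }
    map  { [ $_, /[0-9]+/g ] }
    @list;

say for @sorted;
```

With the same input it produces the same order as the output above. If the word before the number can vary, the fuller regex is the safer choice, since this version ignores the word entirely.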
answered Dec 11 '22 08:12 by zdim