Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the fastest sort algorithm for 0-65535 integers?

I have to sort a number of integers, which can have values between 30.000.000 and 350.000.000. There will be between 0 and 65.535 integers, with the average count being 20.000. RAM usage is irrelevant and speed only is important.

Later on i will also have to split them into groups, with the divide always being set whenever the gap between two of these values is >65.535, which is what i need the algorithm for.

If it makes any difference, the algorithm will be used in a Perl script.

Edit: After thinking it over and reading the answers i've come to realize something: I don't actually care about the data itself. As i really only want to find the start and end values of groups with small gaps, the sorting only needs to create buckets and can discard the data.

Edit2: After some testing and trying out the answers provided, the fastest way i found was this:

my @sort = sort {$a <=> $b} @item_offsets;
my @buckets;
my $start = shift @sort;
push @buckets, [$start,$start];
for my $item ( @sort ) {
    if ( $item < $buckets[$#buckets][1]+$gap ) {
        $buckets[$#buckets][1] = $item;
    }
    else {
        push @buckets, [$item,$item];
    }
}
say $#buckets;
like image 334
Mithaldu Avatar asked Nov 12 '08 19:11

Mithaldu


4 Answers

I'd just make an array of buckets before running the algorithm, one for each group of 65536 consecutive values. The buckets will contain a min and max value of their contents, but won't store the contents themselves. After running the algorithm, do a single pass over the buckets. If there are two consecutive non-empty buckets with min(bucket2)-max(bucket1) < 65536, combine them. Combining will not happen until the algorithm finishes running. Discard any empty buckets. This algorithm is linear time.

Take note of Bucket Sort.

like image 95
Brian Avatar answered Nov 15 '22 20:11

Brian


It's unlikely that you'll be able to write a sort algorithm in Perl that will perform better than Perl's builtin sort function:

@numbers = sort {$a <=> $b} @numbers;

You can experiment with the sort pragma to see if a particular algorithm is better:

use sort '_quicksort';
use sort '_mergesort';

Since your cutpoints will vary depending on the data distribution, I think you need to sort the entire list first then loop over it to do the cutting.

my $prev  = shift @numbers;  # already sorted
my @group = [$prev];
my $i     = 0;

foreach my $n (@numbers) {
    $i++ if ($n - $prev > 65535);
    push @{$group[$i]}, $n;
    $prev = $n;
}
like image 26
Michael Carman Avatar answered Nov 15 '22 22:11

Michael Carman


I'd use a radix sort, since you need to group the output.

like image 44
FlySwat Avatar answered Nov 15 '22 21:11

FlySwat


I was just going to say radix sort, http://en.wikipedia.org/wiki/Radix_sort however that could be a bit above what you were looking to implement, Introsort is generally the accepted sorting solution for data http://en.wikipedia.org/wiki/Introsort, it's a variation of quicksort that switches to heapsort when it reaches smaller sets as it's faster on smaller sets than quicksort.

like image 29
Chris Marisic Avatar answered Nov 15 '22 21:11

Chris Marisic