 

What makes the gcc std::list sort implementation so fast?

I have a linked list implementation and I'm experimenting with both Mergesort and QuickSort algorithms.

What I don't understand is why the sort operation on std::list is so fast. Looking at the std::list implementation under Linux, it appears to be a doubly linked list as well, not an array-based container.

The merge sort I tried is almost identical to Dave Gamble's version here: Merge Sort a Linked List

Also, I thought I'd try a simple quicksort based on this code: http://www.flipcode.com/archives/Quick_Sort_On_Linked_List.shtml

Surprisingly, sorting 10 million random numbers with std::list's sort was around 10 times quicker than either of those others.

And for those that are asking, yes I need to use my own list class for this project.

asked Jul 18 '11 by hookenz

1 Answer

I've been taking a look at the interesting libstdc++ implementation of list::sort (source code), and it doesn't seem to implement a traditional merge sort algorithm (at least not one I've ever seen before).

Basically what it does is:

  1. Creates a series of buckets (64 total).
  2. Removes the first element of the list to sort and merges it with the first (i=0th) bucket.
  3. If, before the merge, the ith bucket is not empty, merge the ith bucket with the (i+1)th bucket.
  4. Repeat step 3 until we merge with an empty bucket.
  5. Repeat steps 2 and 3 until the list to sort is empty.
  6. Merge all the remaining non-empty buckets together starting from smallest to largest.

Small note: merging a bucket X with a bucket Y will remove all the elements from bucket X and add them to bucket Y while keeping everything sorted. Also note that the number of elements within a bucket is either 0 or 2^i.

Now why is this faster than a traditional merge sort? I can't say for sure, but here are a few things that come to mind:

  • It never traverses the list to find a mid-point, which also makes the algorithm more cache-friendly.
  • Because the earlier buckets are small and used more frequently, the calls to merge thrash the cache less often.
  • The compiler is able to optimize this implementation better. Would need to compare the generated assembly to be sure about this.

I'm pretty sure the folks who implemented this algorithm tested it thoroughly, so if you want a definitive answer you'll probably have to ask on the GCC mailing list.

answered Oct 13 '22 by Ze Blob