Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Grokking Timsort

Tags:

There's a (relatively) new sort on the block called Timsort. It's been used as Python's list.sort, and is now going to be the new Array.sort in Java 7.

There's some documentation and a tiny Wikipedia article describing the high-level properties of the sort and some low-level performance evaluations, but I was curious if anybody can provide some pseudocode to illustrate what Timsort is doing, exactly, and what are the key things that make it zippy. (Esp. with regard to the cited paper, "Optimistic Sorting and Information Theoretic Complexity.")

(See also related StackOverflow post.)

like image 534
Yang Avatar asked Nov 14 '09 02:11

Yang


People also ask

Is timsort better than quicksort?

TimSort is a highly optimized mergesort, it is stable and faster than old mergesort. when comparing with quicksort, it has two advantages: It is unbelievably fast for nearly sorted data sequence (including reverse sorted data); The worst case is still O(N*LOG(N)).

Is timsort the best sort?

Timsort is one of the best sorting algorithms in terms of complexity and stability. Unlike “bubble” or “insertion” sorting, Timsort is rather new — it was invented in 2002 by Tim Peters (and named after him).

What is the best and worst complexity of timsort?

The average case time complexity of Tim sort is O(n log n). Worst Case Complexity - It occurs when the array elements are required to be sorted in reverse order. That means suppose you have to sort the array elements in ascending order, but its elements are in descending order.

Why is timsort stable?

Timsort is a stable sorting algorithm (order of elements with same key is kept) and strives to perform balanced merges (a merge thus merges runs of similar sizes). In pursuit of balanced merges, Timsort considers three runs on the top of the stack, X, Y, Z, and maintains the invariants: |Z| > |Y| + |X|


1 Answers

Quoting the relevant portion from a now deleted blog post: Visualising Sorting Algorithms: Python's timsort

The business-end of timsort is a mergesort that operates on runs of pre-sorted elements. A minimum run length minrun is chosen to make sure the final merges are as balanced as possible - for 64 elements, minrun happens to be 32. Before the merges begin, a single pass is made through the data to detect pre-existing runs of sorted elements. Descending runs are handled by simply reversing them in place. If the resultant run length is less than minrun, it is boosted to minrun using insertion sort. On a shuffled array with no significant pre-existing runs, this process looks exactly like our guess above: pre-sorting blocks of minrun elements using insertion sort, before merging with merge sort.

[...]

  • timsort finds a descending run, and reverses the run in-place. This is done directly on the array of pointers, so seems "instant" from our vantage point.
  • The run is now boosted to length minrun using insertion sort.
  • No run is detected at the beginning of the next block, and insertion sort is used to sort the entire block. Note that the sorted elements at the bottom of this block are not treated specially - timsort doesn't detect runs that start in the middle of blocks being boosted to minrun.
  • Finally, mergesort is used to merge the runs.
like image 78
u0b34a0f6ae Avatar answered Oct 12 '22 13:10

u0b34a0f6ae