Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best continuously sorting algorithm?

I have a set of double-precision data and I need their list to be always sorted. What is the best algorithm to sort the data as it is being added?

As best I mean least Big-O in data count, Small-O in data count (worst case scenario), and least Small-O in the space needed, in that order if possible.

The set size is really variable, from a small number (30) to lots of data (+10M).

like image 209
Wilhelm Avatar asked Jul 19 '09 18:07

Wilhelm


People also ask

What is the best sorting algorithm of all time?

Quicksort. Quicksort is one of the most efficient sorting algorithms, and this makes of it one of the most used as well. The first thing to do is to select a pivot number, this number will separate the data, on its left are the numbers smaller than it and the greater numbers on the right.

Which sorting algorithm has the best constant factor?

That's because the constant factor hidden in the big-Θ notation for quicksort is quite good. In practice, quicksort outperforms merge sort, and it significantly outperforms selection sort and insertion sort.

Which algorithm is fastest for sorting?

But since it has the upper hand in the average cases for most inputs, Quicksort is generally considered the “fastest” sorting algorithm.


2 Answers

Building a self-balancing binary tree like a red-black tree or AVL tree will allow for Θ(lg n) insertion and removal, and Θ(n) retrieval of all elements in sorted order (by doing a depth-first traversal), with Θ(n) memory usage. The implementation is somewhat complex, but they're efficient, and most languages will have library implementations, so they're a good first choice in most cases.

Additionally, retreiving the i-th element can be done by annotating each edge (or, equivalently, node) in the tree with the total number of nodes below it. Then one can find the i-th element in Θ(lg n) time and Θ(1) space with something like:

node *find_index(node *root, int i) {
  while (node) {
    if (i == root->left_count)
      return root;
    else if (i < root->left_count)
      root = root->left;
    else {
      i -= root->left_count + 1;
      root = root->right;
    }
  }
  return NULL; // i > number of nodes
}

An implementation that supports this can be found in debian's libavl; unfortunately, the maintainer's site seems down, but it can be retrieved from debian's servers.

like image 86
bdonlan Avatar answered Oct 07 '22 09:10

bdonlan


The structure that is used for indexes of database programs is a B+ Tree. It is a balanced bucketed n-ary tree.

From Wikipedia:

For a b-order B+ tree with h levels of index:

  • The maximum number of records stored is n = b^h
  • The minimum number of keys is 2(b/2)^(h−1)
  • The space required to store the tree is O(n)
  • Inserting a record requires O(log-b(n)) operations in the worst case
  • Finding a record requires O(log-b(n)) operations in the worst case
  • Removing a (previously located) record requires O(log-b(n)) operations in the worst case
  • Performing a range query with k elements occurring within the range requires O(log-b(n+k)) operations in the worst case.

I use this in my program. You can add your data to the structure as it comes and you can always traverse it in order, front to back or back to front, or search quickly for any value. If you don't find the value, you will have the insertion point where you can add the value.

You can optimize the structure for your program by playing around with b, the size of the buckets.

An interesting presentation about B+ trees: Tree-Structured Indexes

You can get the entire code in C++.


Edit: Now I see your comment that your requirement to know the "i-th sorted element in the set" is an important one. All of a sudden, that makes many data structures less than optimal.

You are probably best off with a SortedList or even better, a SortedDictionary. See the article: Squeezing more performance from SortedList. Both structures have a GetKey function that will return the i-th element.

like image 29
lkessler Avatar answered Oct 07 '22 08:10

lkessler