Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting an almost sorted array (elements misplaced by no more than k)

People also ask

Which sorting algorithm is best for almost sorted array?

​Many sorting algorithms are available, but the one which is best suited for the almost sorted array is the insertion sort.

What is an almost sorted array?

You're given an array that is almost sorted, in that each of the N elements may be misplaced by no more than k positions from the correct sorted order.

Which sorting technique will you use to sort an already sorted array with only 2 middle numbers unsorted should be efficient?

We can use Insertion Sort to sort the elements efficiently.

How do you make an array almost sorted?

The "almost sorted" array is generated by starting with an int array of size n containing the sequence 0, 1, 2, ..., n -1 and perturbing it slightly. To perturb the array, pick at random 10 pairs of indices (in the range 0 through n -1) and for each pair of indices ( i , j ), swap the elements in slots i and j .


As Bob Sedgewick showed in his dissertation work (and follow-ons), insertion sort absolutely crushes the "almost-sorted array". In this case your asymptotics look good but if k < 12 I bet insertion sort wins every time. I don't know that there's a good explanation for why insertion sort does so well, but the place to look would be in one of Sedgewick's textbooks entitled Algorithms (he has done many editions for different languages).

  • I have no idea whether O(N log k) is optimal, but more to the point, I don't really care—if k is small, it's the constant factors that matter, and if k is large, you may as well just sort the array.

  • Insertion sort will nail this problem without re-sorting the same elements.

Big-O notation is all very well for algorithm class, but in the real world, constants matter. It's all too easy to lose sight of this. (And I say this as a professor who has taught Big-O notation!)


If using only the comparison model, O(n log k) is optimal. Consider the case when k = n.

To answer your other question, yes it is possible to do this without sorting, by using heaps.

Use a min-heap of 2k elements. Insert 2k elements first, then remove min, insert next element etc.

This guarantees O(n log k) time and O(k) space and heaps usually have small enough hidden constants.


It was already pointed out that one of the asymptotically optimal solutions uses a min heap and I just wanted to provide code in Java:

public void sortNearlySorted(int[] nums, int k) {
  PriorityQueue<Integer> minHeap = new PriorityQueue<>();
  for (int i = 0; i < k; i++) {
    minHeap.add(nums[i]);
  }

  for (int i = 0; i < nums.length; i++) {
    if (i + k < nums.length) {
      minHeap.add(nums[i + k]);
    }
    nums[i] = minHeap.remove();
  }
}

Since k is apparently supposed to be pretty small, an insertion sort is probably the most obvious and generally accepted algorithm.

In an insertion sort on random elements, you have to scan through N elements, and you have to move each one an average of N/2 positions, giving ~N*N/2 total operations. The "/2" constant is ignored in a big-O (or similar) characterization, giving O(N2) complexity.

In the case you're proposing, the expected number of operations is ~N*K/2 -- but since k is a constant, the whole k/2 term is ignored in a big-O characterization, so the overall complexity is O(N).


Your solution is a good one if k is large enough. There is no better solution in terms of time complexity; each element might be out of place by k places, which means you need to learn log2 k bits of information to place it correctly, which means you need to make log2 k comparisons at least--so it's got to be a complexity of at least O(N log k).

However, as others have pointed out, if k is small, the constant terms are going to kill you. Use something that's very fast per operation, like insertion sort, in that case.

If you really wanted to be optimal, you'd implement both methods, and switch from one to the other depending on k.