In case you are given:
Which sorting algorithm would you choose? I am debating between insertion and quicksort. I know that the best case for insertion sort is O(n), but the worst case is O(n2). Also, considering the fact the memory is limited, I would divide the data in two parts, and on each of them do quicksort, then merge everything together. It will take O(n) time to split the data, O(n) to merge the data, and O(n log n) to sort the data using quicksort, for a net runtime of O(n log n).
Does anyone have any suggestions on how to improve this?
Insertion sort is the clear winner on this initial condition. Bubble sort is fast, but insertion sort has lower overhead. Shell sort is fast because it is based on insertion sort. Merge sort, heap sort, and quick sort do not adapt to nearly sorted data.
Some basic algorithms like Insertion or Bubble sort require no additional memory and can sort the data in place. On the other hand, more efficient algorithms like Quick sort and Merge sort require O(logN) and O(N) time complexity respectively (meaning that extra space is required to complete the sorting).
Merge Sort Merge Sort is a divide-and-conquer algorithm. In each iteration, merge sort divides the input array into two equal subarrays, calls itself recursively for the two subarrays, and finally merges the two sorted halves.
If the input array is already sorted, insertion sort performs as few as n-1 comparisons, thus making insertion sort more efficient when given sorted or "nearly-sorted" arrays.
Your mergesort-like approach seems very reasonable. More generally, this type of sorting algorithm is called an external sorting algorithm. These algorithms often work as you've described - load some subset of the data into memory, sort it, then write it back out to disk. At the end, use a merging algorithm to merge everything back together. The choice of how much to load in and what sorting algorithm to use are usually the dominant concerns. I'll focus mostly on the sorting algorithm choice.
Your concerns about quicksort's worst-case behavior are generally speaking nothing to worry about, since if you choose the pivot randomly the probability that you get a really bad runtime is low. The random pivot strategy also works well even if the data is already sorted, as it has no worst-case inputs (unless someone knows your random number generator and the seed). You could also use a quicksort variant like introsort, which doesn't have the worst-case behavior, as your sorting algorithm in order to avoid this worst-case.
That said, since you know that the data is already partially sorted, you may want to look into an adaptive sorting algorithm for your sorting step. You've mentioned insertion sort for this, but there are much better adaptive algorithms out there. If memory is scarce (as you've described), you might want to try looking into the smoothsort algorithm, which has best-case runtime O(n), worst-case runtime O(n log n), and uses only O(1) memory. It's not as adaptive as some other algorithms (like Python's timsort, natural mergesort, or Cartesian tree sort), but it has lower memory usage. It's also not as fast as a good quicksort, but if the data really is mostly sorted it can do pretty well.
Hope this helps!
On the face of it, I would divide & conquer with quicksort and call it a day. Many algorithms problems are over-thought.
Now, if you do have have test data to work with and really want a grasp on that, stick an abstract class in the middle and benchmark. We can hem and haw over things all day, but knowing that the data is already partially sorted, you'll have to test. Sorted data brings about worst-case performance in most quicksort implementations.
Consider that there are many sorting algorithms and some are suited better to sorted sets. Also, when you know a set is sorted, you can merge it with another set in n time. Thus, identifying chunks of sorted data first might save you a lot of time when you compare adding a single (n) pass and greatly reducing the chance of quicksort going to (n2) time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With