Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does this quick sort cause stack overflow on nearly sorted lists and sorted lists?

I'm currently writing a quick sort algorithm in Java to sort random arrays of integers and then timing them using System.nanoTime(). The size of these arrays are powers of ten, starting from 10^3 and ending at 10^7. Furthermore, the random lists have different properties. I am sorting purely random lists, lists with some identical values (fewUnique), reversed sorted lists, sorted lists and nearly sorted lists.

The sort works. It performs a quick sort recursively on the array until it needs to sort 30 elements or less of the array, in which case it performs an insertion sort.

All was fine for 10^3 and 10^4 but once I got to 10^5 values it would only sort the random, few unique and random lists but would incur a Stack Overflow error when sorting nearly sorted and sorted lists.

I don't believe the issue lies in the way the lists are generated because the Stack Overflow occurs within the sorting algorithm (the line which is being referred to by the compiler is the first within the findPivot() method). Also, it will often sort between 1 and 6 lists before crashing. It must therefore be some way that the algorithm itself interacts with nearly sorted and sorted lists. Also, the generation for reversed lists involves invoking the code for creating a sorted list (and then reversing it).

Also, I find it unlikely that the issue is just to the algorithm having to, for some reason, invoke the partitioning off of sections of the array by recursion in a nearly sorted and sorted list significantly more times rather than the other list types, as it can sort random lists with 10^7 values, which would require far more partitioning than would a nearly sorted list with 10^5 values.

I realise it must have something to do with how these list types interact with the recursion of my quick sort, but if someone could shed light on it that would be great. I've put the code to both the quick sort algorithm in full and the random list generators below.

QUICK SORT ALGORITHM

/**
 * Performs a quick sort given the indexes of the bounds of an integer array
 * @param arr The array to be sorted
 * @param highE The index of the upper element
 * @param lowE The index of the lower element
 */
public static void quickSort(int[] arr, int highE, int lowE)
{       
    //Only perform an action if arr.length > 30, otherwise insertion sort [recursive base case])
    if (lowE + 29 < highE)
    {
        //Get the element and then value of the pivot
        int pivotE = findPivot(arr, highE, lowE);
        int pivotVal = arr[pivotE], storeE = lowE;

        //Swap the pivot and the last value.
        swapElements(arr, pivotE, highE);

        //For each element in the selection that is not the pivot, check to see if it is lower than the pivot and if so, move it to the leftmost untouched element.
        for (int i = lowE; i < highE; i++)
        {
            if (arr[i] < pivotVal)
            {
                swapElements(arr, storeE, i);

                //Increment storeE so that the element that is being switched moves to the right location
                storeE++;
            }
        }

        //Finally swap the pivot into its proper position and recrusively call quickSort on the lesser and greater portions of the array
        swapElements(arr, storeE, highE);                   
        //Lesser
        quickSort(arr, storeE - 1, lowE);
        //Greater
        quickSort(arr, highE, storeE + 1);
    }
    else
    {
        insertSort(arr, highE, lowE);
    }
}




/**
 * Finds the pivot element
 * @param arr The array to be sorted
 * @param highE The index of the top element
 * @param lowE The index of the bottom element
 * @return The index of the pivot.
 */
public static int findPivot(int[] arr, int highE, int lowE)
{
    //Finds the middle element
    int mid = (int) Math.floor(lowE + (highE - lowE) / 2);

    //Returns the value of the median of the first, middle and last elements in the array.
    if ((arr[lowE] >= arr[mid]) && (arr[lowE] >= arr[highE])) 
    {
        if (arr[mid] > arr[highE]) {return mid;}
        else {return highE;}
    }
    else if ((arr[mid] >= arr[lowE]) && (arr[mid] >= arr[highE])) 
    {
        if (arr[lowE] > arr[highE]) {return lowE;}
        else {return highE;}
    }
    else 
    {
        if (arr[lowE] > arr[mid]) {return lowE;}
    }

    return mid;
}




/**
 *Performs an insertion sort on part of an array
 * @param arr The array to be sorted.
 * @param highE The index of the top element.
 * @param lowE The index of the low element.
 */
public static void insertSort(int[] arr, int highE, int lowE)
{
    //Sorts elements lowE to i in array, with i being gradually incremented.
    for (int i = lowE + 1; i <= highE; i++)
    {
        for (int j = i; j > lowE; j--)
        {
            if (arr[j] < arr[j - 1])
            {
                swapElements(arr, j, j-1);
            }
            else {break;}
        }
    }
}

RANDOM LIST GENERATORS

/**
 * Creates a random list
 * @param arr The array to place the list inside of
 */
public static void randomList(int[] arr)
{
    //Places a random number at each element of the array

    for (int i = 0; i < arr.length; i++)
    {
        arr[i] = (int) Math.floor(Math.random() * RAND_MAX);
    }
}




/**
 * Creates a nearly sorted list of random numbers
 * @param arr the array to place the list inside of
 */
public static void nearSortList(int[] arr)
{
    //Creates a sorted list in arr
    sortList(arr);



    int swaps = (int) (Math.ceil(Math.pow((Math.log(arr.length)), 2.0)));

    //The two values to be switched each time
    int a, b;

    //Performs a number of swaps equal to swaps [log(N) ^ 2] rounded up, with numbers switched no more than ln(N) places away
    for (int i = 0; i < swaps; i++)
    {
        a = (int) Math.floor(Math.random() * arr.length);

        b = (int) (a + Math.random() * 2 * Math.log(arr.length) - Math.log(arr.length));

        //Accounts for cases in which b is either greater or smaller than the array bounds
        if (b < 0)
        {
            b = -b;
        }
        else if (b >= arr.length)
        {
            b = -1 * (arr.length - b);
        }

        swapElements(arr, a, b);
    }
}




/**
 * Creates a random list with many unique values in
 * @param arr the array to place the list inside of
 */
public static void fewUniqueList(int[] arr)
{
    int[] smallArr = new int[(int) Math.floor(Math.pow(arr.length, 9.0 / 10.0))];


    //Creates a smaller array of random values
    randomList(smallArr);



    //Fills the larger list up with values from the smaller list, ensuring aproximately N / N ^ (9/10) instances of each number in the array and ensuring, at most, there are N ^ (9/10) (rounded down) unique values in the large array
    for (int i = 0; i < arr.length; i++)
    {
        arr[i] = smallArr[(int) Math.floor(Math.random() * smallArr.length)];
    }
}




/**
 * Creates a reversed list of random numbers
 * @param arr the array to place the list inside of
 */
public static void reversedList(int[] arr)
{
    //Creates a sorted list in arr
    sortList(arr);




    //Switches each ith elements with its mirror on the other end of the array until the value of i reaches the middle of the array
    for (int i = 0; i < (int) (arr.length / 2.0); i++)
    {
        swapElements(arr, i, arr.length - 1 - i);
    }
}




/**
 * Creates a sorted list of random numbers using a merge sort
 * @param arr the array to place the list inside of
 */
public static void sortList(int[] arr)
{
    //Creates a random list in arr
    randomList(arr);

    Arrays.sort(arr);
}

EDIT: [Defunct]

EDIT 2:

I've replaced the basic recursive calls with the following code in order to only call the smallest of the two partitions at EJPs recommendation and it still is not fixing the issue.

if (storeE - 1 - lowE < highE - storeE + 1)
{
    //Lesser
    quickSort(arr, storeE - 1, lowE);
    //Greater
    quickSort(arr, highE, storeE + 1);
}
else
{
    //Greater
    quickSort(arr, highE, storeE + 1);
    //Lesser
    quickSort(arr, storeE - 1, lowE);
}

EDIT 3:

Ok, it is now evident that the recursion depth is far greater than I gave it credit for for nearly sorted and sorted lists. But now I need to work out why this is the case, and why random lists only had a depth of 28 for 10^7 values but nearly sorted and sorted lists have depths of over 3000

like image 919
Zach Beavon-Collin Avatar asked Nov 27 '13 22:11

Zach Beavon-Collin


People also ask

Why quick sort is worst for sorted array?

The worst case time complexity of a typical implementation of QuickSort is O(n2). The worst case occurs when the picked pivot is always an extreme (smallest or largest) element. This happens when input array is sorted or reverse sorted and either first or last element is picked as pivot.

Why is insertion sort the most efficient when the input list is almost in sorted order?

The reason that insertion sort is faster on sorted or nearly-sorted arrays is that when it's inserting elements into the sorted portion of the array, it barely has to move any elements at all.

What are the disadvantages of quick sort?

The primary disadvantage with quicksort is that its absolute worst case is O( n ^2). If we choose the pivot points randomly, we can guarantee that this behavior is unlikely, but we cannot provide an absolute guarantee.

Why is quick sort unstable?

QuickSort is an unstable algorithm because we do swapping of elements according to pivot's position (without considering their original positions).


2 Answers

For a random array, you could partition off massive chunks of the data.
But for a (nearly) sorted array, you'd mostly be partitioning off 1 element at a time.

So, for a sorted array, your stack size would end up being the same as the size of the array, while, for a random array, it's much more likely to be about a logarithm of that size.

So, even if the random array is much larger than a nearly sorted one, it's not surprising that the smaller one throws an exception, but the larger one doesn't.

Modifying your code

In terms of a fix, as EJP pointed out, you should do the smaller partition first to limit stack growth. But this in itself won't fix the problem as Java doesn't support tail-call optimization (well, it's optional for an implementation, as I understand that question).

A fairly simple fix here is to throw your function into a while-loop, essentially hard-coding the tail-call optimization.

To give a better idea of what I mean:

public static void quickSort(int[] arr, int highE, int lowE)
{
    while (true)
    {
        if (lowE + 29 < highE)
        {
            ...
            quickSort(arr, storeE - 1, lowE);

            // not doing this any more
            //quickSort(arr, highE, storeE + 1);

            // instead, simply set the parameters to their new values
            // highE = highE;
            lowE = storeE + 1;
        }
        else
        {
            insertSort(arr, highE, lowE);
            return;
        }
    }
}

Well, now that you have the basic idea, this would look better (functionally equivalent to the above, just more concise):

public static void quickSort(int[] arr, int highE, int lowE)
{
    while (lowE + 29 < highE)
    {
        ...
        quickSort(arr, storeE - 1, lowE);
        lowE = storeE + 1;
    }
    insertSort(arr, highE, lowE);
}

This of course doesn't actually do the smaller one first, but I'll leave that to you to figure out (seems you already have a fair idea of how to do this).

How this works

For some made up values...

Your current code does this: (an indent indicates what happens inside that function call - thus increasing indentation means recursion)

quickSort(arr, 100, 0)
   quickSort(arr, 49, 0)
      quickSort(arr, 24, 0)
         insertion sort
      quickSort(arr, 49, 26)
         insertion sort
   quickSort(arr, 100, 51)
      quickSort(arr, 76, 0)
         insertion sort
      quickSort(arr, 100, 74)
         insertion sort

The modified code does this:

quickSort(arr, 100, 0)
   quickSort(arr, 49, 0)
      quickSort(arr, 24, 0)
         break out of the while loop
         insertion sort
   lowE = 26
   break out of the while loop
      insertion sort
lowE = 51
run another iteration of the while-loop
    quickSort(arr, 76, 0)
      break out of the while loop
      insertion sort
lowE = 74
break out of the while loop
   insertion sort

Increase the stack size

Not sure whether you've considered this, or whether it would work with your parameters, but you can always consider simply increasing the stack size with the -Xss command-line parameter.

like image 200
Bernhard Barker Avatar answered Sep 30 '22 14:09

Bernhard Barker


Don Knuth in [ACP][1] suggests always pushing the larger of the two partitions and sorting the smaller one immediately, to limit stack growth. In your code that corresponds to recursively sorting the smaller of the two partitions first, then the other one.

[1]: The Art of Computer Programming, vol III, #5.2.2 p.114.

like image 37
user207421 Avatar answered Sep 30 '22 13:09

user207421