Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Shell Vs. Hibbard time complexity comparison

I’m doing an experiment to compare how Thomas Hibbard’s shell sort (gap size = 2^k-1) and Donald Shell’s shell sort (n/2^k) perform on the same array. When the size of the array is from 10 to 1000, Hibbard is performing better than shell. But when the size reaches 10000 or higher, shell sort is faster than Hibbard.

According to big O notation, Hibbard is O(N^1.5), Shell is O(N^2), which makes me think that Hibbard should have greater improvements over Shell as the size of the data set increases. Can anybody tell me why my results might not be as expected?

I understand that O notation is worst case complexity, but it seems that the performance should align better to the notation.

Here is my code written in JAVA: (note: unsortedArray is declared and initialized earlier)

{
    int temp;
    int[] sortedArray = unsortedArray.clone();
    printArray();
    int k = (int)(Math.log(sortedArray.length)/Math.log(2));
    int gap = (int)(Math.pow(2,k)-1);
    int count = 0;
    long endTime;
    long startTime = System.nanoTime();

    while (gap > 0)
    {
        for (int g = 0; g < gap; g++)
        {
            for (int d = g + gap; d < sortedArray.length; d = d + gap)
            {
                for (int i = d; i - gap >= 0; i = i - gap)
                {                          
                    if (sortedArray[i - gap] <= (sortedArray[i]) )
                    {
                        break;
                    }
                    count++;
                    temp = sortedArray[i];
                    sortedArray[i] = sortedArray [i-gap];
                    sortedArray[i-gap] = temp;

                }
            }
        }

        k = k -1;
        gap = (int)(Math.pow(2,k)-1);
    }
    endTime = System.nanoTime();
    System.out.println("The total time for hibbard sort is" + (endTime-startTime));
    System.out.println("the number of swaps for hibbard sort is" + count);
}
like image 284
Julia Bian Avatar asked Jun 09 '15 18:06

Julia Bian


1 Answers

Measuring the difference in time complexity between these two gap generation algorithms is actually a bit trickier than this.

Consider supplying both with a sorted sequence of data and assume n=8. For Shell we get a gap sequence of {4,2,1} and for Hibbard {7,3,1}. To run through the sorted sequence would require (Shell) (n-4)+(n-2)+(n-1) or 17 comparisons and (Hibbard) (n-7)+(n-3)+(n-1) or 13.

It is obvious that you can't draw the conclusion that Hibbard will execute in 13/17 of the time of Shell given a random sequence of n elements.

It may be found that a given, randomly generated sequence turns out to be better suited to be sorted using Shell than Hibbard, or vice versa. The only way to determine which is better is to test all possible data sequence combinations and calculate the average number of comparisons that they require. This is done easily enough when n=8 (only n! or 40320 combinations) but ... "more" diffcult when n=100 or 1000.

A Shellsort on all possible sequences when n=8 using the two gaps above (insertion sort and best shellsort gap included for comparison):

                       number of comparisons when n=8
gap sequence           minimum   average   maximum   reverse
{1} (insertion sort)      7       19.28      28        28
{5,1} (best gap)         10       17.4       24        15 
{4,2,1} (Shell)          17       21.82      30        22
{7,3,1} (Hibbard)        13       18.57      24        20

So in the case of n=8 Hibbard is much better than Shell, better than insertion sort but worse than the best gap sequence. Interestingly, for the best gap, sorting a reversed sequence of data is faster than the average case!

If we look at n=14 we find that Shell and Hibbard result in the same sequence, {7,3,1} - where obviously neither algorithm will be better than the other - and at n=16 we get {8,4,2,1} and {15,7,3,1}, respectively. This results in a much better best case for Hibbard (n*4)-(15+7+3+1) or 38 comparisons than for Shell (n*4)-(8+4+2+1) or 49.

Which will be better than the other as n increases? It appears to me that the best gap sequence is dependent on n which should give an edge to Shell as Hibbard, for example, has the same sequence {7,3,1} when 8 <= n < 16 and {15,7,3,1} when 16 <= n <32: basically "one size fits all".

Shellsort wants to move values that are far apart for the initial gap(s) and while that may be true for Hibbard when n=16, 17 or 18 and the gap is 15 it is much less true as n approaches the upper limit of 31.

However, that is not to say that Shell results in better gap sequences. The first gap of n/2 will always be hampered by the same problems as Hibbard's initial gap as n approaches its upper limit for the sequence.

So my guess is that Shell will give stable results that are worse than Hibbard and probably also insertion sort. Hibbard's results from the lower to the upper n limits for the initial gap will increase disproportionately. Somewhere, as n approaches the upper n limit, also Hibbard will begin to perform worse than insertion sort.

In addition to calculating the values of the gaps themselves the number of gaps must also be determined. As shown previously, two gaps suffice when n=8 but will that also be true when n=10 or more? For example, from 2 <= n < 6 an insertion sort will be faster than any shellsort, but from n=6 two gaps allows Shellsort to beat insertion sort.

like image 112
Olof Forshell Avatar answered Oct 23 '22 09:10

Olof Forshell