Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Shellsort, 2.48^(k-1) vs Tokuda's sequence

Introduction

Shellsort is an interesting sort algorithm that I came across a while ago. The most amazing part is that a different gap sequences can significantly improve the speed of the algorithm. I did a few reading (not extensively) and it seem that Tokuda's sequence is recommendeded for practical applications.

Another interesting part is that the sequence of ratio 2.20~2.25 tends to be more efficient. So I did a small search thought the sequence of ratio from 2.20 to 2.50 and tried to search for which ratio that can performance averagely good. I come across this ratio: 2.48 that seem to averagely performance good in many different trials.

Then, I came up with sequence generator: 2.48k-1 (lets call it 248 sequence) and tried to compare it with Tokuda's sequence. It turned out that they averagely equal in speed. 248 sequences tends to use slightly more number of comparison.

Benchmark Methods

  • Instead of using millisecond as a measurement, I use the number of comparision and number of swap.
  • I did 100 trials each on the following array size (50,000; 100,000; 200,000; 300,000; 500,000; 1,000,000) and keep track of their number of comparison and number of swap.
  • The following is my data (here in CSV format).
  • Full Code: http://pastebin.com/pm7Akpqh

Questions

I know I might be wrong that is why I come here to seek for more opinion from more experienced programmers here. In case you don't get the question, here is my question in short:

  • Does 2.48k-1 is as good as Tokuda's sequence?
  • If it is as good as Tokuda's sequence, would it be more practical to use it since 2.48k-1 sequence is easier to generate than Tokuda's sequence.

248 Sequence:
  ROUNDUP ( 2.48(k-1) )
  eg: 1, 3, 7, 16, 38, 94, 233, 577, 1431, 3549, 8801, 21826, ...

Tokuda's Sequence
  ROUNDUP ( (9k - 4k) / (5 * 4k - 1) )
  eg: 1, 4, 9, 20, 46, 103, 233, 525, 1182, 2660, 5985, 13467, ...

As @woolstar suggested me to also test with edge cases such as reversed and sorted. As expectedly, 248 sequence is faster in edge cases because 248 sequence gap is larger so it moves the inverse faster.


Shellsort Implementation

public static int compare = 0;
public static int swap = 0;

public static bool greaterthan(int a, int b) {
    compare++;
    return a > b;
}

public static int shellsort(int[] a, int[] gaps) {
    // For keeping track of number of swap and comparison
    compare = 0;
    swap = 0;

    int temp, gap, i, j;

    // Finding a gap that is smaller than the length of the array
    int gap_index = gaps.Length - 1;
    while (gaps[gap_index] > a.Length) gap_index--;

    while (gap_index >= 0) {

        // h-sorting
        gap = gaps[gap_index];
        for (i = gap; i < a.Length; i++) {
            temp = a[i];
            for(j = i; (j >= gap) && (greaterthan(a[j - gap], temp)); j -= gap) {
                a[j] = a[j - gap];
            }

            // swapping
            a[j] = temp;
            swap++;
        }

        gap_index--;
    }

    return compare;
}
like image 767
invisal Avatar asked Feb 02 '14 08:02

invisal


People also ask

Is Shell sort in-place?

Shellsort (also known as Shell sort or Shell's method) is an in-place comparison based sorting algorithm. Shell Sort improves its time complexity by taking the advantage of the fact that using Insertion Sort on a partially sorted array results in less number of moves.

How does Shell sort work?

Shell sort is a generalized version of the insertion sort algorithm. It first sorts elements that are far apart from each other and successively reduces the interval between the elements to be sorted. The interval between the elements is reduced based on the sequence used.

What is GAP sequence in Shell sort?

The "gap sequence" determines the initial size of the gap and how it is made smaller with each iteration. A good gap sequence is important for shell sort to perform well.


1 Answers

According to this reserach: (Ciura, Marcin (2001) "Best Increments for the Average Case of Shellsort". In Freiwalds, Rusins. Proceedings of the 13th International Symposium on Fundamentals of Computation Theory. London: Springer-Verlag. pp. 106–117) the dominant operation in shell sort for arrays with size with less than 108 elements should be the comparison operation, not swap:

Knuth’s discussion assumes that the running time can be approximated as 9×number of moves, while Figures 3 and 4 show that for each sequence the number of key comparisons is a better measure of the running time than the number of moves. The asymptotic ratio of 9 cycles per move is not too precise for N ≤ 108, and, if some hypothetical sequence makes Θ(NlogN) moves, it is never attained. Analogous plots for other computer architectures would lead to the same conclusion.

Treating moves as the dominant operation leads to mistaken conclusions about the optimal sequences.

In this context the answer to your question is no: the 248 sequence is worse, because it uses more comparisons. You could consider also comparing your sequence with the Ciura's sequence presented in this article, as this research seems to prove that it's better than Tokuda's sequence.

like image 111
BartoszKP Avatar answered Oct 18 '22 01:10

BartoszKP