How does a sorting network beat generic sorting algorithms?

Tags: c, algorithm, comparison, sorting, sorting-network

In reference to the question "Fastest sort of fixed length 6 int array", I do not fully understand how this sorting network beats an algorithm like insertion sort.

From that question, here is a comparison of the number of CPU cycles taken to complete the sort:

Linux 32 bits, gcc 4.4.1, Intel Core 2 Quad Q8300, -O2

Insertion Sort (Daniel Stutzbach): 1425

Sorting Networks (Daniel Stutzbach): 1080

The code used is as follows:

Insertion Sort (Daniel Stutzbach)

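static inline void sort6_insertion_sort_v2(int *d){
    int i, j;
    for (i = 1; i < 6; i++) {
        int tmp = d[i];  /* element to insert into the sorted prefix d[0..i-1] */
        /* shift larger elements one slot right until tmp's position is found */
        for (j = i; j >= 1 && tmp < d[j-1]; j--)
            d[j] = d[j-1];
        d[j] = tmp;
    }
}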

Sorting Networks (Daniel Stutzbach)

static inline void sort6_sorting_network_v1(int * d){
/* compare-exchange: after SWAP(x, y), d[x] <= d[y] */
#define SWAP(x,y) if (d[y] < d[x]) { int tmp = d[x]; d[x] = d[y]; d[y] = tmp; }
    SWAP(1, 2);
    SWAP(0, 2);
    SWAP(0, 1);
    SWAP(4, 5);
    SWAP(3, 5);
    SWAP(3, 4);
    SWAP(0, 3);
    SWAP(1, 4);
    SWAP(2, 5);
    SWAP(2, 4);
    SWAP(1, 3);
    SWAP(2, 3);
#undef SWAP
}

I understand that sorting networks are really good for sorting in parallel, because some of the steps are independent of the other steps. But here we are not using the parallelization.

I expect it to be faster, as it has the advantage of knowing the exact number of elements beforehand. Where and why exactly does insertion sort make unnecessary comparisons?

EDIT1:

This is the input set these codes are compared against:

int d[6][6] = {
    {1, 2, 3, 4, 5, 6},
    {6, 5, 4, 3, 2, 1},
    {100, 2, 300, 4, 500, 6},
    {100, 2, 3, 4, 500, 6},
    {1, 200, 3, 4, 5, 600},
    {1, 1, 2, 1, 2, 1}
};


asked Oct 10 '10 16:10

Lazer

1 Answer

> But here we are not using the parallelization.

Modern CPUs can figure out when instructions are independent and will execute them in parallel. Hence, even though there's only one thread, the sorting network's parallelism can be exploited.
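To make the independence concrete, here is a minimal sketch that regroups the twelve comparators from the question into layers whose comparators touch disjoint indices. The name `sort6_layered` and the layer grouping are illustrative assumptions, not part of the original code. Because comparators on disjoint indices commute, the result is the same as in the original order. Whether a particular compiler and CPU actually overlap them depends on the generated machine code.

```c
/* Compare-exchange, same semantics as SWAP in the question:
 * afterwards d[x] <= d[y]. */
#define SWAP(x, y) \
    do { if (d[y] < d[x]) { int tmp = d[x]; d[x] = d[y]; d[y] = tmp; } } while (0)

/* Hypothetical regrouping of the question's network. Comparators on the
 * same line use disjoint indices, so none depends on another's result. */
static void sort6_layered(int *d)
{
    SWAP(1, 2); SWAP(4, 5);             /* layer 1: halves {0,1,2} and {3,4,5} */
    SWAP(0, 2); SWAP(3, 5);             /* layer 2 */
    SWAP(0, 1); SWAP(3, 4);             /* layer 3: both halves now sorted */
    SWAP(0, 3); SWAP(1, 4); SWAP(2, 5); /* layer 4: merge across the halves */
    SWAP(2, 4); SWAP(1, 3);             /* layer 5 */
    SWAP(2, 3);                         /* layer 6 */
}
#undef SWAP
```

In this grouping the twelve compare-exchanges form a dependency chain of only six steps, whereas each pass of the insertion sort's inner loop depends on the outcome of the previous comparison.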

> Where exactly does insertion sort make unnecessary comparisons?

The easiest way to see the extra comparisons is to do an example by hand.

Insertion sort:
6 5 4 3 2 1
5 6 4 3 2 1
5 4 6 3 2 1
4 5 6 3 2 1
4 5 3 6 2 1
4 3 5 6 2 1
3 4 5 6 2 1
3 4 5 2 6 1
3 4 2 5 6 1
3 2 4 5 6 1
2 3 4 5 6 1
2 3 4 5 1 6
2 3 4 1 5 6
2 3 1 4 5 6
2 1 3 4 5 6
1 2 3 4 5 6

Sorting network:
6 5 4 3 2 1
6 4 5 3 2 1
5 4 6 3 2 1
4 5 6 3 2 1 # These three can execute in parallel with the first three
4 5 6 3 1 2 #
4 5 6 2 1 3 #
4 5 6 1 2 3
1 5 6 4 2 3
1 2 6 4 5 3
1 2 3 4 5 6
1 2 3 4 5 6
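The two traces can be checked by counting comparisons in code. The sketch below instruments both routines from the question. The function names `insertion_sort_count` and `sorting_network_count` are hypothetical helpers introduced here for illustration. It counts only element comparisons (`tmp < d[j-1]` and `d[y] < d[x]`), not the `j >= 1` bounds checks that the insertion sort also performs.

```c
/* Insertion sort from the question, instrumented to count how many times
 * tmp < d[j-1] is evaluated. Sorts d[0..5] in place and returns the count. */
static int insertion_sort_count(int *d)
{
    int comparisons = 0;
    for (int i = 1; i < 6; i++) {
        int tmp = d[i];
        int j = i;
        while (j >= 1) {
            comparisons++;          /* one evaluation of tmp < d[j-1] */
            if (!(tmp < d[j - 1]))
                break;
            d[j] = d[j - 1];        /* shift the larger element right */
            j--;
        }
        d[j] = tmp;
    }
    return comparisons;
}

/* The 12-comparator network from the question, driven from a table.
 * The number of comparisons is fixed at 12 regardless of the input. */
static int sorting_network_count(int *d)
{
    static const int pairs[12][2] = {
        {1, 2}, {0, 2}, {0, 1}, {4, 5}, {3, 5}, {3, 4},
        {0, 3}, {1, 4}, {2, 5}, {2, 4}, {1, 3}, {2, 3}
    };
    int comparisons = 0;
    for (int k = 0; k < 12; k++) {
        int x = pairs[k][0], y = pairs[k][1];
        comparisons++;
        if (d[y] < d[x]) { int tmp = d[x]; d[x] = d[y]; d[y] = tmp; }
    }
    return comparisons;
}
```

On the reversed input `6 5 4 3 2 1` the insertion sort makes 1 + 2 + 3 + 4 + 5 = 15 element comparisons, while the network makes 12. On an already sorted input the insertion sort makes only 5, so comparison counts alone do not explain the measured cycle counts. The network's fixed, data-independent sequence of operations matters as well.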


answered Oct 14 '22 03:10

Daniel Stutzbach
