Which is faster — sorting or multiplying a small array of elements?

Tags: c, algorithm, optimization, sorting, poker

Reading through Cactus Kev's Poker Hand Evaluator, I noticed the following statements:

At first, I thought that I could always simply sort the hand first before passing it to the evaluator; but sorting takes time, and I didn't want to waste any CPU cycles sorting hands. I needed a method that didn't care what order the five cards were given as.
...
After a lot of thought, I had a brainstorm to use prime numbers. I would assign a prime number value to each of the thirteen card ranks... The beauty of this system is that if you multiply the prime values of the rank of each card in your hand, you get a unique product, regardless of the order of the five cards.
...
Since multiplication is one of the fastest calculations a computer can make, we have shaved hundreds of milliseconds off our time had we been forced to sort each hand before evaluation.

I have a hard time believing this.

Cactus Kev represents each card as a 4-byte integer, and evaluates hands by calling eval_5cards( int c1, int c2, int c3, int c4, int c5 ). We could represent cards as one byte, and a poker hand as a 5-byte array. Sorting this 5-byte array to get a unique hand must be pretty fast. Is it faster than his approach?
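To make the one-byte idea concrete, here is a minimal sketch, not Cactus Kev's code. It assumes each card fits in a `uint8_t` (the encoding itself is left open) and uses two hypothetical helpers, `sort_hand` and `sorted_key`: the first insertion-sorts the five bytes, the second packs the sorted bytes into one 64-bit integer so that any ordering of the same five cards yields the same key.

```c
#include <stdint.h>

/* Hypothetical helper: insertion sort on a 5-byte hand, ascending.
   Insertion sort is a common choice for tiny arrays; it is not claimed
   to be the fastest option here. */
void sort_hand(uint8_t h[5]) {
    for (int i = 1; i < 5; i++) {
        uint8_t x = h[i];
        int j = i - 1;
        while (j >= 0 && h[j] > x) {
            h[j + 1] = h[j];   /* shift larger cards one slot right */
            j--;
        }
        h[j + 1] = x;
    }
}

/* Hypothetical helper: order-independent key for a hand.
   Copies the hand, sorts the copy, and packs the five bytes into the
   low 40 bits of a 64-bit integer, smallest card in the highest byte. */
uint64_t sorted_key(const uint8_t hand[5]) {
    uint8_t h[5];
    for (int i = 0; i < 5; i++) {
        h[i] = hand[i];
    }
    sort_hand(h);

    uint64_t key = 0;
    for (int i = 0; i < 5; i++) {
        key = (key << 8) | h[i];
    }
    return key;
}
```

Whether this beats four multiplications is exactly the open question; the sketch only shows that the sorted representation gives a unique key per hand.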

What if we keep his representation (cards as 4-byte integers)? Can sorting an array of 5 integers be faster than multiplying them? If not, what sort of low-level optimizations can be done to make sorting a small number of elements faster?

Thanks!

Good answers everyone; I'm working on benchmarking the performance of sorting vs multiplication, to get some hard performance statistics.

asked Jun 28 '10 18:06

Rudiger

2 Answers

Of course it depends a lot on your CPU, but a typical Intel CPU (e.g. Core 2 Duo) can multiply two 32-bit numbers within 3 clock cycles. Multiplying five values takes four multiplications, so for a sort to beat that it needs to finish in fewer than 3 * 4 = 12 cycles, which is a very tight constraint. None of the standard sorting algorithms can do it in under 12 cycles. The comparison of two numbers alone takes one cycle, the conditional branch on the result takes another, and whatever you do next takes at least one more (swapping two cards actually takes at least 4 cycles). So multiplying wins.

Of course, this ignores the latency of fetching the card values from first- or second-level cache, or even from memory; however, that latency applies in both cases, multiplying and sorting.
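For reference, the multiplication side of the comparison can be sketched as follows. This is an illustration, not the evaluator's actual code: it assumes ranks are indexed 0 (deuce) through 12 (ace) and mapped to the first thirteen primes, and `RANK_PRIME` and `prime_key` are hypothetical names. The loop performs exactly four multiplications.

```c
#include <stdint.h>

/* Assumption: rank index 0 = deuce ... 12 = ace, mapped to the first
   thirteen primes. The table is illustrative. */
const uint32_t RANK_PRIME[13] = {
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41
};

/* Hypothetical helper: order-independent key for five card ranks.
   Because multiplication is commutative and prime factorisations are
   unique, the product identifies the multiset of ranks. The largest
   possible product (41^5 = 115856201) fits in 32 bits. */
uint32_t prime_key(const int ranks[5]) {
    uint32_t key = RANK_PRIME[ranks[0]];
    for (int i = 1; i < 5; i++) {
        key *= RANK_PRIME[ranks[i]];   /* four multiplications in total */
    }
    return key;
}
```

With this table, the ranks {12, 0, 5, 5, 9} and {5, 9, 12, 5, 0} both give 41 * 2 * 13 * 13 * 29 = 401882.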

answered Sep 21 '22 10:09

Mecki

I haven't tested it, but I'm sympathetic to his argument. The product takes 4 multiplications, whereas sorting is O(n log n) in general; for five elements specifically, the optimal sorting network requires 9 comparisons. The evaluator then has to look at every element of the sorted array at least once, which is another 5 operations.
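A 9-comparison network for five elements can be sketched like this. The comparator sequence is one well-known arrangement (sort a pair, sort a triple, then merge); `cswap` and `sort5` are hypothetical names. The compare-exchange is written with conditional expressions so that a compiler may emit conditional moves rather than branches, though that is not guaranteed.

```c
#include <stdint.h>

/* Compare-exchange: afterwards *a <= *b. */
static inline void cswap(uint32_t *a, uint32_t *b) {
    uint32_t lo = (*a < *b) ? *a : *b;
    uint32_t hi = (*a < *b) ? *b : *a;
    *a = lo;
    *b = hi;
}

/* Hypothetical helper: sorts five values ascending with a fixed
   sequence of 9 compare-exchanges, independent of the input data. */
void sort5(uint32_t v[5]) {
    cswap(&v[0], &v[1]);   /* sort the pair (0,1) */
    cswap(&v[3], &v[4]);   /* sort the triple (2,3,4) ... */
    cswap(&v[2], &v[4]);
    cswap(&v[2], &v[3]);
    cswap(&v[1], &v[4]);   /* ... then merge pair and triple */
    cswap(&v[0], &v[3]);
    cswap(&v[0], &v[2]);
    cswap(&v[1], &v[3]);
    cswap(&v[1], &v[2]);
}
```

Even if every compare-exchange compiled to a handful of branch-free instructions, that is still nine of them against four multiplications, which is the gap the argument above rests on.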

answered Sep 17 '22 10:09

Matthew Flaschen
