
Why is my Strassen's Matrix Multiplication slow?

I wrote two matrix multiplication programs in C++: regular MM (source), and Strassen's MM (source), both of which operate on square matrices of size 2^k x 2^k (in other words, square matrices whose size is a power of two).

The results are just terrible. For a 1024 x 1024 matrix, regular MM takes 46.381 sec, while Strassen's MM takes 1484.303 sec (25 minutes!).

I attempted to keep the code as simple as possible. Other Strassen's MM examples found on the web are not that much different from my code. One issue with my Strassen's code is obvious - I don't have a cutoff point that switches to regular MM.

What other issues does my Strassen's MM code have?

Thanks!

Direct links to sources
http://pastebin.com/HqHtFpq9
http://pastebin.com/USRQ5tuy

Edit 1. First, a lot of great advice. Thank you for taking the time and sharing your knowledge.

I implemented the changes (kept all of my code) and added a cut-off point. MM of a 2048 x 2048 matrix with a cutoff of 512 already gives good results. Regular MM: 191.49 s. Strassen's MM: 112.179 s. A significant improvement. The results were obtained on a prehistoric Lenovo X61 TabletPC with an Intel Centrino processor, using Visual Studio 2012. I will do more checks (to make sure I got correct results) and will publish the results.

asked Nov 26 '12 by newprint


1 Answer

One issue with my Strassen's code is obvious - I don't have a cutoff point that switches to regular MM.

It's fair to say that recursing all the way down to single elements is the bulk of (if not the entire) problem. Trying to guess at other performance bottlenecks before addressing this is almost moot, given the massive performance hit it brings. (In other words, you're comparing apples to oranges.)

As discussed in the comments, cache alignment could have an effect, but not on this scale. Furthermore, cache alignment would likely hurt the regular algorithm more than the Strassen algorithm, since the latter is cache-oblivious.

void strassen(int **a, int **b, int **c, int tam) {

    // trivial case: when the matrix is 1 x 1:
    if (tam == 1) {
        c[0][0] = a[0][0] * b[0][0];
        return;
    }

That's far too small. While the Strassen algorithm has a smaller asymptotic complexity, it has a much bigger constant factor. For one thing, you incur function-call overhead all the way down to a single element.

This is analogous to using merge or quick sort and recursing all the way down to one element. To be efficient you need to stop the recursion when the size gets small and fall back to the classic algorithm.

In quick/merge sort, you'd fall back to a low-overhead O(n^2) insertion or selection sort. Here you would fall back to the normal O(n^3) matrix multiply.
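
As a concrete illustration of that fallback pattern, here is a minimal sketch of a quicksort that hands small ranges to insertion sort. The cutoff of 16 is a typical but hypothetical value, not taken from the question's code:

#include <utility>
#include <vector>

const int SORT_CUTOFF = 16;  // hypothetical; real values are tuned per machine

// Low-overhead O(n^2) sort for small ranges.
void insertionSort(std::vector<int> &v, int lo, int hi) {
    for (int i = lo + 1; i <= hi; ++i) {
        int key = v[i], j = i - 1;
        while (j >= lo && v[j] > key) { v[j + 1] = v[j]; --j; }
        v[j + 1] = key;
    }
}

void quickSort(std::vector<int> &v, int lo, int hi) {
    if (hi - lo + 1 <= SORT_CUTOFF) {  // small range: fall back
        insertionSort(v, lo, hi);
        return;
    }
    int pivot = v[lo + (hi - lo) / 2];
    int i = lo, j = hi;
    while (i <= j) {                   // Hoare-style partition
        while (v[i] < pivot) ++i;
        while (v[j] > pivot) --j;
        if (i <= j) std::swap(v[i++], v[j--]);
    }
    if (lo < j) quickSort(v, lo, j);
    if (i < hi) quickSort(v, i, hi);
}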


The point at which you fall back to the classic algorithm should be a tunable threshold that will likely vary depending on the hardware and the ability of the compiler to optimize the code.

For something like Strassen multiplication, where the advantage is only O(n^2.8074) over the classic O(n^3), don't be surprised if this threshold turns out to be very high. (Thousands of elements?)
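
To make that concrete, here is a minimal sketch of a Strassen multiply with such a cutoff, assuming square power-of-two matrices as in the question. It uses a flat std::vector representation rather than the question's int** arrays, and the CUTOFF of 64 is just a placeholder to tune:

#include <cstddef>
#include <vector>

using Matrix = std::vector<int>;  // n x n matrix stored row-major

const int CUTOFF = 64;  // placeholder; tune per machine and compiler

// Classic O(n^3) multiply, c += a * b. The i-k-j loop order keeps the
// innermost accesses sequential in memory.
void classicMultiply(const Matrix &a, const Matrix &b, Matrix &c, int n) {
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k) {
            int aik = a[i * n + k];
            for (int j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}

Matrix addM(const Matrix &x, const Matrix &y) {
    Matrix r(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) r[i] = x[i] + y[i];
    return r;
}

Matrix subM(const Matrix &x, const Matrix &y) {
    Matrix r(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) r[i] = x[i] - y[i];
    return r;
}

// Assumes n is a power of two, as in the question.
Matrix strassen(const Matrix &a, const Matrix &b, int n) {
    if (n <= CUTOFF) {                 // small case: classic MM wins
        Matrix c(n * n, 0);
        classicMultiply(a, b, c, n);
        return c;
    }

    int h = n / 2;                     // quadrant size
    Matrix a11(h * h), a12(h * h), a21(h * h), a22(h * h);
    Matrix b11(h * h), b12(h * h), b21(h * h), b22(h * h);
    for (int i = 0; i < h; ++i)
        for (int j = 0; j < h; ++j) {
            a11[i * h + j] = a[i * n + j];
            a12[i * h + j] = a[i * n + j + h];
            a21[i * h + j] = a[(i + h) * n + j];
            a22[i * h + j] = a[(i + h) * n + j + h];
            b11[i * h + j] = b[i * n + j];
            b12[i * h + j] = b[i * n + j + h];
            b21[i * h + j] = b[(i + h) * n + j];
            b22[i * h + j] = b[(i + h) * n + j + h];
        }

    // The seven Strassen products.
    Matrix m1 = strassen(addM(a11, a22), addM(b11, b22), h);
    Matrix m2 = strassen(addM(a21, a22), b11, h);
    Matrix m3 = strassen(a11, subM(b12, b22), h);
    Matrix m4 = strassen(a22, subM(b21, b11), h);
    Matrix m5 = strassen(addM(a11, a12), b22, h);
    Matrix m6 = strassen(subM(a21, a11), addM(b11, b12), h);
    Matrix m7 = strassen(subM(a12, a22), addM(b21, b22), h);

    // Recombine the quadrants of the result.
    Matrix c(n * n);
    for (int i = 0; i < h; ++i)
        for (int j = 0; j < h; ++j) {
            int p = i * h + j;
            c[i * n + j]           = m1[p] + m4[p] - m5[p] + m7[p];
            c[i * n + j + h]       = m3[p] + m5[p];
            c[(i + h) * n + j]     = m2[p] + m4[p];
            c[(i + h) * n + j + h] = m1[p] - m2[p] + m3[p] + m6[p];
        }
    return c;
}

The only structural change from a textbook Strassen is the first if: below the threshold the recursion bottoms out in classicMultiply instead of continuing down to 1 x 1.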


In some applications there can be many algorithms, each with decreasing asymptotic complexity but an increasing constant factor. The result is that multiple algorithms become optimal at different sizes.

Large integer multiplication is a notorious example of this:

  • Grade-school multiplication: O(N^2), optimal for < ~100 digits*
  • Karatsuba multiplication: O(N^1.585), faster than the above at ~100 digits*
  • Toom-Cook 3-way: O(N^1.465), faster than Karatsuba at ~3000 digits*
  • Floating-point FFT: slightly above O(N log(N)), faster than Karatsuba/Toom-3 at ~700 digits*
  • Schönhage–Strassen algorithm (SSA): O(N log(N) log(log(N))), faster than floating-point FFT at ~a billion digits*
  • Fixed-width Number-Theoretic Transform: O(N log(N)), faster than SSA at ~a few billion digits?*

*Note: these example thresholds are approximate and can vary drastically, often by more than a factor of 10.
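
The same cutoff idea shows up at the bottom of that ladder. As an illustrative sketch (not taken from any particular bignum library), here is a Karatsuba multiply on little-endian base-10 digit vectors that falls back to grade-school multiplication below a hypothetical KARATSUBA_CUTOFF:

#include <algorithm>
#include <cstddef>
#include <vector>

// Little-endian base-10 digit vector, e.g. 123 -> {3, 2, 1}.
using Digits = std::vector<int>;

const std::size_t KARATSUBA_CUTOFF = 32;  // hypothetical; tuned in practice

void trim(Digits &x) {
    while (x.size() > 1 && x.back() == 0) x.pop_back();
}

Digits add(const Digits &a, const Digits &b) {
    Digits r(std::max(a.size(), b.size()) + 1, 0);
    for (std::size_t i = 0; i + 1 < r.size(); ++i) {
        int s = r[i] + (i < a.size() ? a[i] : 0) + (i < b.size() ? b[i] : 0);
        r[i] = s % 10;
        r[i + 1] += s / 10;
    }
    trim(r);
    return r;
}

// Assumes a >= b, which holds for the recombination step below.
Digits sub(const Digits &a, const Digits &b) {
    Digits r = a;
    int borrow = 0;
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] -= borrow + (i < b.size() ? b[i] : 0);
        borrow = r[i] < 0;
        if (borrow) r[i] += 10;
    }
    trim(r);
    return r;
}

// Grade-school O(n^2) multiplication: the fallback below the cutoff.
Digits gradeSchool(const Digits &a, const Digits &b) {
    Digits r(a.size() + b.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        int carry = 0;
        for (std::size_t j = 0; j < b.size(); ++j) {
            int cur = r[i + j] + a[i] * b[j] + carry;
            r[i + j] = cur % 10;
            carry = cur / 10;
        }
        r[i + b.size()] += carry;
    }
    trim(r);
    return r;
}

Digits karatsuba(const Digits &a, const Digits &b) {
    std::size_t n = std::max(a.size(), b.size());
    if (n <= KARATSUBA_CUTOFF)  // below the threshold the O(n^2) method wins
        return gradeSchool(a, b);

    std::size_t m = n / 2;      // split each number as hi * 10^m + lo
    auto lo = [m](const Digits &x) {
        return Digits(x.begin(), x.begin() + std::min(m, x.size()));
    };
    auto hi = [m](const Digits &x) {
        return m < x.size() ? Digits(x.begin() + m, x.end()) : Digits{0};
    };

    Digits z0 = karatsuba(lo(a), lo(b));  // lo * lo
    Digits z2 = karatsuba(hi(a), hi(b));  // hi * hi
    Digits z1 = sub(karatsuba(add(lo(a), hi(a)), add(lo(b), hi(b))),
                    add(z0, z2));         // the two cross terms

    z1.insert(z1.begin(), m, 0);          // z1 * 10^m
    z2.insert(z2.begin(), 2 * m, 0);      // z2 * 10^(2m)
    return add(add(z0, z1), z2);
}

Production bignum libraries follow the same shape, just with machine-word limbs instead of decimal digits and with the further rungs of the ladder above (Toom-3, FFT, SSA) stacked on top of Karatsuba.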

answered Oct 18 '22 by Mysticial