How can the C++ Eigen library perform better than specialized vendor libraries?

Tags: c++, performance, eigen

I was looking over the performance benchmarks: http://eigen.tuxfamily.org/index.php?title=Benchmark

I could not help but notice that Eigen appears to consistently outperform all the specialized vendor libraries. The question is: how is that possible? One would assume that MKL/GotoBLAS would use processor-specific tuned code, while Eigen is rather generic.

Look at this one: http://download.tuxfamily.org/eigen/btl-results-110323/aat.pdf, which is essentially a dgemm. For N=1000, Eigen gets roughly 17 GFLOPS, MKL only 12 GFLOPS.
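For reference, here is a minimal sketch of the kind of kernel that benchmark times. It assumes (from the file name) that "aat" means C = A * A^T, and it converts run time to GFLOPS under the usual 2N^3 flop count for a dense N x N product. The names `Matrix`, `aat_naive` and `gflops` are hypothetical, and this naive triple loop is only a correctness reference, nowhere near MKL or Eigen speed. Under that flop count, N = 1000 means 2 x 10^9 operations, so 17 GFLOPS corresponds to about 0.12 s and 12 GFLOPS to about 0.17 s.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a row-major n x n matrix stored in a flat vector.
using Matrix = std::vector<double>;

// Naive reference for C = A * A^T, i.e. C(i, j) = sum_k A(i, k) * A(j, k).
Matrix aat_naive(const Matrix& a, std::size_t n) {
    assert(a.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * a[j * n + k];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// GFLOPS achieved, assuming 2 * n^3 flops (one multiply and one add per inner step).
double gflops(std::size_t n, double seconds) {
    const double nd = static_cast<double>(n);
    const double flops = 2.0 * nd * nd * nd;
    return flops / seconds / 1e9;
}
```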


asked Apr 28 '12 17:04

Anycorn

2 Answers

Eigen has lazy evaluation. From the Eigen FAQ entry "How does Eigen compare to BLAS/LAPACK?":

For operations involving complex expressions, Eigen is inherently faster than any BLAS implementation because it can handle and optimize a whole operation globally -- while BLAS forces the programmer to split complex operations into small steps that match the BLAS fixed-function API, which incurs inefficiency due to introduction of temporaries. See for instance the benchmark result of a Y = aX + bY operation which involves two calls to BLAS level1 routines while Eigen automatically generates a single vectorized loop.

The second chart in the benchmarks is Y = a*X + b*Y, which Eigen was specially designed to handle. It should be no surprise that a library wins at a benchmark it was created for. You'll notice that the more generic benchmarks, like matrix-matrix multiplication, don't show any advantage for Eigen.
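To make the lazy-evaluation point concrete, here is a minimal expression-template sketch. It is not Eigen's implementation; the types `Expr`, `Vec`, `Scaled`, `Sum` and the function `blas_style` are hypothetical. Writing `3.0 * x + 0.5 * y` only builds a small tree of lightweight nodes, and the assignment walks that tree once, so `y = a*x + b*y` becomes a single loop with no temporary vector. `blas_style` shows the two-pass alternative that a `dscal` followed by a `daxpy` amounts to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: lets operators accept any expression node.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node for alpha * e; computes nothing until indexed.
template <typename E>
struct Scaled : Expr<Scaled<E>> {
    double alpha;
    const E& e;
    Scaled(double a, const E& x) : alpha(a), e(x) {}
    std::size_t size() const { return e.size(); }
    double operator[](std::size_t i) const { return alpha * e[i]; }
};

// Node for l + r; computes nothing until indexed.
template <typename L, typename R>
struct Sum : Expr<Sum<L, R>> {
    const L& l;
    const R& r;
    Sum(const L& a, const R& b) : l(a), r(b) {}
    std::size_t size() const { return l.size(); }
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

template <typename E>
Scaled<E> operator*(double a, const Expr<E>& x) { return Scaled<E>(a, x.self()); }

template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& a, const Expr<R>& b) { return Sum<L, R>(a.self(), b.self()); }

struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    std::size_t size() const { return data.size(); }
    double operator[](std::size_t i) const { return data[i]; }

    // Evaluates the whole expression tree in ONE loop, element by element.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& ex = e.self();
        assert(ex.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = ex[i];
        return *this;
    }
};

// BLAS-style equivalent: two separate passes over memory (like dscal, then daxpy).
void blas_style(double a, const Vec& x, double b, Vec& y) {
    for (std::size_t i = 0; i < y.size(); ++i) y.data[i] *= b;             // pass 1: y = b*y
    for (std::size_t i = 0; i < y.size(); ++i) y.data[i] += a * x.data[i]; // pass 2: y += a*x
}
```

Reading `y[i]` and writing `y[i]` in the same loop is safe here because each output element depends only on the same index. A matrix product such as A = B * C does not have that property, which is one reason such libraries introduce a temporary for it.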

answered Sep 20 '22 23:09

chrisaycock

Benchmarks are designed to be misinterpreted.

Let's look at the matrix * matrix product. The benchmark available on this page from the Eigen website tells you that Eigen (with its own BLAS) gives timings similar to the MKL for large matrices (n = 1000). I've compared Eigen 3.2.6 with MKL 11.3 on my computer (a laptop with a Core i7), and the MKL is 3 times faster than Eigen for such matrices using one thread, and 10 times faster using 4 threads. That is a completely different conclusion. There are two reasons for this. First, Eigen 3.2.6 (its internal BLAS) does not use AVX. Second, it does not seem to make good use of multithreading. The benchmark hides both issues because it was run on a CPU without AVX support and without multithreading.

Usually, those C++ libraries (Eigen, Armadillo, Blaze) bring two things:

Nice operator overloading: You can use + and * with vectors and matrices. To get good performance, they have to use tricky techniques known as "Smart Expression Templates" to avoid temporaries where doing so reduces the run time (such as y = alpha x1 + beta x2 with y, x1, x2 vectors) and to introduce them where they are useful (such as A = B * C with A, B, C matrices). They can also reorder operations to reduce the amount of computation: for instance, if A, B, C are matrices, A * B * C can be computed as (A * B) * C or A * (B * C) depending on their sizes.
Internal BLAS: To compute the product of 2 matrices, they can rely either on their internal BLAS or on an externally provided one (MKL, OpenBLAS, ATLAS). On Intel chips with large matrices, the MKL is almost impossible to beat. For small matrices, one can beat the MKL, as it was not designed for that kind of problem.
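The cost argument behind reordering A * B * C can be checked with a few lines of arithmetic. This is a sketch that counts one multiply-add per inner-product step; the function names are hypothetical, and whether a particular library performs this reordering automatically is not something the sketch claims. With A of size 10x1000, B 1000x10 and C 10x1000, (A * B) * C needs 200,000 multiply-adds, while A * (B * C) needs 20,000,000 because it first forms a 1000x1000 intermediate.

```cpp
#include <cstddef>

// Multiply-add count for an (m x k) matrix times a (k x n) matrix.
constexpr std::size_t product_cost(std::size_t m, std::size_t k, std::size_t n) {
    return m * k * n;
}

// A is (p x q), B is (q x r), C is (r x s).
// (A*B)*C: form AB (p x r), then multiply by C.
constexpr std::size_t cost_left(std::size_t p, std::size_t q, std::size_t r, std::size_t s) {
    return product_cost(p, q, r) + product_cost(p, r, s);
}

// A*(B*C): form BC (q x s), then multiply A by it.
constexpr std::size_t cost_right(std::size_t p, std::size_t q, std::size_t r, std::size_t s) {
    return product_cost(q, r, s) + product_cost(p, q, s);
}

// true means (A*B)*C is no more expensive than A*(B*C).
constexpr bool prefer_left(std::size_t p, std::size_t q, std::size_t r, std::size_t s) {
    return cost_left(p, q, r, s) <= cost_right(p, q, r, s);
}

// A: 10x1000, B: 1000x10, C: 10x1000.
static_assert(cost_left(10, 1000, 10, 1000) == 200000, "small 10x10 intermediate");
static_assert(cost_right(10, 1000, 10, 1000) == 20000000, "large 1000x1000 intermediate");
```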

When those libraries provide benchmarks against the MKL, they usually use old hardware and do not turn on multithreading, so that they can be on par with the MKL. They might also compare BLAS level 1 operations such as y = alpha x1 + beta x2 against 2 calls to a BLAS level 1 function, which is a stupid thing to do anyway.

In a nutshell, those libraries are extremely convenient for their overloading of + and *, which is extremely difficult to do without losing performance. They usually do a good job on this. But when they give you benchmarks saying that they can be on par with or beat the MKL using their own BLAS, be careful and run your own benchmarks. You'll usually get different results ;-).
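In the spirit of "do your own benchmark", here is a minimal best-of-N timing helper using only the standard library (`best_time_seconds` is a hypothetical name). Taking the minimum over several runs filters out one-off noise such as first-touch page faults or interference from other processes. It is a sketch, not a full benchmarking framework.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Runs f() `repeats` times and returns the fastest wall-clock time in seconds.
template <typename F>
double best_time_seconds(F&& f, int repeats = 5) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        const auto start = clock::now();
        f();
        const auto stop = clock::now();
        const double s = std::chrono::duration<double>(stop - start).count();
        best = std::min(best, s);
    }
    return best;
}
```

To use it, wrap each candidate (for example an Eigen product and an MKL `cblas_dgemm` call on the same matrices) in a lambda, and use the result afterwards (e.g. print a checksum) so the compiler cannot discard the work. Build with the instruction-set and threading options you would use in production; with GCC or Clang that typically means `-O3 -march=native`, plus `-fopenmp` if you want Eigen's OpenMP-parallel products.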

answered Sep 19 '22 23:09

InsideLoop
