Why don't C++ compilers do better constant folding?

Tags:

I'm investigating ways to speed up a large section of C++ code, which has automatic derivatives for computing jacobians. This involves doing some amount of work in the actual residuals, but the majority of the work (based on profiled execution time) is in calculating the jacobians.

This surprised me, since most of the jacobians are propagated forward from 0s and 1s, so the amount of work should be 2-4x the function, not 10-12x. In order to model what a large amount of the jacobian work is like, I made a super minimal example with just a dot product (instead of sin, cos, sqrt and more that would be in a real situation) that the compiler should be able to optimize to a single return value:

#include <Eigen/Core> #include <Eigen/Geometry>  using Array12d = Eigen::Matrix<double,12,1>;  double testReturnFirstDot(const Array12d& b) {     Array12d a;     a.array() = 0.;     a(0) = 1.;     return a.dot(b); }

Which should be the same as

double testReturnFirst(const Array12d& b) {     return b(0); }

I was disappointed to find that, without fast-math enabled, neither GCC 8.2, Clang 6 or MSVC 19 were able to make any optimizations at all over the naive dot-product with a matrix full of 0s. Even with fast-math (https://godbolt.org/z/GvPXFy) the optimizations are very poor in GCC and Clang (still involve multiplications and additions), and MSVC doesn't do any optimizations at all.

I don't have a background in compilers, but is there a reason for this? I'm fairly sure that in a large proportion of scientific computations being able to do better constant propagation/folding would make more optimizations apparent, even if the constant-fold itself didn't result in a speedup.

While I'm interested in explanations for why this isn't done on the compiler side, I'm also interested for what I can do on a practical side to make my own code faster when facing these kinds of patterns.

992

asked Aug 31 '18 10:08

jkflying

1 Answers

This is because Eigen explicitly vectorize your code as 3 vmulpd, 2 vaddpd and 1 horizontal reduction within the remaining 4 component registers (this assumes AVX, with SSE only you'll get 6 mulpd and 5 addpd). With -ffast-math GCC and clang are allowed to remove the last 2 vmulpd and vaddpd (and this is what they do) but they cannot really replace the remaining vmulpd and horizontal reduction that have been explicitly generated by Eigen.

So what if you disable Eigen's explicit vectorization by defining EIGEN_DONT_VECTORIZE? Then you get what you expected (https://godbolt.org/z/UQsoeH) but other pieces of code might become much slower.

If you want to locally disable explicit vectorization and are not afraid of messing with Eigen's internal, you can introduce a DontVectorize option to Matrix and disable vectorization by specializing traits<> for this Matrix type:

static const int DontVectorize = 0x80000000;  namespace Eigen { namespace internal {  template<typename _Scalar, int _Rows, int _Cols, int _MaxRows, int _MaxCols> struct traits<Matrix<_Scalar, _Rows, _Cols, DontVectorize, _MaxRows, _MaxCols> > : traits<Matrix<_Scalar, _Rows, _Cols> > {   typedef traits<Matrix<_Scalar, _Rows, _Cols> > Base;   enum {     EvaluatorFlags = Base::EvaluatorFlags & ~PacketAccessBit   }; };  } }  using ArrayS12d = Eigen::Matrix<double,12,1,DontVectorize>;

Full example there: https://godbolt.org/z/bOEyzv

answered Oct 19 '22 03:10

ggael

Related questions
                            
                                How to compile and execute from memory directly?
                            
                                How to use the Qt's PIMPL idiom?
                            
                                How to call a C# library from Native C++ (using C++\CLI and IJW)
                            
                                How to raise warning if return value is disregarded?
                            
                                Using C++ Boost's Graph Library
                            
                                specify default value of std::function
                            
                                Platform-independent GUID generation in C++?
                            
                                How is std::string implemented?
                            
                                Copy an cv::Mat inside a ROI of another one
                            
                                Why is this an undefined behavior?
                            
                                Difference between func() and (*this).func() in C++
                            
                                Forgot do in do... while loop
                            
                                c++11 foreach syntax and custom iterator
                            
                                error: ‘unique_ptr’ is not a member of ‘std’
                            
                                std::shared_ptr initialization: make_shared<Foo>() vs shared_ptr<T>(new Foo) [duplicate]
                            
                                With explicitly deleted member functions in C++11, is it still worthwhile to inherit from a noncopyable base class?
                            
                                Visual Studio 2010 not autolinking static libraries from projects that are dependencies as it should be supposed to
                            
                                How is the three-way comparison operator different from subtraction?
                            
                                Error with multiple definitions of function
                            
                                C++ priority_queue with lambda comparator error

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why don't C++ compilers do better constant folding?

Tags:

c++

compiler-construction

automatic-differentiation

eigen

ceres-solver

jkflying

People also ask

1 Answers

ggael

Recent Activity

Donate For Us