What do compilers do with compile-time branching?

Tags:

EDIT: I took the "if/else" case as an example that can sometimes be resolved at compile time (eg when static values are involved, cf <type_traits>). Adapting the answers below to other types of static branching (eg, multiple branches or multi-criteria branches) should be straightforward. Note that compile-time branching using template-meta programming is not the topic here.

In a typical code like this

#include <type_traits>  template <class T> T numeric_procedure( const T& x ) {     if ( std::is_integral<T>::value )     {         // Integral types     }     else     {         // Floating point numeric types     } }

will the compiler optimize the if/else statement out when I define specific template types later on in my code?

A simple alternative would be to write something like this:

#include <type_traits>  template <class T> inline T numeric_procedure( const T& x ) {     return numeric_procedure_impl( x, std::is_integral<T>() ); }  // ------------------------------------------------------------------------  template <class T> T numeric_procedure_impl( const T& x, std::true_type const ) {     // Integral types }  template <class T> T numeric_procedure_impl( const T& x, std::false_type const ) {     // Floating point numeric types }

Is there a difference in terms of performance between these solutions? Is there any non-subjective grounds for saying that one is better than the other? Are there other (possibly better) solutions to deal with compile-time branching?

243

asked May 31 '14 13:05

Jonathan H

2 Answers

TL;DR

There are several ways to get different run-time behavior dependent on a template parameter. Performance should not be your primary concern here, but flexibility and maintainability should. In all cases, the various thin wrappers and constant conditional expressions will all be optimized away on any decent compiler for release builds. Below a small summary with the various tradeoffs (inspired by this answer by @AndyProwl).

Run-time if

Your first solution is the simple run-time if:

template<class T> T numeric_procedure(const T& x) {     if (std::is_integral<T>::value) {         // valid code for integral types     } else {         // valid code for non-integral types,         // must ALSO compile for integral types     } }

It is simple and effective: any decent compiler will optimize away the dead branch.

There are several disadvantages:

on some platforms (MSVC), a constant conditional expression yields a spurious compiler warning which you then need to ignore or silence.
But worse, on all conforming platforms, both branches of the if/else statement need to actually compile for all types T, even if one of the branches is known not to be taken. If T contains different member types depending on its nature, then you will get a compiler error as soon as you try to access them.

Tag dispatching

Your second approach is known as tag-dispatching:

template<class T> T numeric_procedure_impl(const T& x, std::false_type) {     // valid code for non-integral types,     // CAN contain code that is invalid for integral types }      template<class T> T numeric_procedure_impl(const T& x, std::true_type) {     // valid code for integral types }  template<class T> T numeric_procedure(const T& x) {     return numeric_procedure_impl(x, std::is_integral<T>()); }

It works fine, without run-time overhead: the temporary std::is_integral<T>() and the call to the one-line helper function will both be optimized way on any decent platform.

The main (minor IMO) disadvantage is that you have some boilerplate with 3 instead of 1 function.

SFINAE

Closely related to tag-dispatching is SFINAE (Substitution failure is not an error)

template<class T, class = typename std::enable_if<!std::is_integral<T>::value>::type> T numeric_procedure(const T& x) {     // valid code for non-integral types,     // CAN contain code that is invalid for integral types }      template<class T, class = typename std::enable_if<std::is_integral<T>::value>::type> T numeric_procedure(const T& x) {     // valid code for integral types }

This has the same effect as tag-dispatching but works slightly differently. Instead of using argument-deduction to select the proper helper overload, it directly manipulates the overload set for your main function.

The disadvantage is that it can be a fragile and tricky way if you don't know exactly what the entire overload set is (e.g. with template heavy code, ADL could pull in more overloads from associated namespaces you didn't think of). And compared to tag-dispatching, selection based on anything other than a binary decision is a lot more involved.

Partial specialization

Another approach is to use a class template helper with a function application operator and partially specialize it

template<class T, bool>  struct numeric_functor;  template<class T> struct numeric_functor<T, false> {     T operator()(T const& x) const     {         // valid code for non-integral types,         // CAN contain code that is invalid for integral types     } };  template<class T> struct numeric_functor<T, true> {     T operator()(T const& x) const     {         // valid code for integral types     } };  template<class T> T numeric_procedure(T const& x) {     return numeric_functor<T, std::is_integral<T>::value>()(x); }

This is probably the most flexible approach if you want to have fine-grained control and minimal code duplication (e.g. if you also want to specialize on size and/or alignment, but say only for floating point types). The pattern matching given by partial template specialization is ideally suited for such advanced problems. As with tag-dispatching, the helper functors are optimized away by any decent compiler.

The main disadvantage is the slightly larger boiler-plate if you only want to specialize on a single binary condition.

If constexpr (C++1z proposal)

This is a reboot of failed earlier proposals for static if (which is used in the D programming language)

template<class T> T numeric_procedure(const T& x) {     if constexpr (std::is_integral<T>::value) {         // valid code for integral types     } else {         // valid code for non-integral types,         // CAN contain code that is invalid for integral types     } }

As with your run-time if, everything is in one place, but the main advantage here is that the else branch will be dropped entirely by the compiler when it is known not to be taken. A great advantage is that you keep all code local, and do not have to use little helper functions as in tag dispatching or partial template specialization.

Concepts-Lite (C++1z proposal)

Concepts-Lite is an upcoming Technical Specification that is scheduled to be part of the next major C++ release (C++1z, with z==7 as the best guess).

template<Non_integral T> T numeric_procedure(const T& x) {     // valid code for non-integral types,     // CAN contain code that is invalid for integral types }      template<Integral T> T numeric_procedure(const T& x) {     // valid code for integral types }

This approach replaces the class or typename keyword inside the template< > brackets with a concept name describing the family of types that the code is supposed to work for. It can be seen as a generalization of the tag-dispatching and SFINAE techniques. Some compilers (gcc, Clang) have experimental support for this feature. The Lite adjective is referring to the failed Concepts C++11 proposal.

answered Oct 02 '22 07:10

TemplateRex

Note that although the optimizer may well be able to prune statically-known tests and unreachable branches from the generated code, the compiler still needs to be able to compile each branch.

That is:

int foo() {   #if 0     return std::cout << "this isn't going to work\n";   #else     return 1;   #endif }

will work fine, because the preprocessor strips out the dead branch before the compiler sees it, but:

int foo() {   if (std::is_integral<double>::value) {     return std::cout << "this isn't going to work\n";   } else {     return 1;   } }

won't. Even though the optimizer can discard the first branch, it will still fail to compile. This is where using enable_if and SFINAE help, because you can select the valid (compilable) code, and the invalid (un-compilable) code's Failure to compile Is Not An Error.

answered Oct 02 '22 07:10

Useless

Related questions
                            
                                Shared pointer to an immutable type has value semantics
                            
                                Why don't the std::fstream classes take a std::string?
                            
                                library is linked but reference is undefined
                            
                                SEH exception with code 0xc0000005 thrown in the test body
                            
                                Rvalue Reference is Treated as an Lvalue?
                            
                                accessing a protected member of a base class in another subclass
                            
                                C++ Exceptions - Is throwing c-string as an exception bad?
                            
                                Serial Port (RS -232) Connection in C++
                            
                                Differences of using "const cv::Mat &", "cv::Mat &", "cv::Mat" or "const cv::Mat" as function parameters?
                            
                                Faster than binary search for ordered list
                            
                                How *(&arr + 1) - arr is working to give the array size [duplicate]
                            
                                How to correctly and standardly compare floats?
                            
                                What is the cost of a function call?
                            
                                When a compiler can infer a template parameter?
                            
                                c++ does implicit copy constructor copy array member variable? [duplicate]
                            
                                Is there a #define for 64 bit in Visual Studio 2010?
                            
                                Why are the terms "automatic" and "dynamic" preferred over the terms "stack" and "heap" in C++ memory management?
                            
                                Cannot get OpenCV to compile because of undefined references?
                            
                                Use std::lock_guard with try_lock
                            
                                Is std::lock() ill-defined, unimplementable, or useless?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With