Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Intel C++ Compiler is extremely slow to compile recursive decltype returns

I'm writing a template for expressions parametrised by an arbitrary number of char labels.

Given an argument list, a factory function returns an expression of different types depending on whether there are two arguments of the same types or whether they are unique.

A concrete example: suppose that A is a "labelable" object with its operator() overloaded to produce an ?Expression<...>. Let a, b, ... be declared as labels LabelName<'a'>, LabelName<'b'>, .... Then A(a,b,c,d) would produce a UniqueExpression<'a','b','c','d'>, whereas A(a,c,b,c) would produce a RepeatedExpression<'a','c','b','c'> instead.

To achieve this, I had to define the ?Expression's factory function with auto and decltype. Moreover, the decltype has to cascade to another decltype until the metaprogram finishes recursing through the arguments and the return type is finally decided. As an illustration, I have isolated a fairly minimal code for the factory method.

template <typename... T> struct TypeList { };
template <char C> struct LabelName { };

template <typename... T> class UniqueExpression
{
    // Contains implementation details in actual code
};

template <typename... T> class RepeatedExpression
{
    // Contains implementation details in actual code
};

class ExpressionFactory {
private:
    template <char _C, typename... T, typename... _T>
    static UniqueExpression<T...>
    _do_build(TypeList<T...>,
              TypeList<LabelName<_C>>,
              TypeList<>,
              TypeList<_T...>)
    {
        return UniqueExpression<T...> ();
    }

    template <char _C, typename... T, typename... _T1, typename... _T2, typename... _T3>
    static RepeatedExpression<T...>
    _do_build(TypeList<T...>,
              TypeList<LabelName<_C>, _T1...>, 
              TypeList<LabelName<_C>, _T2...>,
              TypeList<_T3...>)

    {
        return RepeatedExpression<T...> ();
    }

    template <char _C1, char _C2, typename... T, typename... _T1, typename... _T2, typename... _T3>
    static auto
    _do_build(TypeList<T...>,
              TypeList<LabelName<_C1>, _T1...>, 
              TypeList<LabelName<_C2>, _T2...>,
              TypeList<_T3...>)
    -> decltype(_do_build(TypeList<T...>(),
                          TypeList<LabelName<_C1>, _T1...>(),
                          TypeList<_T2...>(),
                          TypeList<_T3..., LabelName<_C2>>()))
    {
        return _do_build(TypeList<T...>(),
                         TypeList<LabelName<_C1>, _T1...>(),
                         TypeList<_T2...>(),
                         TypeList<_T3..., LabelName<_C2>>());
    }

    template <char _C1, char _C2, typename... T, typename... _T1, typename... _T2>
    static auto
    _do_build(TypeList<T...>,
              TypeList<LabelName<_C1>, LabelName<_C2>, _T1...>, 
              TypeList<>,
              TypeList<LabelName<_C2>, _T2...>)
    -> decltype(_do_build(TypeList<T...>(),
                          TypeList<LabelName<_C2>, _T1...>(),
                          TypeList<_T2...>(),
                          TypeList<>()))
    {
        return _do_build(TypeList<T...>(),
                         TypeList<LabelName<_C2>, _T1...>(),
                         TypeList<_T2...>(),
                         TypeList<>());
    }

public:
    template <char C, typename... T>
    static auto
    build_expression(LabelName<C>, T...)
    -> decltype(_do_build(TypeList<LabelName<C>, T...>(),
                          TypeList<LabelName<C>, T...>(),
                          TypeList<T...>(),
                          TypeList<>()))
    {
        return _do_build(TypeList<LabelName<C>, T...>(),
                         TypeList<LabelName<C>, T...>(),
                         TypeList<T...>(),
                         TypeList<>());
    }
};

The factory could be called in the program like so: (in the actual program there is another class with an overloaded operator() which calls the factory)

int main()
{
    LabelName<'a'> a;
    LabelName<'b'> b;
    ...
    LabelName<'j'> j;

    auto expr = ExpressionFactory::build_expression(a,b,c,d,e,f,g,h,i,j);

    // Perhaps do some cool stuff with expr

    return 0;
}

The above code works as intended, and is correctly compiled by both GCC and the Intel compiler. Now, I understand that the compiler would take more time to perform recursive template deduction as I crank up the number of labels I use.

On my computer, if build_expression is called with one argument, then GCC 4.7.1 takes around 0.26 second to compile on average. The compile time scales up to around 0.29 second for five arguments, and to 0.62 second for ten arguments. This is all perfectly reasonable.

The story is quite different with the Intel compiler. ICPC 13.0.1 compiles the one-argument code in 0.35 second, and the compile time stays pretty much constant for up to four arguments. At five arguments the compile time goes up to 12 seconds, and at six arguments it shoots up above 9600 seconds (that is, over 2 hours and 40 minutes). Needless to say, I haven't waited long enough to find out how long it takes to compile the seven-argument version.


Two questions immediately come to mind:

  • Is the Intel compiler particularly known to be slow to compile recursive decltype?

  • Is there any way to rewrite this code to achieve the same effect in a way that is perhaps friendlier to the compiler?

like image 553
Saran Tunyasuvunakool Avatar asked Nov 11 '12 10:11

Saran Tunyasuvunakool


People also ask

What is the fastest compiler for programming?

Notice that the fastest compiler is assembler. (Assembler translates processor instructions written as text into numbers.) Hence adapting a programming language to that criterion makes little sense. Rather they need to reduce compile time by other means like switching off optimizations (mentioned above).

Why do compiler optimisation passes limit the number of complexity passes?

Production compilers try to limit their optimisation passes to better-than-quadratic complexity these days to avoid blowing up compile time but there's still a lot of work to do. The more performance-critical the use case for your language is, the more you'd want the compiler to optimise.

Is go faster to compile than other languages?

I found that go is incredibly fast to compile as a compiler language compared to other languages like c++ or rust. Almost as fast as running an interpreted language, I think. Haskell is slower than go to compile even though it is like go, compiled with a runtime and garbage collector (am I correct?).

Why is Haskell so much slower than go?

Haskell is slower than go to compile even though it is like go, compiled with a runtime and garbage collector (am I correct?). I suspect that the complexity of type system is the main cause, since Haskell has more complex type system than go.


1 Answers

Here is a stab at it. Instead of doing pairwise comparisons of each of the elements, I sort the list of types, then use a brain-dead unique algorithm to see if there are any duplicates.

I implemented merge sort on types, because it was fun. Probably a naive bubble sort would work better on reasonable number of arguments. Note that a bit of work would allow us to do a merge sort on long lists, and specialize for bubble sorts (or even insertion sorts) on short lists. I'm not up for writing a template quicksort.

This gives me a compile time boolean that says if there are duplicates in the list. I can then use enable_if to pick which overload I'm going to use.

Note that your solution involved n^2 layers of template recursion, at each stage of which the return type requires evaluating the type of a 1 step simpler class, and then the type returned also requires the same! If the Intel compiler memoization fails, you are talking exponential amounts of work.

I augmented a few of your classes with some helpers. I made your LabelNames inherit from std::integral_constant, so I have easy compile time access to their value. This makes the sorting code a tad easier. I also exposed an enum from the repeated and unique return values so I can do simple printf debugging on the result.

Most of this work is writing the merge sort -- is there a standard compile time type sort we could use?

#include <type_traits>
#include <iostream>

template <typename... T> struct TypeList { };

// NOTE THIS CHANGE:
template <char C> struct LabelName:std::integral_constant<char, C> {};

template <typename... T> class UniqueExpression
{
    // Contains implementation details in actual code
public:
  enum { is_unique = true };
};

template <typename... T> class RepeatedExpression
{
    // Contains implementation details in actual code
public:
  enum { is_unique = false };
};

// A compile time merge sort for types
// Split takes a TypeList<>, and sticks the even
// index types into Left and odd into Right
template<typename T>
struct Split;
template<>
struct Split<TypeList<>>
{
  typedef TypeList<> Left;
  typedef TypeList<> Right;
};
template<typename T>
struct Split<TypeList<T>>
{
  typedef TypeList<T> Left;
  typedef TypeList<> Right;
};

// Prepends First into the TypeList List.
template<typename First, typename List>
struct Prepend;
template<typename First, typename... ListContents>
struct Prepend<First,TypeList<ListContents...>>
{
  typedef TypeList<First, ListContents...> type;
};

template<typename First, typename Second, typename... Tail>
struct Split<TypeList<First, Second, Tail...>>
{
  typedef typename Prepend< First, typename Split<TypeList<Tail...>>::Left>::type Left;
  typedef typename Prepend< Second, typename Split<TypeList<Tail...>>::Right>::type Right;
};

// Merges the sorted TypeList<>s Left and Right to the end of TypeList<> MergeList
template< typename Left, typename Right, typename MergedList=TypeList<> >
struct Merge;
template<typename MergedList>
struct Merge< TypeList<>, TypeList<>, MergedList >
{
  typedef MergedList type;
};
template<typename L1, typename... Left, typename... Merged>
struct Merge< TypeList<L1, Left...>, TypeList<>, TypeList<Merged... >>
{
  typedef TypeList<Merged..., L1, Left...> type;
};
template<typename R1, typename... Right, typename... Merged>
struct Merge< TypeList<>, TypeList<R1, Right...>, TypeList<Merged...> >
{
  typedef TypeList<Merged..., R1, Right...> type;
};
template<typename L1, typename... Left, typename R1, typename... Right, typename... Merged>
struct Merge< TypeList<L1, Left...>, TypeList<R1, Right...>, TypeList<Merged...>>
{
  template<bool LeftIsSmaller, typename LeftList, typename RightList, typename MergedList>
  struct MergeHelper;

  template<typename FirstLeft, typename... LeftTail, typename FirstRight, typename... RightTail, typename... MergedElements>
  struct MergeHelper< true, TypeList<FirstLeft, LeftTail...>, TypeList<FirstRight, RightTail...>, TypeList<MergedElements...> >
  {
    typedef typename Merge< TypeList<LeftTail...>, TypeList< FirstRight, RightTail... >, TypeList< MergedElements..., FirstLeft > >::type type;
  };
  template<typename FirstLeft, typename... LeftTail, typename FirstRight, typename... RightTail, typename... MergedElements>
  struct MergeHelper< false, TypeList<FirstLeft, LeftTail...>, TypeList<FirstRight, RightTail...>, TypeList<MergedElements...> >
  {
    typedef typename Merge< TypeList<FirstLeft, LeftTail...>, TypeList<RightTail... >, TypeList< MergedElements..., FirstRight > >::type type;
  };

  typedef typename MergeHelper< (L1::value < R1::value), TypeList<L1, Left...>, TypeList<R1, Right...>, TypeList<Merged...> >::type type;
};

// Takes a TypeList<T...> and sorts it via a merge sort:
template<typename List>
struct MergeSort;
template<>
struct MergeSort<TypeList<>>
{
  typedef TypeList<> type;
};
template<typename T>
struct MergeSort<TypeList<T>>
{
  typedef TypeList<T> type;
};
template<typename First, typename Second, typename... T>
struct MergeSort<TypeList<First, Second, T...>>
{
  typedef Split<TypeList<First, Second, T...>> InitialSplit;
  typedef typename MergeSort< typename InitialSplit::Left >::type Left;
  typedef typename MergeSort< typename InitialSplit::Right >::type Right;
  typedef typename Merge< Left, Right >::type type;
};

// Sorts a TypeList<T..>:
template<typename List>
struct Sort: MergeSort<List> {};

// Checks sorted TypeList<T...> SortedList for adjacent duplicate types
// return value is in value
template<typename SortedList>
struct Unique;

template<> struct Unique< TypeList<> >:std::true_type {};
template<typename T> struct Unique< TypeList<T> >:std::true_type {};

template<typename First, typename Second, typename... Tail>
struct Unique< TypeList< First, Second, Tail... > >
{
  enum
  {
    value = (!std::is_same<First, Second>::value) &&
      Unique< TypeList<Second, Tail...> >::value
  };
};

// value is true iff there is a repeated type in Types...
template<typename... Types>
struct RepeatedType
{
  typedef TypeList<Types...> MyListOfTypes;

  typedef typename Sort< MyListOfTypes >::type MyListOfTypesSorted;
  enum
  {
    value = !Unique< MyListOfTypesSorted >::value
  };
};

// A struct that creates an rvalue trivial constructed type
// of any type requested.
struct ProduceRequestedType
{
  template<typename Result>
  operator Result() { return Result(); };
};

struct ExpressionFactory {
  template<typename... T>
  typename std::enable_if<
    !RepeatedType<T...>::value,
    UniqueExpression<T...>
  >::type
  build_expression(T...) const
  {
    return ProduceRequestedType();
  };
  template<typename... T>
  typename std::enable_if<
    RepeatedType<T...>::value,
    RepeatedExpression<T...>
  >::type
  build_expression(T...) const
  {
    return ProduceRequestedType();
  };
};

// Simple testing code for above:
int main()
{
  auto foo1 = ExpressionFactory().build_expression( LabelName<'a'>(), LabelName<'b'>(), LabelName<'a'>() );
  typedef decltype(foo1) foo1Type;
  if (foo1Type::is_unique)
    std::cout << "foo1 is unique\n";
  else
    std::cout << "foo1 is repeated\n";

  auto foo2 = ExpressionFactory().build_expression( LabelName<'q'>(), LabelName<'a'>(), LabelName<'b'>(), LabelName<'d'>(), LabelName<'t'>(), LabelName<'z'>() );
  typedef decltype(foo2) foo2Type;
  if (foo2Type::is_unique)
    std::cout << "foo2 is unique\n";
  else
    std::cout << "foo2 is repeated\n";
}

In addition I'd like to add a critique of your code: Template programming is programming -- your typenames are the equivalent of using "i1" through "i9" for integer variables in a function. Give your typenames meaningful names when doing something non-trivial.

How does this compile on Intel?

like image 50
Yakk - Adam Nevraumont Avatar answered Sep 28 '22 03:09

Yakk - Adam Nevraumont