Difference in performance: std::accumulate vs std::inner_product vs Loop

Tags:

Today, I want to share something that was blowing my mind when I tried to implement this simple operation:

enter image description here

I found different ways to perform the same operation:

By using the std::inner_product.
Implementing a predicate and using the std::accumulate function.
Using a loop in C style.

I wanted to perform some benchmark by using Quick Bench and enabling all the optimizations.

First of all, I compared the two C++ alternatives with floating values. This is the code used by using std::accumulate:

const auto predicate = [](const double previous, const double current) {
    return previous + current * current;
};
const auto result = std::accumulate(input.cbegin(), input.cend(), 0, predicate);

Versus this code by using the std::inner_product functionality:

const auto result = std::inner_product(input.cbegin(), input.cend(), input.cbegin(), 1);

After running the benchmark with all the optimization enabled, I got this result:

enter image description here

Both algorithms seem to reach the same performance. I did want to go further and try the C implementation:

double result = 0;
for (auto i = 0; i < input.size(); ++i) {
  result += input[i] * input[i];
}

And surprisingly, I found:

enter image description here

I was not expecting this result. I was sure there is something wrong so I did check the GCC implementation:

template<typename _InputIterator1, typename _InputIterator2, typename _Tp>
inline _Tp
inner_product(_InputIterator1 __first1, _InputIterator1 __last1,
      _InputIterator2 __first2, _Tp __init)
{
  // concept requirements
  __glibcxx_function_requires(_InputIteratorConcept<_InputIterator1>)
  __glibcxx_function_requires(_InputIteratorConcept<_InputIterator2>)
  __glibcxx_requires_valid_range(__first1, __last1);

  for (; __first1 != __last1; ++__first1, (void)++__first2)
__init = __init + (*__first1 * *__first2);
  return __init;
}

I found that It was doing the same as the C implementation. After reviewing the implementation, I discovered something weird, (or at least I was not expecting to have that significant impact): in all the internal accumulations, it was doing a cast from the iterator value_type to the type of the initial value.

In my case, I was initializing the initial values to 0 or 1, the values were considered integers and in each accumulation, the compiler was doing the casting. In the different test cases, my input array stores truncated floating points, so the result did not change.

After updating the initial value to a double type:

const auto result = std::accumulate(input.cbegin(), input.cend(), 0.0, predicate);

And:

const auto result = std::inner_product(input.cbegin(), input.cend(), input.cbegin(), 0.0);

I got the expected result:

enter image description here

Now, I understand that leaving the initial value to be an independent type from the underlying type of the iterator may make the function more flexible and allow to do more things. But,

If I am accumulating elements of an array, I am expecting to get the same type as a result. Same for the inner product.

Should it be the default behaviour?

Why did the standard decide to perform it in this way?

670

asked Sep 04 '18 13:09

mohabouje

1 Answers

If I am accumulating elements of an array, I am expecting to get the same type as a result.

Your expectation is wrong (though it is not quite clear what "same type as result" means), as you can clearly see from std::accumulate documentation:

template< class InputIt, class T >
T accumulate( InputIt first, InputIt last, T init );

template< class InputIt, class T, class BinaryOperation >
T accumulate( InputIt first, InputIt last, T init,
              BinaryOperation op );

return type is exactly the same type you use for initial value. The same effect you can have on the loop:

auto result = 0; // vs auto result = 0.0;
for (auto i = 0; i < input.size(); ++i) {
  result += input[i] * input[i];
}

Why did the standard decide to perform it in this way?

Because this way you can decide what type you use to aggregate. Note std::accumulate can be used for left fold and cases when T not equal to std::iterator_traits<InputIt>::value_type not less often (probably even more) than when they match.

132

answered Sep 22 '22 14:09

Slava

Related questions
                            
                                Replace a chain of image blurs with one blur
                            
                                Simultaneous parameter pack expansion error for unused template type definition
                            
                                How do I delay the instantiation of a static data member in Visual C++?
                            
                                How to tell clang-format to keep whitespace between binary operators as they are
                            
                                Why does this simple lambda consistently run faster inside an std::thread than inside the main function with gcc 4.9.2?
                            
                                Dispatching r-values and l-values differently and using sfinae to disable one option
                            
                                Extreme slow-down when starting at second permutation
                            
                                C++ inline functions: declare as such, define as such, or both? Why?
                            
                                Reasonable snprintf-like alternatives to strftime?
                            
                                Disabling an annoying debugger notification in Visual Studio 2017
                            
                                Regarding friend function definition and namespace scopes
                            
                                Annoying error message: cannot merge previous GCDA file
                            
                                How do I allocate memory-aligned C++ object arrays? [duplicate]
                            
                                Return result invisibly
                            
                                Visual Studio adding truly "global" default include path
                            
                                Returning const value from arithmetic operator overload with move assignment
                            
                                Is invocable and ambiguous call: bug in either g++ or clang
                            
                                Ambiguous constructor taking std::reference_wrapper when compiling with -pedantic
                            
                                How to tell if a translation unit is being compiled with segmented stacks
                            
                                Eigen + MKL slower than Matlab for matrix multiplication

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Difference in performance: std::accumulate vs std::inner_product vs Loop

Tags:

c++

language-lawyer

profiling

stl

c++14

mohabouje

People also ask

1 Answers

Slava

Recent Activity

Donate For Us