Why does it help to assign a const& scalar value to a const before a loop?

Question

In GCC 5.4.0's stl_algobase.h we have:

  template<typename _ForwardIterator, typename _Tp>
    inline typename
    __gnu_cxx::__enable_if<!__is_scalar<_Tp>::__value, void>::__type
    __fill_a(_ForwardIterator __first, _ForwardIterator __last,
         const _Tp& __value)
    {
      for (; __first != __last; ++__first)
    *__first = __value;
    }

  template<typename _ForwardIterator, typename _Tp>
    inline typename
    __gnu_cxx::__enable_if<__is_scalar<_Tp>::__value, void>::__type
    __fill_a(_ForwardIterator __first, _ForwardIterator __last,
         const _Tp& __value)
    {
      const _Tp __tmp = __value;
      for (; __first != __last; ++__first)
    *__first = __tmp;
    }

I don't understand why the variant for scalars has any advantage over the general variant. I mean, won't they be compiled into exactly the same thing? Loading the __value from the stack into a register and using that register throughout the loop?

Zeta · Accepted Answer

This originated in SVN rev 83645 (git commit 8ba26e53) in 2004, when both __fill_a variants where implemented as helper structs:

template<typename>
struct __fill
{
  template<typename _ForwardIterator, typename _Tp>
    static void
    fill(_ForwardIterator __first, _ForwardIterator __last,
     const _Tp& __value)
    {
  for (; __first != __last; ++__first)
    *__first = __value;
}
};

template<>
struct __fill<__true_type>
{
  template<typename _ForwardIterator, typename _Tp>
    static void
    fill(_ForwardIterator __first, _ForwardIterator __last,
     const _Tp& __value)
    {
  const _Tp __tmp = __value;
  for (; __first != __last; ++__first)
    *__first = __tmp;
}
};

Documentation on this topic is sparse, but the original commit by Dan Nicolaescu and Paolo Carlini contains a hint in the commit message:

include/bits/stl_algobase.h (__fill, __fill_n): New helpers for fill and fill_n, respectively: when copying is cheap, use a temporary to avoid a memory read in each iteration.

Given that those are/were maintainers of the standard library, I think they knew what there are doing: they tackle the problem that references are usually implemented as pointers. After all, they are just a new alias to an already existing memory location. That's why there were two variants originally. Note that the __true_type was decided in the fill call:

  typedef typename __type_traits<_Tp>::has_trivial_copy_constructor
    _Trivial;
  std::__fill<_Trivial>::fill(__first, __last, __value);

With std::enable_if, or rather its GCC variant, Carlini removed those helpers and replaced them by the version you've already provided. The logic still holds: for scalar types, you want to have some local value. If your range is in another memory region than your value and spans several pages and spills your L1 cache, you don't want to keep a bit of your cache locked down for that value. And it's trivial with a local variable.

However, the semantics are important. std::fill generates exactly std::distance(first, last) copies. With scalar values, we know that an additional copy won't have a side-effect. With user defined types? Well, we don't know. And that's why you cannot use the const auto tmp = __value; variant in the first case.

And that's why you end up with two, well, actually three variants:

one for scalar values were you can keep the value "close" and help the optimizer
one for byte-like values, where you can use memset
one for all the other types, where you cannot meddle with the semantics.

Why does it help to assign a const& scalar value to a const before a loop?

Tags:

c++

gcc

temporary

stl-algorithm

gcc5

einpoklum

1 Answers

Zeta

Recent Activity

Donate For Us

Why does it help to assign a const& scalar value to a const before a loop?

Tags:

c++

gcc

temporary

stl-algorithm

gcc5

einpoklum

1 Answers

Zeta

Related questions

Recent Activity

Donate For Us