In GCC 5.4.0's stl_algobase.h
we have:
template<typename _ForwardIterator, typename _Tp>
inline typename
__gnu_cxx::__enable_if<!__is_scalar<_Tp>::__value, void>::__type
__fill_a(_ForwardIterator __first, _ForwardIterator __last,
const _Tp& __value)
{
for (; __first != __last; ++__first)
*__first = __value;
}
template<typename _ForwardIterator, typename _Tp>
inline typename
__gnu_cxx::__enable_if<__is_scalar<_Tp>::__value, void>::__type
__fill_a(_ForwardIterator __first, _ForwardIterator __last,
const _Tp& __value)
{
const _Tp __tmp = __value;
for (; __first != __last; ++__first)
*__first = __tmp;
}
I don't understand why the variant for scalars has any advantage over the general variant. I mean, won't they be compiled into exactly the same thing? Loading the __value
from the stack into a register and using that register throughout the loop?
This originated in SVN rev 83645 (git commit 8ba26e53) in 2004, when both __fill_a
variants where implemented as helper structs:
template<typename>
struct __fill
{
template<typename _ForwardIterator, typename _Tp>
static void
fill(_ForwardIterator __first, _ForwardIterator __last,
const _Tp& __value)
{
for (; __first != __last; ++__first)
*__first = __value;
}
};
template<>
struct __fill<__true_type>
{
template<typename _ForwardIterator, typename _Tp>
static void
fill(_ForwardIterator __first, _ForwardIterator __last,
const _Tp& __value)
{
const _Tp __tmp = __value;
for (; __first != __last; ++__first)
*__first = __tmp;
}
};
Documentation on this topic is sparse, but the original commit by Dan Nicolaescu and Paolo Carlini contains a hint in the commit message:
- include/bits/stl_algobase.h (__fill, __fill_n): New helpers for fill and fill_n, respectively: when copying is cheap, use a temporary to avoid a memory read in each iteration.
Given that those are/were maintainers of the standard library, I think they knew what there are doing: they tackle the problem that references are usually implemented as pointers. After all, they are just a new alias to an already existing memory location. That's why there were two variants originally. Note that the __true_type
was decided in the fill
call:
typedef typename __type_traits<_Tp>::has_trivial_copy_constructor
_Trivial;
std::__fill<_Trivial>::fill(__first, __last, __value);
With std::enable_if
, or rather its GCC variant, Carlini removed those helpers and replaced them by the version you've already provided. The logic still holds: for scalar types, you want to have some local value. If your range is in another memory region than your value and spans several pages and spills your L1 cache, you don't want to keep a bit of your cache locked down for that value. And it's trivial with a local variable.
However, the semantics are important. std::fill
generates exactly std::distance(first, last)
copies. With scalar values, we know that an additional copy won't have a side-effect. With user defined types? Well, we don't know. And that's why you cannot use the const auto tmp = __value;
variant in the first case.
And that's why you end up with two, well, actually three variants:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With