Assume that you have to implement a static_vector<T, N>
class, which is a fixed capacity container that entirely lives on the stack and never allocates, and exposes an std::vector
-like interface. (Boost provides boost::static_vector
.)
Considering that we must have uninitialized storage for maximum N
instances of T
, there are multiple choices that can be made when designing the internal data layout:
Single-member union
:
union U { T _x; };
std::array<U, N> _data;
Single std::aligned_storage_t
:
std::aligned_storage_t<sizeof(T) * N, alignof(T)> _data;
Array of std::aligned_storage_t
:
using storage = std::aligned_storage_t<sizeof(T), alignof(T)>;
std::array<storage, N> _data;
Regardless of the choice, creating the members will require the use of "placement new
" and accessing them will require something along the lines of reinterpret_cast
.
Now assume that we have two very minimal implementations of static_vector<T, N>
:
with_union
: implemented using the "single-member union
" approach;
with_storage
: implemented using the "single std::aligned_storage_t
" approach.
Let's perform the following benchmark using both g++
and clang++
with -O3
. I used quick-bench.com for this task:
void escape(void* p) { asm volatile("" : : "g"(p) : "memory"); }
void clobber() { asm volatile("" : : : "memory"); }
template <typename Vector>
void test()
{
for(std::size_t j = 0; j < 10; ++j)
{
clobber();
Vector v;
for(int i = 0; i < 123456; ++i) v.emplace_back(i);
escape(&v);
}
}
(escape
and clobber
are taken from Chandler Carruth's CppCon 2015 talk: "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!")
g++ 7.2
(live here):clang++ 5.0
(live here):As you can see from the results, g++
seems to be able to aggressively optimize (vectorization) the implementation that uses the "single std::aligned_storage_t
" approach, but not the implementation using the union
.
My questions are:
Is there anything in the Standard that prevents the implementation using union
from being aggressively optimized? (I.e. does the Standard grant more freedom to the compiler when using std::aligned_storage_t
- if so, why?)
Is this purely a "quality of implementation" issue?
xskxzr is right, this is the same issue as in this question. Fundamentally, gcc is missing an optimization opportunity in forgetting that std::array
's data is aligned. John Zwinck helpfully reported bug 80561.
You can verify this in your benchmark by making one of two changes to with_union
:
Change _data
from a std::array<U, N>
to simply a U[N]
. Performance becomes identical
Remind gcc that _data
is actually aligned by changing the implementation of emplace_back()
to:
template <typename... Ts>
T& emplace_back(Ts&&... xs)
{
U* data = static_cast<U*>(__builtin_assume_aligned(_data.data(), alignof(U)));
T* ptr = &data[_size++]._x;
return *(new (ptr) T{std::forward<Ts>(xs)...});
}
Either of those changes with the rest of your benchmark gets me comparable results between WithUnion
and WithAlignedStorage
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With