I am curious why the following piece of code:
#include <string>
int main()
{
std::string a = "ABCDEFGHIJKLMNO";
}
when compiled with -O3
yields the following code:
main: # @main
xor eax, eax
ret
(I perfectly understand that there is no need for the unused a
so the compiler can entirely omit it from the generated code)
However the following program:
#include <string>
int main()
{
std::string a = "ABCDEFGHIJKLMNOP"; // <-- !!! One Extra P
}
yields:
main: # @main
push rbx
sub rsp, 48
lea rbx, [rsp + 32]
mov qword ptr [rsp + 16], rbx
mov qword ptr [rsp + 8], 16
lea rdi, [rsp + 16]
lea rsi, [rsp + 8]
xor edx, edx
call std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_create(unsigned long&, unsigned long)
mov qword ptr [rsp + 16], rax
mov rcx, qword ptr [rsp + 8]
mov qword ptr [rsp + 32], rcx
movups xmm0, xmmword ptr [rip + .L.str]
movups xmmword ptr [rax], xmm0
mov qword ptr [rsp + 24], rcx
mov rax, qword ptr [rsp + 16]
mov byte ptr [rax + rcx], 0
mov rdi, qword ptr [rsp + 16]
cmp rdi, rbx
je .LBB0_3
call operator delete(void*)
.LBB0_3:
xor eax, eax
add rsp, 48
pop rbx
ret
mov rdi, rax
call _Unwind_Resume
.L.str:
.asciz "ABCDEFGHIJKLMNOP"
when compiled with the same -O3
. I don't understand why it does not recognize that the a
is still unused, regardless that the string is one byte longer.
This question is relevant to gcc 9.1 and clang 8.0, (online: https://gcc.godbolt.org/z/p1Z8Ns) because other compilers in my observation either entirely drop the unused variable (ellcc) or generate code for it regardless the length of the string.
Use the command-line option -O0 (-[capital o][zero]) to disable optimization, and -S to get assembly file. Look here to see more gcc command-line options.
Compiler optimization is generally implemented using a sequence of optimizing transformations, algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources or executes faster.
The C/C++ compiler compiles each source file separately and produces the corresponding object file. This means the compiler can only apply optimizations on a single source file rather than on the whole program. However, some important optimizations can be performed only by looking at the whole program.
Small String Optimizations Given that the size of a std::string is 24 bytes on a 64-bits platform (to store data pointer, size and capacity), some very cool tricks allow us to store statically up to 23 bytes before you need to allocate memory. That has a huge impact in terms of performance!
This is due to the small string optimization. When the string data is less than or equal 16 characters, including the null terminator, it is stored in a buffer local to the std::string
object itself. Otherwise, it allocates memory on the heap and stores the data over there.
The first string "ABCDEFGHIJKLMNO"
plus the null terminator is exactly of size 16. Adding "P"
makes it exceed the buffer, hence new
is being called internally, inevitably leading to a system call. The compiler can optimize something away if it's possible to ensure that there are no side effects. A system call probably makes it impossible to do this - by constrast, changing a buffer local to the object under construction allows for such a side effect analysis.
Tracing the local buffer in libstdc++, version 9.1, reveals these parts of bits/basic_string.h
:
template<typename _CharT, typename _Traits, typename _Alloc> class basic_string { // ... enum { _S_local_capacity = 15 / sizeof(_CharT) }; union { _CharT _M_local_buf[_S_local_capacity + 1]; size_type _M_allocated_capacity; }; // ... };
which lets you spot the local buffer size _S_local_capacity
and the local buffer itself (_M_local_buf
). When the constructor triggers basic_string::_M_construct
being called, you have in bits/basic_string.tcc
:
void _M_construct(_InIterator __beg, _InIterator __end, ...) { size_type __len = 0; size_type __capacity = size_type(_S_local_capacity); while (__beg != __end && __len < __capacity) { _M_data()[__len++] = *__beg; ++__beg; }
where the local buffer is filled with its content. Right after this part, we get to the branch where the local capacity is exhausted - new storage is allocated (through the allocate in M_create
), the local buffer is copied into the new storage and filled with the rest of the initializing argument:
while (__beg != __end) { if (__len == __capacity) { // Allocate more space. __capacity = __len + 1; pointer __another = _M_create(__capacity, __len); this->_S_copy(__another, _M_data(), __len); _M_dispose(); _M_data(__another); _M_capacity(__capacity); } _M_data()[__len++] = *__beg; ++__beg; }
As a side note, small string optimization is quite a topic on its own. To get a feeling for how tweaking individual bits can make a difference at large scale, I'd recommend this talk. It also mentions how the std::string
implementation that ships with gcc
(libstdc++) works and changed during the past to match newer versions of the standard.
I was surprised the compiler saw through a std::string
constructor/destructor pair until I saw your second example. It didn't. What you're seeing here is small string optimization and corresponding optimizations from the compiler around that.
Small string optimizations are when the std::string
object itself is big enough to hold the contents of the string, a size and possibly a discriminating bit used to indicate whether the string is operating in small or big string mode. In such a case, no dynamic allocations occur and the string is stored in the std::string
object itself.
Compilers are really bad at eliding unneeded allocations and deallocations, they are treated almost as if having side effects and are thus impossible to elide. When you go over the small string optimization threshold, dynamic allocations occur and the result is what you see.
As an example
void foo() {
delete new int;
}
is the simplest, dumbest allocation/deallocation pair possible, yet gcc emits this assembly even under O3
sub rsp, 8
mov edi, 4
call operator new(unsigned long)
mov esi, 4
add rsp, 8
mov rdi, rax
jmp operator delete(void*, unsigned long)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With