Why isn't this unused variable optimised away?

Tags: c++, compiler-optimization, gcc, clang

I played around with Godbolt's Compiler Explorer. I wanted to see how good certain optimizations are. My minimal working example is:

#include <vector>

int foo() {
    std::vector<int> v {1, 2, 3, 4, 5};
    return v[4];
}

The generated assembly (by clang 5.0.0, -O2 -std=c++14):

foo(): # @foo()
  push rax
  mov edi, 20
  call operator new(unsigned long)
  mov rdi, rax
  call operator delete(void*)
  mov eax, 5
  pop rcx
  ret

As one can see, clang knows the answer, but does quite a lot of work before returning. It seems to me that the vector is even created, because of the calls to operator new and operator delete.

Can anyone explain to me what happens here and why it does not just return?

The code generated by GCC (not copied here) seems to construct the vector explicitly. Does anyone know why GCC is not able to deduce the result?


asked Nov 02 '17 09:11

Max Görner

3 Answers

std::vector<T> is a fairly complicated class that involves dynamic allocation. While clang++ is sometimes able to elide heap allocations, it is a fairly tricky optimization and you should not rely on it. Example:

int foo() {
    int* p = new int{5};
    return *p;
}

foo():                                # @foo()
        mov     eax, 5
        ret

As an example, using std::array<T, N> (which does not dynamically allocate) produces fully-inlined code:

#include <array>

int foo() {
    std::array v{1, 2, 3, 4, 5};
    return v[4];
}

foo():                                # @foo()
        mov     eax, 5
        ret
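The snippet above relies on class template argument deduction, which needs C++17, whereas the question compiles with -std=c++14. A minimal sketch of the same function with explicit template arguments follows; the name foo_array is only illustrative, and the generated code should be checked on the compiler and flags actually in use.

```cpp
#include <array>

// Same idea as above, but with the element type and size spelled out,
// so it also compiles as C++14 (no class template argument deduction).
int foo_array() {
    std::array<int, 5> v{{1, 2, 3, 4, 5}};  // storage lives inside the object, no heap
    return v[4];
}
```

Because the elements are stored inside the array object itself, there is no allocation call for the optimizer to reason about.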

As Marc Glisse noted in the other answer's comments, this is what the Standard says in [expr.new] #10:

An implementation is allowed to omit a call to a replaceable global allocation function ([new.delete.single], [new.delete.array]). When it does so, the storage is instead provided by the implementation or provided by extending the allocation of another new-expression. The implementation may extend the allocation of a new-expression e1 to provide storage for a new-expression e2 if the following would be true were the allocation not extended: [...]

answered Oct 18 '22 21:10

Vittorio Romeo

As the comments note, operator new can be replaced. This can happen in any translation unit. Optimizing a program for the case where it's not replaced therefore requires whole-program analysis. And if it is replaced, you have to call it, of course.

Whether the default operator new is a library I/O call is unspecified. That matters, because library I/O calls are observable and therefore they can't be optimized out either.
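To make the replaceability point concrete, here is a minimal sketch of a program-wide replacement. The counter g_new_calls and the helper calls_for_one_direct_allocation are hypothetical names chosen for illustration, and the malloc-based body is just one simple way to implement the allocation function.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter, for illustration only.
static std::size_t g_new_calls = 0;

// Replacement for the global allocation function. Defining it in any one
// translation unit replaces the default for the whole program.
void* operator new(std::size_t size) {
    ++g_new_calls;  // visible effect of every call
    if (void* p = std::malloc(size == 0 ? 1 : size)) {
        return p;
    }
    throw std::bad_alloc{};
}

// Matching replacements, so memory from malloc is released with free.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Returns how many times the replacement ran for one direct call.
std::size_t calls_for_one_direct_allocation() {
    const std::size_t before = g_new_calls;
    void* p = ::operator new(20);  // direct call, not a new-expression
    ::operator delete(p);
    return g_new_calls - before;
}
```

When compiling some other translation unit, the compiler cannot see whether a definition like this exists elsewhere in the program, so it has to assume that a call to operator new may have effects of this kind.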

answered Oct 18 '22 20:10

MSalters

N3664's change to [expr.new], cited in one answer and one comment, permits new-expressions to not call a replaceable global allocation function. But vector allocates memory using std::allocator<T>::allocate, which calls ::operator new directly, not via a new-expression. So that special permission doesn't apply, and generally compilers cannot elide such direct calls to ::operator new.

All hope is not lost, however, for std::allocator<T>::allocate's specification has this to say:

Remarks: the storage is obtained by calling ::operator new, but it is unspecified when or how often this function is called.

Leveraging this permission, libc++'s std::allocator uses special clang built-ins (__builtin_operator_new and __builtin_operator_delete) to indicate to the compiler that elision is permitted. With -stdlib=libc++, clang compiles your code down to:

foo():                                # @foo()
        mov     eax, 5
        ret
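The three ways of obtaining storage discussed in this answer can be put side by side. This is only a sketch: the function names are made up for illustration, and whether a given allocation is actually elided depends on the compiler, its version, the optimization flags, and the standard library, so the output is best inspected on Compiler Explorer rather than assumed.

```cpp
#include <memory>
#include <new>

// (1) new-expression: [expr.new] allows the call to the replaceable
//     allocation function to be omitted.
int via_new_expression() {
    int* p = new int{5};
    int result = *p;
    delete p;
    return result;
}

// (2) Direct call to ::operator new: not a new-expression, so the
//     [expr.new] permission does not cover it.
int via_direct_call() {
    void* raw = ::operator new(sizeof(int));
    int* p = ::new (raw) int{5};  // placement new constructs, it does not allocate
    int result = *p;
    ::operator delete(raw);       // int is trivially destructible
    return result;
}

// (3) std::allocator: "unspecified when or how often" ::operator new is
//     called, which is the latitude a library can pass on to the compiler.
int via_std_allocator() {
    std::allocator<int> alloc;
    int* p = alloc.allocate(1);
    ::new (static_cast<void*>(p)) int{5};
    int result = *p;
    alloc.deallocate(p, 1);
    return result;
}
```

All three functions return 5; the difference lies only in how much freedom the optimizer has to drop the allocation.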

answered Oct 18 '22 20:10

T.C.
