
Does passing a `unique_ptr` by value have a performance penalty compared to a plain pointer?

Common wisdom is that std::unique_ptr does not introduce a performance penalty (and no memory penalty when no custom deleter is used), but I recently came across a discussion showing that it actually introduces an additional indirection, because a unique_ptr cannot be passed in a register on platforms that use the Itanium C++ ABI. The example posted was similar to:

#include <memory>

int foo(std::unique_ptr<int> u) {
    return *u;
}

int boo(int* i) {
    return *i;
}

This generates an additional instruction in foo compared to boo:

foo(std::unique_ptr<int, std::default_delete<int> >):
        mov     rax, QWORD PTR [rdi]
        mov     eax, DWORD PTR [rax]
        ret
boo(int*):
        mov     eax, DWORD PTR [rdi]
        ret

The explanation was that the Itanium ABI requires that the unique_ptr not be passed in a register because of its non-trivial constructor, so it is created on the stack and the address of that object is passed in a register instead.

I know that this does not really impact performance on a modern PC platform, and the penalty is certainly small compared to the benefits, but I am wondering if somebody could provide more details on why the object cannot be passed in a register. Since zero-cost abstractions are one of the major goals of C++, I am also wondering whether this was discussed in the standardization process as an accepted deviation or whether it is a quality-of-implementation issue.

Commenters have pointed out that the two functions are not fully equivalent and that the comparison is therefore flawed: foo also calls the deleter on the unique_ptr parameter, whereas boo does not release the memory. However, I was only interested in the difference that results from passing a unique_ptr by value compared to passing a plain pointer. I've modified the example code and included a call to delete to free the plain pointer. The call is in the caller, because the unique_ptr's deleter is also invoked in the caller's context, which makes the generated code more similar. In addition, the manual delete checks ptr != nullptr, because the destructor does this as well. Still, foo does not receive the parameter in a register and has to do an indirect access.
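For reference, the modified comparison described above might look roughly like the following. This is only a sketch: the modified code itself is not reproduced here, and the caller names call_foo and call_boo are hypothetical.

```cpp
#include <memory>

// Takes ownership by value. Under the Itanium C++ ABI the argument object
// lives in memory provided by the caller, and its address is passed.
int foo(std::unique_ptr<int> u) {
    return *u;
}

// Takes a plain pointer. The pointer value itself can be passed in a register.
int boo(int* i) {
    return *i;
}

// Hypothetical caller for the unique_ptr version. The temporary's destructor
// (and thus the deleter) runs here, in the caller, after foo returns.
int call_foo(int value) {
    return foo(std::unique_ptr<int>(new int(value)));
}

// Hypothetical caller for the plain-pointer version. The explicit null check
// and delete mirror what the unique_ptr destructor does.
int call_boo(int value) {
    int* ptr = new int(value);
    int result = boo(ptr);
    if (ptr != nullptr) {
        delete ptr;
    }
    return result;
}
```

Comparing the assembly of foo and boo for such a setup (for example with -O2) should show the same extra load in foo as in the listing above, assuming an Itanium-ABI toolchain.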

I also wonder why the compiler does not elide the check for nullptr before calling operator delete, since deleting a null pointer is defined to be a no-op anyway. I guess that unique_ptr could be specialized for the default deleter so that the destructor does not perform the check, but that would be a very small micro-optimization.

asked Jan 16 '19 21:01 by Jens



1 Answer

The System V ABI uses the Itanium C++ ABI and refers to it. In particular, the Itanium C++ ABI specifies:

If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.

Specifically:

...

If the type has a non-trivial destructor, the caller calls that destructor after control returns to it (including when the caller throws an exception), at the end of enclosing full-expression.

So the simple answer to the question "why is it not passed in a register" is "because it can't be".
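One of the conditions the quoted rule hinges on, a non-trivial destructor, can be checked with standard type traits. The sketch below is only an illustration: has_nontrivial_destructor is a hypothetical helper, and it tests just this one condition, not the ABI's full definition of "non-trivial for the purposes of calls".

```cpp
#include <memory>
#include <type_traits>

// Hypothetical helper: true if T has a non-trivial destructor. This is only
// one of the conditions the ABI considers, not its complete definition.
template <typename T>
constexpr bool has_nontrivial_destructor() {
    return !std::is_trivially_destructible<T>::value;
}

// A raw pointer needs no destructor call, so it may travel in a register.
static_assert(!has_nontrivial_destructor<int*>(),
              "a raw pointer is trivially destructible");

// unique_ptr's destructor invokes the deleter, so it is non-trivial.
static_assert(has_nontrivial_destructor<std::unique_ptr<int>>(),
              "unique_ptr has a non-trivial destructor");
```

Both static_asserts hold on any conforming implementation, which is why the two parameter types in the question are treated differently by the calling convention.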

Now, a more interesting question might be why the Itanium C++ ABI decided to go with that.

While I wouldn't claim to have intimate knowledge of the rationale, two things come to mind:

  • This allows for copy elision if the argument to the function is a temporary.
  • This makes tail-call optimizations more powerful. If the callee had to call the destructors of its arguments, TCO would not be possible for any function that accepts non-trivial arguments.
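To make the tail-call point concrete, here is a small sketch. Tracer, helper, callee, and run_demo are hypothetical names, and the global events log exists only to record the order in which things happen.

```cpp
#include <string>
#include <vector>

// Hypothetical global log, used only to record the order of events.
std::vector<std::string> events;

// A type with a non-trivial destructor, standing in for unique_ptr.
struct Tracer {
    ~Tracer() { events.push_back("~Tracer"); }
};

int helper(int x) {
    events.push_back("helper");
    return x + 1;
}

// When the caller is responsible for destroying `t`, nothing is left to do
// in callee after helper returns, so the call to helper is in tail position.
// Whether a compiler actually emits a jump here is not guaranteed.
int callee(Tracer t) {
    (void)t;  // the parameter is intentionally unused
    events.push_back("callee");
    return helper(1);
}

// Runs the scenario once and returns the recorded order of events.
std::vector<std::string> run_demo() {
    events.clear();
    callee(Tracer{});  // the temporary becomes the parameter object
    return events;
}
```

In both conventions the destructor runs after helper has returned. The difference is who runs it: if callee had to do it, the call to helper could not be the last thing callee does.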
answered Oct 19 '22 15:10 by SergeyA
