Common wisdom is that std::unique_ptr does not introduce a performance penalty (and no memory overhead when no custom deleter is used), but I recently stumbled over a discussion showing that it actually introduces an additional indirection because the unique_ptr cannot be passed in a register on platforms that use the Itanium C++ ABI. The example posted was similar to this:
#include <memory>

int foo(std::unique_ptr<int> u) {
    return *u;
}

int boo(int* i) {
    return *i;
}
This generates one additional assembly instruction in foo compared to boo:
foo(std::unique_ptr<int, std::default_delete<int> >):
        mov     rax, QWORD PTR [rdi]
        mov     eax, DWORD PTR [rax]
        ret
boo(int*):
        mov     eax, DWORD PTR [rdi]
        ret
The explanation was that the Itanium C++ ABI requires that the unique_ptr not be passed in a register because of its non-trivial constructor, so it is created on the stack and the address of that object is passed in a register instead.
I know that this does not really impact performance on a modern PC platform, but I am wondering if somebody could provide more details on the reasons why it shall not be copied to a register. Since zero-cost abstractions are one of the major goals of C++, I am wondering if this has been discussed in the standardization process as an accepted deviation or if it is a quality of implementation issue. The performance penalty is certainly small enough when considering the benefits, especially on modern PC platforms.
Commenters have pointed out that the two functions are not fully equivalent and that the comparison is therefore flawed: foo also calls the deleter on the unique_ptr parameter, whereas boo does not release the memory. However, I was only interested in the difference that results from passing a unique_ptr by value compared to passing a plain pointer. I've modified the example code and included a call to delete to free the plain pointer; the call is in the caller because the unique_ptr's deleter also gets called in the caller's context, which makes the generated code more comparable. In addition, the manual delete also checks ptr != nullptr because the destructor does this as well. Still, foo does not pass the parameter in a register and has to do an indirect access.
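The modified code is not reproduced here. As a rough sketch of what such a comparison could look like (call_foo, call_boo, and check_both_paths are hypothetical names chosen for illustration, and the value 42 is arbitrary):

```cpp
#include <cassert>
#include <memory>

// Takes ownership by value. The parameter is a full unique_ptr object,
// not just the raw pointer it wraps.
int foo(std::unique_ptr<int> u) {
    return *u;
}

// Takes a plain pointer and does not take ownership.
int boo(int* i) {
    return *i;
}

// Hypothetical caller for foo: the unique_ptr parameter is destroyed when
// the call completes, which frees the int.
int call_foo() {
    return foo(std::make_unique<int>(42));
}

// Hypothetical caller for boo: frees the memory manually after the call.
int call_boo() {
    int* ptr = new int(42);
    int result = boo(ptr);
    if (ptr != nullptr) {  // mirrors the null check in unique_ptr's destructor
        delete ptr;
    }
    return result;
}

// Hypothetical sanity check: both paths read the same value.
void check_both_paths() {
    assert(call_foo() == call_boo());
}
```

Compiling both callers with optimizations enabled and comparing the assembly of foo and boo should show whether the extra load is still present; the exact output depends on the compiler and flags.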
I also wonder why the compiler does not elide the check for nullptr before calling operator delete, since deleting a null pointer is defined to be a no-op anyway. I guess that unique_ptr could be specialized for the default deleter to not perform the check in the destructor, but that would be a very small micro-optimization.
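To illustrate the no-op behaviour referred to here, a minimal sketch (release and release_null_is_safe are hypothetical helpers, not part of the original example):

```cpp
#include <cassert>
#include <memory>

// Hypothetical helper: deletes without a preceding null check.
void release(int* p) {
    delete p;  // well-defined and has no effect when p is a null pointer
}

// Hypothetical check: neither a plain delete nor unique_ptr misbehaves on null.
bool release_null_is_safe() {
    release(nullptr);

    std::unique_ptr<int> empty;  // default-constructed, holds nullptr
    empty.reset();               // the deleter is not invoked for a null pointer
    assert(empty == nullptr);
    return empty == nullptr;
}
```

This only shows that the explicit check is redundant from a correctness point of view; it says nothing about why a given compiler keeps or removes the branch.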
The System V ABI uses the Itanium C++ ABI and refers to it. In particular, the Itanium C++ ABI specifies:
If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.
Specifically:
...
If the type has a non-trivial destructor, the caller calls that destructor after control returns to it (including when the caller throws an exception), at the end of enclosing full-expression.
So a simple answer to the question "why is it not passed in a register" is "because it can't be".
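Whether a type is likely to fall under this rule can be approximated with standard type traits. The sketch below is only an approximation: the ABI's own definition of "non-trivial for the purposes of calls" also considers cases such as all copy and move constructors being deleted, and the two constant names are hypothetical.

```cpp
#include <memory>
#include <type_traits>

// A raw pointer is trivially copy-constructible and trivially destructible,
// so nothing in the rule quoted above forces it into memory.
constexpr bool raw_pointer_looks_trivial =
    std::is_trivially_copy_constructible<int*>::value &&
    std::is_trivially_move_constructible<int*>::value &&
    std::is_trivially_destructible<int*>::value;

// unique_ptr has a user-provided move constructor and destructor, which
// makes it non-trivial for the purposes of calls.
constexpr bool unique_ptr_looks_trivial =
    std::is_trivially_move_constructible<std::unique_ptr<int>>::value &&
    std::is_trivially_destructible<std::unique_ptr<int>>::value;

static_assert(raw_pointer_looks_trivial,
              "int* should be trivial to copy, move, and destroy");
static_assert(!unique_ptr_looks_trivial,
              "unique_ptr<int> should have a non-trivial move constructor or destructor");
```

If both static_asserts compile, the traits agree with the ABI-based explanation for this particular pair of types.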
Now, an interesting question might be "why did the Itanium C++ ABI decide to go with that".
While I wouldn't claim to have intimate knowledge of the rationale, two things come to mind: