
unique_ptr vs class instance as member variable

There is a class SomeClass which holds some data and methods that operate on this data. It must be constructed with some arguments, like:

SomeClass(int some_val, float another_val);

There is another class, say Manager, which contains a SomeClass and heavily uses its methods.

So which would be better in terms of performance (data locality, cache hits, etc.): declaring the SomeClass object as a member of Manager and initializing it in the member initializer list of Manager's constructor, or holding the SomeClass object through a unique_ptr?

class Manager
{    
public:    
    Manager() : some(5, 3.0f) {}

private:
    SomeClass some;    
};

or

class Manager
{
public:
    Manager();

private:
    std::unique_ptr<SomeClass> some;
};
saintcrawler asked Apr 04 '15 13:04



1 Answer

Short answer

Most likely, there is no difference in the runtime efficiency of accessing your subobject. However, using a pointer can be slower for several reasons (see details below).

Moreover, there are several other things you should remember:

  1. When using a pointer, you usually have to allocate and deallocate memory for the subobject separately, which takes some time (quite a lot if you do it often).
  2. When using a pointer, you can cheaply move your subobject without copying it.
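Both points can be seen in a minimal sketch. The names ManagerPtr, address and subobject_survives_move are hypothetical, and SomeClass is a simplified stand-in for the class from the question. Constructing the manager costs one heap allocation, but moving it only transfers the pointer, so the subobject stays at the same address.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Simplified stand-in for the question's SomeClass (assumption).
struct SomeClass {
    int some_val;
    float another_val;
    SomeClass(int s, float a) : some_val(s), another_val(a) {}
};

// Hypothetical pointer-based manager.
class ManagerPtr {
public:
    // Point 1: the subobject needs its own heap allocation.
    ManagerPtr() : some(std::make_unique<SomeClass>(5, 3.0f)) {}

    const SomeClass* address() const { return some.get(); }

private:
    std::unique_ptr<SomeClass> some;
};

// Point 2: moving the manager transfers the pointer, not the SomeClass data.
// Returns true if the subobject kept its address and the source was emptied.
bool subobject_survives_move() {
    ManagerPtr first;
    const SomeClass* before = first.address();
    ManagerPtr second = std::move(first);
    return second.address() == before && first.address() == nullptr;
}
```

With a plain member, the same move would have to move or copy the SomeClass data itself, which only matters if SomeClass is large or expensive to move.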

Speaking of compile times, a pointer is better than a plain member. With a plain member, the definition of Manager needs the full definition of SomeClass, so you cannot remove that dependency. With a pointer, a forward declaration is enough. Fewer dependencies may result in shorter build times.
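A rough sketch of the forward-declaration approach, shown as a single file for brevity. The split into manager.h and manager.cpp, the value() accessor and the body of SomeClass are assumptions for illustration. Note that the destructor of Manager has to be defined where SomeClass is complete, because unique_ptr needs the full type to delete it.

```cpp
#include <cassert>
#include <memory>

// ---- would live in manager.h (hypothetical file name) ----
class SomeClass;  // forward declaration: no SomeClass header is included

class Manager {
public:
    Manager();
    ~Manager();         // only declared here; defined where SomeClass is complete
    int value() const;  // hypothetical accessor, for demonstration only

private:
    std::unique_ptr<SomeClass> some;  // fine with an incomplete type
};

// ---- would live in manager.cpp, after including the SomeClass header ----
class SomeClass {
public:
    SomeClass(int some_val, float another_val)
        : some_val_(some_val), another_val_(another_val) {}
    int some_val() const { return some_val_; }

private:
    int some_val_;
    float another_val_;
};

Manager::Manager() : some(std::make_unique<SomeClass>(5, 3.0f)) {}
Manager::~Manager() = default;  // SomeClass is complete at this point
int Manager::value() const { return some->some_val(); }
```

Code that includes only the header part can use Manager without ever seeing the definition of SomeClass, so a change to SomeClass does not force that code to be recompiled.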

Details

I'd like to provide more details about the performance of subobject accesses. I think that using a pointer can be slower than using a plain member for several reasons:

  1. Data locality (and cache performance) is likely to be better with a plain member. You usually access the data of Manager and SomeClass together, and a plain member is guaranteed to be near the other data, while heap allocations may place the object and the subobject far from each other.
  2. Using a pointer means one more level of indirection. To get the address of a plain member, you simply add a compile-time constant offset to the object address (which is often merged into another assembly instruction). When using a pointer, you additionally have to read a word from the member pointer to get the actual pointer to the subobject. See Q1 and Q2 for more details.
  3. Aliasing is perhaps the most important issue. If you are using a plain member, the compiler can assume that your subobject lies fully within your object in memory and that it does not overlap with the other members of your object. When using a pointer, the compiler often cannot assume anything like this: your subobject may overlap with your object and its members. As a result, the compiler has to generate more useless load/store operations, because it thinks that some values may change.
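The layout difference behind the first two points can be checked with a small sketch. ManagerPlain, ManagerPtr and lies_within are hypothetical names, and SomeClass is reduced to a plain aggregate here. The plain member sits at a constant offset inside its owner, while the pointee of the unique_ptr lives in a separate allocation that has to be reached through a loaded pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Simplified aggregate stand-in for the question's SomeClass (assumption).
struct SomeClass {
    int some_val;
    float another_val;
};

struct ManagerPlain {
    int id;
    SomeClass some;  // stored inside the ManagerPlain object
};

struct ManagerPtr {
    int id;
    std::unique_ptr<SomeClass> some;  // only the pointer is stored inside
};

// Offset of the plain member is a compile-time constant.
constexpr std::size_t kSomeOffset = offsetof(ManagerPlain, some);

// True if [inner, inner + inner_size) lies inside [outer, outer + outer_size).
bool lies_within(const void* outer, std::size_t outer_size,
                 const void* inner, std::size_t inner_size) {
    auto o = reinterpret_cast<std::uintptr_t>(outer);
    auto i = reinterpret_cast<std::uintptr_t>(inner);
    return i >= o && i + inner_size <= o + outer_size;
}

bool plain_member_is_inside() {
    ManagerPlain m{1, {5, 3.0f}};
    return lies_within(&m, sizeof m, &m.some, sizeof m.some);
}

bool pointee_is_inside() {
    ManagerPtr m{1, std::make_unique<SomeClass>(SomeClass{5, 3.0f})};
    return lies_within(&m, sizeof m, m.some.get(), sizeof(SomeClass));
}
```

Calling plain_member_is_inside() yields true and pointee_is_inside() yields false, since two distinct objects cannot overlap. How far apart the two allocations end up depends on the allocator.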

Here is an example for the last issue (full code is here):

#include <memory>

struct IntValue {
    int x;
    IntValue(int x) : x(x) {}
};
class MyClass_Ptr {
    // Each value lives in its own heap allocation.
    std::unique_ptr<IntValue> a, b, c;
public:
    void Compute() {
        a->x += b->x + c->x;
        b->x += a->x + c->x;
        c->x += a->x + b->x;
    }
};
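For comparison, the plain member variant used in the measurements presumably looks like the sketch below. The class name MyClass_Plain, the constructor and the accessors are assumptions, since only the pointer version is shown in the excerpt.

```cpp
#include <cassert>

struct IntValue {
    int x;
    IntValue(int x) : x(x) {}
};

// Hypothetical plain-member counterpart of MyClass_Ptr.
class MyClass_Plain {
    IntValue a, b, c;  // stored directly inside the object, cannot alias each other
public:
    MyClass_Plain(int a0, int b0, int c0) : a(a0), b(b0), c(c0) {}

    void Compute() {
        a.x += b.x + c.x;
        b.x += a.x + c.x;
        c.x += a.x + b.x;
    }

    // Accessors added only so the result can be inspected.
    int A() const { return a.x; }
    int B() const { return b.x; }
    int C() const { return c.x; }
};
```

Starting from 1, 2, 3, a single call of Compute gives a = 6, b = 11 and c = 20.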

Clearly, it is stupid to store the subobjects a, b, c by pointer. I've measured the time spent in one billion calls of the Compute method for a single object. Here are the results with different configurations:

2.3 sec:    plain member (MinGW 5.1.0)
2.0 sec:    plain member (MSVC 2013)
4.3 sec:    unique_ptr   (MinGW 5.1.0)
9.3 sec:    unique_ptr   (MSVC 2013)
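A timing loop of this kind could be sketched as follows. This is not the original benchmark: time_calls, the constructor and Sum are hypothetical, and the sketch uses unsigned values so that the wraparound over many calls is well defined. Absolute numbers will vary with compiler, flags and hardware.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <memory>

struct IntValue {
    unsigned x;  // unsigned (assumption) so overflow wraps instead of being undefined
    IntValue(unsigned x) : x(x) {}
};

class MyClass_Ptr {
    std::unique_ptr<IntValue> a, b, c;
public:
    MyClass_Ptr(unsigned a0, unsigned b0, unsigned c0)
        : a(std::make_unique<IntValue>(a0)),
          b(std::make_unique<IntValue>(b0)),
          c(std::make_unique<IntValue>(c0)) {}

    void Compute() {
        a->x += b->x + c->x;
        b->x += a->x + c->x;
        c->x += a->x + b->x;
    }

    // Reading the result afterwards keeps the loop from being optimized away.
    unsigned Sum() const { return a->x + b->x + c->x; }
};

// Calls obj.Compute() the given number of times and returns elapsed seconds.
template <class T>
double time_calls(T& obj, std::uint64_t calls) {
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < calls; ++i) {
        obj.Compute();
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

A run would construct one object, call time_calls(obj, 1000000000) and then print both the elapsed time and obj.Sum(). The same template works for a plain member variant with a Compute method.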

Looking at the generated assembly for the innermost loop in each case makes it easy to understand why the times are so different:

;;; plain member (GCC)
lea edx, [rcx+rax]   ; well-optimized code: only additions on registers
add r8d, edx         ; all 6 additions present (no CSE optimization)
lea edx, [r8+rax]    ; ('lea' instruction is also addition BTW)
add ecx, edx
lea edx, [r8+rcx]
add eax, edx
sub r9d, 1
jne .L3

;;; plain member (MSVC)
add ecx, r8d  ; well-optimized code: only additions on registers
add edx, ecx  ; 5 additions instead of 6 due to a common subexpression eliminated
add ecx, edx
add r8d, edx
add r8d, ecx
dec r9
jne SHORT $LL6@main

;;; unique_ptr (GCC)
add eax, DWORD PTR [rcx]   ; slow code: a lot of memory accesses
add eax, DWORD PTR [rdx]   ; each addition loads value from memory
mov DWORD PTR [rdx], eax   ; each sum is stored to memory
add eax, DWORD PTR [r8]    ; compiler is afraid that some values may be at same address
add eax, DWORD PTR [rcx]
mov DWORD PTR [rcx], eax
add eax, DWORD PTR [rdx]
add eax, DWORD PTR [r8]
sub r9d, 1
mov DWORD PTR [r8], eax
jne .L4

;;; unique_ptr (MSVC)
mov r9, QWORD PTR [rbx]       ; awful code: 15 loads, 3 stores
mov rcx, QWORD PTR [rbx+8]    ; compiler thinks that values may share 
mov rdx, QWORD PTR [rbx+16]   ;   same address with pointers to values!
mov r8d, DWORD PTR [rcx]
add r8d, DWORD PTR [rdx]
add DWORD PTR [r9], r8d
mov r8, QWORD PTR [rbx+8]
mov rcx, QWORD PTR [rbx]      ; load value of 'a' pointer from memory
mov rax, QWORD PTR [rbx+16]
mov edx, DWORD PTR [rcx]      ; load value of 'a->x' from memory
add edx, DWORD PTR [rax]      ; add the 'c->x' value
add DWORD PTR [r8], edx       ; add sum 'a->x + c->x' to 'b->x'
mov r9, QWORD PTR [rbx+16]
mov rax, QWORD PTR [rbx]      ; load value of 'a' pointer again =)
mov rdx, QWORD PTR [rbx+8]
mov r8d, DWORD PTR [rax]
add r8d, DWORD PTR [rdx]
add DWORD PTR [r9], r8d
dec rsi
jne SHORT $LL3@main
stgatilov answered Oct 17 '22 23:10
