
Should I use manual alloc to allow move semantics?

Tags:

c++

c++11

I'm interested in learning when I should start considering move semantics in favour of copying data, depending on the size of that data and the usage of the class. For example, for a Matrix4 class we have two options:

struct Matrix4{
    float* data;

    Matrix4(){ data = new float[16]; }
    Matrix4(Matrix4&& other) : data(nullptr) { // nullptr so the move-assignment can safely delete[] the old buffer
        *this = std::move(other);
    }
    Matrix4& operator=(Matrix4&& other)
    {
       ... removed for brevity ...
    }
    ~Matrix4(){ delete [] data; }

    ... other operators and class methods ...
};
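For reference, one plausible implementation of the elided move-assignment (a sketch, not the asker's actual code): steal the pointer and leave the source in a safely destructible state.

```cpp
#include <utility>

// A sketch of the heap-based Matrix4 with the move operations written out.
// The key idea: a move steals the pointer rather than copying the floats.
struct Matrix4 {
    float* data = nullptr;

    Matrix4() : data(new float[16]()) {}
    Matrix4(Matrix4&& other) noexcept : data(nullptr) {
        *this = std::move(other);
    }
    Matrix4& operator=(Matrix4&& other) noexcept {
        if (this != &other) {
            delete[] data;        // release our own buffer (may be nullptr)
            data = other.data;    // steal the source's buffer
            other.data = nullptr; // source's destructor now does nothing
        }
        return *this;
    }
    Matrix4(const Matrix4&) = delete;            // copies omitted to keep
    Matrix4& operator=(const Matrix4&) = delete; // the sketch short
    ~Matrix4() { delete[] data; }
};
```

After a move, the source's `data` is null, so destroying it is harmless; self-assignment is guarded explicitly.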

struct Matrix4{
    float data[16]; // let the compiler do the magic

    Matrix4(){}
    Matrix4(const Matrix4& other){
        std::copy(other.data, other.data+16, data);
    }
    Matrix4& operator=(const Matrix4& other)
    {
       std::copy(other.data, other.data+16, data);
       return *this;
    }

    ... other operators and class methods ...
};

I believe there is some overhead in having to allocate and deallocate memory "by hand", and given the chances of actually hitting the move constructor when using this class, what is the preferred implementation for a class with such a small in-memory size? Is moving really always preferred over copying?

Trax asked Apr 21 '13


2 Answers

In the first case, allocation and deallocation are expensive - because you are dynamically allocating memory from the heap, even if your matrix is constructed on the stack - and moves are cheap (just copying a pointer).

In the second case, allocation and deallocation are cheap, but moves are expensive - because they are actually copies.

So if you are writing an application and you just care about the performance of that application, the answer to the question "Which one is better?" likely depends on how often you create/destroy matrices versus how often you copy/move them - and in any case, do your own measurements to support any conjecture.

By doing measurements you will also check whether your compiler is performing a lot of copy/move elision in places where you expect moves to be going on - the results may go against your expectations.

Also, cache locality may have an impact here: if you allocate storage for a matrix's data on the heap, having three matrices that you want to process element-by-element created on the stack will likely require quite a scattered memory access pattern - potentially resulting in more cache misses.

On the other hand, if you are using arrays for which memory is allocated on the stack, it is likely that the same cache line will be able to hold the data of all those matrices - thus increasing the cache hit rate. Not to mention the fact that in order to access elements on the heap you first need to read the value of the data pointer, which means accessing a different region of memory than the one holding the elements.

So once more, the moral of the story is: do your own measurements.

If, on the other hand, you are writing a library and you cannot predict how many constructions/destructions vs moves/copies the client is going to perform, then you may offer two such matrix classes, and factor out the common behavior into a base class - possibly a base class template.

That will give the client flexibility and will give you a sufficiently high degree of reuse - no need to write the implementation of all common member functions twice.

This way, clients may choose the matrix class that best fits the creation/moving profile of the application in which they are using it.
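A sketch of that factoring (all names invented for illustration): the shared operations live once in a CRTP base template, while each derived class chooses its own storage strategy.

```cpp
#include <utility>

// Common operations live here once; element access goes through the
// derived class, which decides where the 16 floats actually live.
template <typename Derived>
struct Matrix4Base {
    float trace() const {
        const float* d = static_cast<const Derived&>(*this).ptr();
        return d[0] + d[5] + d[10] + d[15];
    }
};

// Storage policy 1: elements inline in the object, cheap construction.
struct StackMatrix4 : Matrix4Base<StackMatrix4> {
    float data[16] = {};
    const float* ptr() const { return data; }
};

// Storage policy 2: elements on the heap, cheap moves.
struct HeapMatrix4 : Matrix4Base<HeapMatrix4> {
    float* data = new float[16]();
    HeapMatrix4() = default;
    HeapMatrix4(HeapMatrix4&& o) noexcept : data(o.data) { o.data = nullptr; }
    HeapMatrix4& operator=(HeapMatrix4&& o) noexcept {
        std::swap(data, o.data); // o's destructor frees our old buffer
        return *this;
    }
    HeapMatrix4(const HeapMatrix4&) = delete;            // copies omitted
    HeapMatrix4& operator=(const HeapMatrix4&) = delete; // for brevity
    ~HeapMatrix4() { delete[] data; }
    const float* ptr() const { return data; }
};
```

Both classes share one implementation of `trace()` (and, in a real library, of every other common member function), so nothing is written twice.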


UPDATE:

As DeadMG points out in the comments, one advantage of the array-based approach over the dynamic allocation approach is that the latter does manual resource management through raw pointers, new, and delete, which forces you to write a user-defined destructor, copy constructor, move constructor, copy-assignment operator, and move-assignment operator.

You could avoid all of this if you were using std::vector, which would perform the memory management task for you and would save you from the burden of defining all those special member functions.

This said, the mere fact of suggesting to use std::vector instead of doing manual memory management - as much as it is a good advice in terms of design and programming practice - does not answer the question, while I believe the original answer does.

Andy Prowl answered Oct 26 '22


Like everything else in programming, especially where performance is concerned, it's a complicated trade-off.

Here, you have two designs: to keep the data inside your class (method 1) or to allocate the data on the heap and keep a pointer to it in the class (method 2).

As far as I can tell, these are the trade-offs you are making:

  1. Construction/Destruction Speed: Naively implemented, method 2 will be slower here, because it requires dynamic memory allocation and deallocation. However, you can help the situation using custom memory allocators, especially if the size of your data is predictable and/or fixed.

  2. Size: In your 4x4 matrix example, method 2 requires storing an additional pointer, plus memory allocation size overhead (typically anywhere from 4 to 32 bytes). This might or might not be a factor, but it certainly must be considered, especially if your class instances are small.

  3. Move Speed: Method 2 has a very fast move operation, because it only requires setting two pointers. In method 1, you have no choice but to copy your data. However, while being able to rely on fast moves can make your code pretty, straightforward, readable, and more efficient, compilers are quite good at copy elision. This means that you can write your pretty, straightforward, readable pass-by-value interfaces even if you implement method 1, and the compiler will not generate too many copies anyway. But you can't be sure of that, so relying on this compiler optimization, especially if your instances are larger, requires measurement and inspection of the generated code.

  4. Member Access Speed: This is the most important differentiator for small classes, in my opinion. Each time you access an element in a matrix implemented using method 2 (or access a field in a class implemented that way, i.e., with external data) you access memory twice: once to read the address of the external block of memory, and once to actually read the data you want. In method 1, you just directly access the field or element you want. This means that in method 2, every access could potentially generate an additional cache miss, which could affect your performance. This is especially important if your class instances are small (e.g. a 4x4 matrix) and you operate on many of them stored in arrays or vectors.

    In fact, this is why you might want to actually copy bytes around when you are copying/moving an instance of your matrix, instead of just setting a pointer: to keep your data contiguous. This is why flat data structures (like arrays of values) are much preferred in high-performance code over pointer-spaghetti data structures (like arrays of pointers, linked lists, etc.). So, while moving is cooler and faster than copying in isolation, you sometimes want to copy your instances to make (or keep) a whole bunch of them contiguous and make iterating over and accessing them much more efficient.

  5. Flexibility of Length/Size: Method 2 is obviously more flexible in this regard because you can decide how much data you need at runtime, be it 16 or 16777216 bytes.

All in all, here's the algorithm I suggest you use for picking one implementation:

  • If you need variable amount of data, pick method 2.
  • If you have very large amounts of data in each instance of your class (e.g. several kilobytes,) pick method 2.
  • If you need to copy instances of your class around a lot (and I mean a lot!) pick method 2 (but try to measure the performance improvement and inspect the generated code, especially in hot areas.)
  • In all other cases, prefer method 1.

In short, method 1 should be your default, until proven otherwise. And the way to prove anything regarding performance is measurement! So don't optimize anything unless you have measured and have proof that one method is better than another, and also (as mentioned in other answers,) you might want to implement both methods if you are writing a library and let your users choose the implementation.

yzt answered Oct 26 '22