Should I use pointers or move semantics for passing big chunks of data?

I have a question about recommended coding technique. I have a tool for model analysis, and I sometimes need to pass a large amount of data (from a factory class to one that holds multiple heterogeneous chunks).

My question is whether there is a consensus on whether I should use pointers or move ownership (I need to avoid copying where possible, as a single data block may be as large as 1 GB).

The pointer version would look like this:

#include <memory>

class FactoryClass {
   // ...
public:
   static Data * createData() {
      Data * data = new Data;
      // ...
      return data;
   }
};

class StorageClass {
   std::unique_ptr<Data> data_ptr;
   // ...
public:
   void setData(Data * _data_ptr) {
      data_ptr.reset(_data_ptr);
   }
};

void pass() {
   Data * data = FactoryClass::createData();
   // ...
   StorageClass storage;
   storage.setData(data);
}

The move version, by contrast, looks like this:

#include <utility>

class FactoryClass {
   // ...
public:
   static Data createData() {
      Data data;
      // ...
      return data;
   }
};

class StorageClass {
   Data data;
   // ...
public:
   void setData(Data _data) {
      data = std::move(_data);
   }
};

void pass() {
   Data data = FactoryClass::createData();
   // ...
   StorageClass storage;
   storage.setData(std::move(data));
}

I like the move version better: yes, I need to add std::move calls to the main code, but in the end I have just the objects in the storage and no longer have to care about pointer semantics.

However, I am not entirely comfortable using move semantics, which I do not understand in detail. (I do not care about the C++11 requirement, though, as the code already compiles only with GCC 4.7+.)

Does anyone have a reference that supports either version? Or is there another, preferred way to pass data?

I could not find anything on Google, as the keywords usually led to other topics.

Thanks.

EDIT NOTE: The second example was refactored to incorporate suggestions from the comments; the semantics are unchanged.

Adam Streck asked Jul 29 '13 16:07


People also ask

When should I use move semantics?

Rvalue references and move semantics let you avoid unnecessary copies when working with temporary objects that are about to expire and whose resources can safely be taken over by another object.

Why do we need pointers?

Pointers are used to store and manage the addresses of dynamically allocated blocks of memory. Such blocks are used to store data objects or arrays of objects. Most structured and object-oriented languages provide an area of memory, called the heap or free store, from which objects are dynamically allocated.

What are move semantics?

Move semantics is a set of semantic rules and tools of the C++ language, designed to move objects whose lifetime is ending instead of copying them. The data is transferred from one object to another; in most cases, the data itself is not physically moved in memory.

What is the difference between a pointer and a reference in C++?

Pointers: A pointer is a variable that holds the memory address of another variable. A pointer needs to be dereferenced with the * operator to access the memory location it points to. References: A reference variable is an alias, that is, another name for an already existing variable.


1 Answer

When you are passing an object to a function, what you pass depends in part on how that function is going to use it. A function can use an object in one of three general ways:

  1. It can simply reference the object for the duration of the function call, with the calling function (or its eventual parent up the call stack) maintaining ownership of the object. The reference in this case may be a constant reference or a modifiable reference. The function will not store this object long-term.

  2. It can copy the object directly. It doesn't gain ownership of the original, but it does acquire a copy of the original, so as to store, modify, or do with the copy what it will. Note that the difference between #1 and this is that the copy is made explicit in the parameter list. For example, taking a std::string by value. But this could also be as simple as taking an int by value.

  3. It can gain some form of ownership of the object. The function then has some responsibility over the object's destruction. This also allows the function to store the object long-term.

My general recommendations for the parameter types for these paradigms are as follows:

  1. Take the object by a language reference (T& or const T&) where possible. If that's not possible, try a std::reference_wrapper. If that can't work, and no other solutions seem reasonable, then use a pointer. A pointer would be for things like optional parameters (though C++17's std::optional makes that less useful; pointers will still have uses, though), language arrays (though again, we have objects that cover most of these uses), and so forth.

  2. Take the object by value. That one's pretty non-negotiable.

  3. Take the object either by value-move (i.e., move it into a by-value parameter) or by a smart pointer to the object (which will also be taken by value, since you're going to copy/move it anyway). The problem with your code is that you're transferring ownership via a pointer, but a raw pointer. Raw pointers have no ownership semantics. The moment you allocate an object with new, you should immediately wrap the resulting pointer in some kind of smart pointer. So your factory function should have returned a unique_ptr.

Your case appears to be #3. Whether you use value-move or a smart pointer is entirely up to you. If you have to heap allocate Data for some reason, then the choice is pretty much made for you. If Data can be stack allocated, then you have some options.

I would generally decide based on an estimate of Data's internal size. If internally it's just a few pointers/integers (and by "few", I mean 3-4), then putting it on the stack is fine.

Indeed, it can be better, because you have less chance of a double cache miss. If Data's member functions mostly access data through an internal pointer and you store Data by pointer, then every call on it has to dereference your stored pointer to fetch the internal one, and then dereference the internal one. That's two potential cache misses, since neither pointer has any locality with StorageClass.

If you store Data by value, it's much more likely that Data's internal pointer will already be in the cache. It has better locality with StorageClass's other members; if you accessed some of StorageClass before now, you already paid for a cache miss, so you are likely to already have Data in the cache.

But movement is not free. It's cheaper than a full copy, but it's not free. You're still copying the internal data (and possibly nulling out any pointers on the original). But then again, allocating memory on the heap isn't free either. Nor is deallocating it.

But then again, if you're not moving it around very often (you move it around to get it to its final location, but little more after that), even moving a larger object would be fine. If you're using it more than you're moving it, then the cache locality of the object's storage will probably win out over the cost of moving.

There ultimately aren't a lot of technical reasons to pick one or the other. I would say to default to movement where reasonable.

Nicol Bolas answered Sep 27 '22 17:09