Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Tradeoffs when returning a collection

Tags:

c++

There are various ways of returning a collection of items from a method of a class in C++.

For example, consider the class MessageSpy that listens on all Messages sent over a connection. A client could access the messaging information in a number of ways.

  1. const CollectionClass MessageSpy::getMessages()
  2. Iterator MessageSpy::begin(), Iterator MessageSpy::end()
  3. void MessageSpy::getMessages(OutputIterator)
  4. void MessageSpy::eachMessage(Functor)
  5. others...

Each approach has its trade-offs. For example: Approach 1 would require copying the whole collection which is expensive for large collections. While approach 2 makes the class look like a collection which is inappropriate for a view...

Since I'm always strungling choosing the most appropriate approach in I wonder what you consider the trade-offs/costs when considering these approaches?

like image 481
fvanrysselberghe Avatar asked Aug 02 '13 19:08

fvanrysselberghe


2 Answers

I suggest an iterator based/callback based approach in cases where you demand the most lightweight solution possible.

The reason is that it decouples the supplier from the usage patterns by the consumer.

In particular, slamming the result into a collection1 (even though the result maybe "optimized" - likely into (N)RVO or moving instead of copying the object) would still allocate a complete container for the full capacity.

Edit: 1 an excellent addition to "obligatory papers" (they're not; they're just incredibly helpful if you want to understand things): Want Speed? Pass By value by Dave Abrahams.

Now

  • this is overkill if the consumer actually stops processing data after the first few elements

    for(auto f=myType.begin(), l=myType.end(); f!=l; ++f)
    {
         if (!doProcessing(*f))
             break;
    }
    
  • this can be suboptimal even if the consumer processes al elements eventually: there might not be a need to have all elements copied at any particular moment, so the 'slot' for the 'current element' can be reused, reducing memory requirements, increasing cache locality. E.g.:

    for(auto f=myType.begin(), l=myType.end(); f!=l; ++f)
    {
         myElementType const& slot = *f; // making the temp explicit
         doProcessing(slot);
    }
    

Note that iterator interfaces are simply still superior if the consumer did want a collection containing all elements:

std::vector<myElementType> v(myType.begin(), myType.end());

// look: the client gets to _decide_ what container he wants!
std::set<myElementType, myComparer> s(myType.begin(), myType.end());

Try getting this flexibility otherwise.

Finally, there are some elements of style:

  • by nature it's easy to expose (const) references to the elements using iterators; this makes it much easier to avoid object slicing and to enable clients to use the elements polymorphically.

  • iterator-style interfaces could be overloaded to return non-const references on dereference. A container to be returned, couldn't contain references (directly)

  • if you adhere to the requirements of range-based-for in C++11 you can have some syntactic sugar:

    for (auto& slot : myType)
    {
         doProcessing(slot);
    }
    

Finally, (as shown above), in the general sense iterators work nicely with the standard library.


The callback style (and similarly the Output-iterator style) has many of the benefits of the iterator style (namely, you could use return values to abort iteration halfway, and you could do processing without allocating copies of all elements up front), but it seems to me to be slightly less flexible in use. Of course, there may be situations where you want to encourage a particular usage pattern, and this migh be a good way to go.

like image 91
sehe Avatar answered Sep 28 '22 14:09

sehe


The first thing (you somehow didn't mention at all) I would think about is

const CollectionClass& MessageSpy::getMessages()

Note the &. That returns you const reference which can't be modified but can be freely accepted.

No copying, unless you really want to copy.

If that's not suitable, Qt, for example, uses "implicit data sharing" for plenty of classes. I.e. your classes are "kinda" returned by value, BUT their internal data is shared until you attempt to perform write operation on one of them. In this case, class you're trying to write into, performs a deep copy, and data stops being shared. That means less data is moved around.

And there's return value optimization some people on SO seems to love too much. Basically, when you return something big by value, some compilers in certain situations can eliminate extra copy, and immediately pass value bypassing extra assignment which may be faster than returning by reference. I wouldn't rely on it too much, but if you profiled your code and figured out that using RVO provides a good speed-up, then it is worth using.

I wouldn't recommend "iterators", because using them on C++03 compiler without auto keyword is royal pain in the #&@. Long names or many typedefs. I would return const reference to container itself instead.

like image 20
SigTerm Avatar answered Sep 28 '22 14:09

SigTerm