
Reuse object vs. creating new object

Tags:

c++

One of our projects deals with tons of data. It selects data from a database and serializes the results into JSON/XML.

Sometimes the amount of selected rows can reach the 50 million mark easily.

However, the program was far too slow at first.

So we have refactored the program with one major adjustment:

The working objects for serialization are no longer recreated for every single row; instead, a single object is cleared and reinitialized.

For example:

Before:

For every single database row we create an object of DatabaseRowSerializer and call the specific serialize function.

// Loop with all dbRows
{
    DatabaseRowSerializer serializer(dbRow);
    result.add(serializer.toXml());
}

After:

The constructor of DatabaseRowSerializer doesn't set the dbRow. Instead, this is done by the initDbRow() function.

The main thing here is that only one object is used for the whole runtime. After the serialization of a dbRow, the clear() function is called to reset the object.

DatabaseRowSerializer serializer;

// Loop with all dbRows
{
    serializer.initDbRow(dbRow);
    result.add(serializer.toXml());
    serializer.clear();
}

So my question:

Is this really a good way to handle the problem? In my opinion, init() functions aren't really smart; normally a constructor should be used to initialize the parameters.

Which way do you generally prefer? Before or after?

user2622344 asked Oct 17 '18 09:10


1 Answer

On the one hand, this is subjective. On the other, opinion widely agrees that in C++ you should avoid this "init function" idiom because:

  1. It is worse code

    • You have to remember to "initialise" your object and, if you don't, what state is it in? Your object should never be in a "dead" state. (Don't get me started on "moved-from" objects…) This is why C++ introduced constructors and destructors, because the old C approach was kind of minging and resulting programs are harder to prove correct.
  2. It is unnecessary

    • There is essentially no overhead in creating a DatabaseRowSerializer every time, unless its constructor does more than your initDbRow function, in which case your two examples are not equivalent anyway.

      Even if your compiler doesn't optimise away the unnecessary "allocation", there isn't really an allocation anyway because the object just takes up space on the stack and it has to do that regardless.

      So if this change really solved your performance problem, something else was probably going on.

Use your constructors and destructors. Freely and proudly!

That's the common advice when writing C++.
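
As a rough sketch of what that looks like for the "Before" version: the row is handed to the constructor, so the serializer can never exist in an uninitialised state. The DbRow struct and the XML layout here are assumptions made purely for illustration, since the question doesn't show the real types.

```cpp
#include <string>

// Hypothetical row type; the real dbRow type is not shown in the question.
struct DbRow {
    int id;
    std::string name;
};

// Sketch: the row is supplied at construction, so there is no
// "constructed but not yet initialised" state to worry about.
class DatabaseRowSerializer {
public:
    explicit DatabaseRowSerializer(const DbRow& row) : row_(row) {}

    // Builds a small XML fragment (no escaping, illustration only).
    std::string toXml() const {
        return "<row id=\"" + std::to_string(row_.id) + "\"><name>" +
               row_.name + "</name></row>";
    }

private:
    const DbRow& row_;  // non-owning: the row must outlive the serializer
};
```

Constructing this object per row only stores a reference on the stack, which is why the per-row construction on its own is unlikely to be the bottleneck.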


A possible third approach, if you did want to make the serializer re-usable for whatever reason, is to pass all of the per-row state into the actual operational function call:

DatabaseRowSerializer serializer;

// loop with all dbRows
{
    result.add(serializer.toXml(dbRow));
}

You might do this if the serialiser has some desire to cache information, or re-use dynamically-allocated buffers, to aid in performance. That of course adds some state into the serialiser.
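
Here is a minimal sketch of that buffer re-use idea, assuming a hypothetical DbRow struct and XML layout (neither comes from the question). The only state the serializer keeps is a std::string whose allocation is typically retained across calls.

```cpp
#include <string>

// Hypothetical row type; the real dbRow type is not shown in the question.
struct DbRow {
    int id;
    std::string name;
};

class DatabaseRowSerializer {
public:
    // Returns a reference to an internal buffer; it is overwritten by the
    // next call, so copy the result if you need to keep it.
    const std::string& toXml(const DbRow& row) {
        buffer_.clear();  // in practice this keeps the allocated capacity
        buffer_ += "<row id=\"";
        buffer_ += std::to_string(row.id);
        buffer_ += "\"><name>";
        buffer_ += row.name;  // no XML escaping, illustration only
        buffer_ += "</name></row>";
        return buffer_;
    }

private:
    std::string buffer_;  // re-used between rows to avoid reallocating
};
```

Whether this actually beats returning a fresh std::string per row is something to measure on your own data rather than assume.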

If you do this and still don't have any state, then the whole thing can just be a static call:

// loop with all dbRows
{
    result.add(DatabaseRowSerializer::toXml(dbRow));
}

…but then it may as well just be a function.
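
For completeness, the plain-function version might look something like this. Again, DbRow and the output format are assumed for the sake of the example.

```cpp
#include <string>

// Hypothetical row type; the real dbRow type is not shown in the question.
struct DbRow {
    int id;
    std::string name;
};

// Stateless serialization as a free function: no object to construct,
// initialise, or clear.
std::string toXml(const DbRow& row) {
    return "<row id=\"" + std::to_string(row.id) + "\"><name>" +
           row.name + "</name></row>";  // no XML escaping, illustration only
}
```

The loop body then becomes result.add(toXml(dbRow)).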

Ultimately we can't know exactly what's best for you, but there are plenty of options and considerations.

Lightness Races in Orbit answered Sep 28 '22 07:09