One of our projects deals with tons of data. It selects data from a database and serializes the results into JSON/XML.
Sometimes the number of selected rows can easily reach the 50 million mark.
However, the runtime of the program was too bad in the beginning.
So we refactored the program with one major adjustment:
The working objects for serialization are no longer recreated for every single row; instead, a single object is cleared and reinitialized.
For example:
Before:
For every single database row we create an object of DatabaseRowSerializer and call the specific serialize function.
// Loop over all dbRows
for (const auto& dbRow : dbRows)
{
    DatabaseRowSerializer serializer(dbRow);
    result.add(serializer.toXml());
}
After:
The constructor of DatabaseRowSerializer doesn't set the dbRow. Instead, this is done by the initDbRow() function.
The main point is that only one object is used for the whole runtime. After the serialization of a dbRow, the clear() function is called to reset the object.
DatabaseRowSerializer serializer;
// Loop over all dbRows
for (const auto& dbRow : dbRows)
{
    serializer.initDbRow(dbRow);
    result.add(serializer.toXml());
    serializer.clear();
}
So my question:
Is this really a good way to handle the problem? In my opinion, init() functions aren't really smart, and normally a constructor should be used to initialize the possible parameters.
Which way do you generally prefer? Before or after?
On the one hand, this is subjective. On the other, opinion widely agrees that in C++ you should avoid this "init function" idiom because:
- It is worse code
- It is unnecessary

There is essentially no overhead in creating a DatabaseRowSerializer every time, unless its constructor does more than your initDbRow function, in which case your two examples are not equivalent anyway.
Even if your compiler doesn't optimise away the unnecessary "allocation", there isn't really an allocation anyway because the object just takes up space on the stack and it has to do that regardless.
So if this change really solved your performance problem, something else was probably going on.
Use your constructors and destructors. Freely and proudly!
That's the common advice when writing C++.
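To make the "no overhead" point concrete, here is a minimal sketch of the constructor-based version. The DbRow type, its fields, and the XML layout are assumptions made up for illustration (the question does not show them), and a std::vector stands in for the unknown result container. The serializer only stores a reference, so constructing it once per row involves no heap allocation.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical row type; the real dbRow type is not shown in the question.
struct DbRow {
    int id;
    std::string name;
};

// Constructor-based serializer: all state is set at construction.
// Note: no XML escaping is done here, to keep the sketch short.
class DatabaseRowSerializer {
public:
    explicit DatabaseRowSerializer(const DbRow& row) : row_(row) {}

    std::string toXml() const {
        return "<row id=\"" + std::to_string(row_.id) + "\"><name>" +
               row_.name + "</name></row>";
    }

private:
    const DbRow& row_;  // just a reference, nothing to allocate or free
};

// The object holds no resources, so destroying it per row is free.
static_assert(std::is_trivially_destructible<DatabaseRowSerializer>::value,
              "per-row destruction should cost nothing");

std::vector<std::string> serializeAll(const std::vector<DbRow>& dbRows) {
    std::vector<std::string> result;
    result.reserve(dbRows.size());
    for (const DbRow& dbRow : dbRows) {
        DatabaseRowSerializer serializer(dbRow);  // lives on the stack
        result.push_back(serializer.toXml());
    }
    assert(result.size() == dbRows.size());  // one output per input row
    return result;
}
```

If a real constructor does heavier work (opening resources, allocating buffers), that work is what should be measured, not the object creation itself.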
A possible third approach, if you did want to make the serializer reusable for whatever reason, is to move all of its per-row state into the actual operational function call:
DatabaseRowSerializer serializer;
// Loop over all dbRows
for (const auto& dbRow : dbRows)
{
    result.add(serializer.toXml(dbRow));
}
You might do this if the serialiser has some desire to cache information, or re-use dynamically-allocated buffers, to aid in performance. That of course adds some state into the serialiser.
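As a rough sketch of that buffer re-use idea: the serializer below keeps one std::string as its only state and rebuilds it on every call. The DbRow type, its fields, and the XML layout are again hypothetical. It relies on std::string::clear() leaving the capacity in place, which common standard library implementations do, though the standard does not strictly promise it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical row type; the real dbRow type is not shown in the question.
struct DbRow {
    int id;
    std::string name;
};

class DatabaseRowSerializer {
public:
    // The row is passed per call; the only member state is the scratch buffer.
    // The returned reference is valid until the next call to toXml().
    const std::string& toXml(const DbRow& row) {
        buffer_.clear();  // typically keeps capacity, so later rows rarely reallocate
        buffer_ += "<row id=\"";
        buffer_ += std::to_string(row.id);
        buffer_ += "\"><name>";
        buffer_ += row.name;  // no XML escaping in this sketch
        buffer_ += "</name></row>";
        return buffer_;
    }

private:
    std::string buffer_;
};

// Concatenates all rows into one output string using a single serializer.
std::string serializeAll(const std::vector<DbRow>& dbRows) {
    DatabaseRowSerializer serializer;
    std::string out;
    for (const DbRow& dbRow : dbRows) {
        out += serializer.toXml(dbRow);
    }
    assert(dbRows.empty() || !out.empty());  // non-empty input yields output
    return out;
}
```

Unlike the init/clear version, there is no way to call toXml() on a half-initialized object here, because the row arrives with the call.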
If you do this and still don't have any state, then the whole thing can just be a static call:
// Loop over all dbRows
for (const auto& dbRow : dbRows)
{
    result.add(DatabaseRowSerializer::toXml(dbRow));
}
…but then it may as well just be a function.
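For completeness, the plain-function variant might look like the sketch below. As before, DbRow, its fields, and the output format are assumptions for illustration only, and a std::vector stands in for the result container.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical row type; the real dbRow type is not shown in the question.
struct DbRow {
    int id;
    std::string name;
};

// Stateless serialization needs no class at all.
std::string toXml(const DbRow& row) {
    return "<row id=\"" + std::to_string(row.id) + "\"><name>" +
           row.name + "</name></row>";  // no XML escaping in this sketch
}

std::vector<std::string> serializeAll(const std::vector<DbRow>& dbRows) {
    std::vector<std::string> result;
    result.reserve(dbRows.size());
    for (const DbRow& dbRow : dbRows) {
        result.push_back(toXml(dbRow));
    }
    assert(result.size() == dbRows.size());  // one output per input row
    return result;
}
```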
Ultimately we can't know exactly what's best for you, but there are plenty of options and considerations.