I'm working on my first C++ project, which is a CSV parser (full source code here). It's at the point where it's working, and now I want to do basic refactoring / improve performance.
Currently the way the parser works is by returning each row as a std::vector<std::string>
, and I figured that instead of allocating a new vector and a new string every time I'd just have an internal vector and internal string with reserved memory that I'd clear again and again.
That worked, and I started looking at other places where I might be doing memory allocation, and I saw this function which copies the internal vector, and then clears it:
auto add_row() -> std::vector<std::string> {
auto row(m_bufvec);
m_bufvec.clear();
return row;
}
I figured that if I instead changed this line
auto row(m_bufvec);
to
auto row(std::move(m_bufvec));
It'd result in some sort of speed boost because according to http://en.cppreference.com/w/cpp/container/vector/vector it would take constant time instead of linear. To my surprise, it made the parser significantly slower (according to my really rough benchmark of running time ./main.o
over this file).
I'm completely new to optimization, benchmarking, and everything else that comes with tuning C++ code. Perhaps this optimization is useless even if it worked, but regardless, I'm curious as to why std::move
causes a slowdown. Am I missing something?
It's faster because moving allows the source to be left in a invalid state, so you can steal it's resources. For example, if a object holds a pointer to a large block of allocated memory, a move can simply steal the pointer while a copy must allocate its own memory and copy the whole memory block.
Std::move is not faster than straight up copying.
Move constructor moves the resources in the heap, i.e., unlike copy constructors which copy the data of the existing object and assigning it to the new object move constructor just makes the pointer of the declared object to point to the data of temporary object and nulls out the pointer of the temporary objects.
The move constructor and move assignment operator are simple. Instead of deep copying the source object (a) into the implicit object, we simply move (steal) the source object's resources. This involves shallow copying the source pointer into the implicit object, then setting the source pointer to null.
When you copy bufvec, its capacity is unchanged, but when you move it, its capacity is cleared. Thus, later when you fill bufvec, a logarithmic number of allocations are done to expand its capacity again, and such allocations can easily be your performance bottleneck.
The move version makes that function faster. But it makes other code slower. Micro optimizations do not reliably make programs faster.
Edit by OP:
The solution proposed by Cheers and hth. - Alf
in the comments of m_bufvec.reserve(row.size())
after the move fixes the problem, and confirms that the above reasoning was correct. Moreover it is more efficient, (albeit only slightly) because
you avoid copying the items [in bufvec]. If the items are simple integer values, that doesn't matter so much. If the items are e.g. strings, with dynamic allocation, then it really does matter.
Indeed the first version is expected to be faster. The reason is:
auto row(m_bufvec);
invokes the copy constuctor, which allocates the necessary memory for row
just at once. bufvec
also keeps its allocated memory. As a result, allocations per-element are minimized, and this is important because they involve an amount of relocations.
In the second version, auto row(std::move(m_bufvec));
bufvec
's memory becomes owned by row
, this operation is faster than the copy constructor. But as bufvec
has lost its allocated memory, when you later fill it element by element, it will do many re-allocations and (expensive) relocation. The number of re-allocations is usually logarithmic with the final size of the vector.
EDIT
The above explains the "unexpected" results in the main question. Finally, it turns out that the "ideal" for this operation is to move then reserve immediately:
auto row(std::move(m_bufvec);
m_bufvec.reserve(row.size());
return row;
This achieves the three goals:
no element-by-element allocation
no useless initialization for bufvec
no useless copying of elements from m_bufvec
into row
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With