I'm using `std::deque`. I was sure that substituting a loop of `push_back` calls with a single `insert` would yield an increase in performance. It is also recommended, for instance here. But now I'm not so sure anymore. I've run some benchmarks on test code.
Main.cpp:

```cpp
#include "queueInsert.h"
#include <Windows.h>

std::deque<int> queue;

constexpr size_t len = 64;
int arr[len];

int main()
{
    DWORD startTime = GetTickCount();
    for (int i = 0; i < 100000; ++i)
    {
        insert(queue, arr, len);
    }
    DWORD endTime = GetTickCount();
    return endTime - startTime;
}
```
queueInsert.h:

```cpp
#include <deque>

void insert(std::deque<int>&, int* arr, int n);
```
queueInsert.cpp - push version:

```cpp
#include "queueInsert.h"

void insert(std::deque<int>& queue, int* arr, int n)
{
    for (int i = 0; i < n; ++i)
    {
        queue.push_back(arr[i]);
    }
}
```
queueInsert.cpp - insert version:

```cpp
#include "queueInsert.h"

void insert(std::deque<int>& queue, int* arr, int n)
{
    queue.insert(queue.end(), arr, arr + n);
}
```
I get 203 milliseconds with `push_back`, but 218 with `insert`. Changing `len` to 6 and increasing the iterations to one million keeps the same result: 219 ms for `push_back` and 266 for `insert`.
Only with `len = 640` does `push_back` lose out, and even then by very little: 1531 ms for `push_back` against 1437 for `insert`.
I'm compiling in Release mode in Visual Studio 2015 under Windows 10.

I'm sure the compiler isn't doing optimizations such as inlining the constant number of iterations or fusing the loops, because every time I change the implementation only queueInsert.cpp is recompiled.

Am I doing the profiling wrong? Or should I actually keep `push_back` if the number of elements to be inserted isn't likely to be big?
`deque::insert` effectively has 3 possible ways of operating: general insert, insert in front, and insert in back. Because of that, every time you call `insert`, it must do a test to see which way it needs to insert. So it has to compare the iterator you pass against the front and the back.

`deque::push_back` has only 1 mode of operation: insert in back.
The advantage of using a bulk insert operation is that the container can detect exactly how much memory it needs to allocate to perform the whole insertion, since it can get the length of the iterator range. So the larger the bulk insert, the better `insert` will be. Well, the better for `vector`, at least.
See, with `vector`, if you insert 30,000 elements one at a time, you're likely to perform reallocation ~14-15 times. That means allocating new memory and copying the old data into that memory. Whereas if you insert 30,000 elements all at once, you get a single reallocation.
`deque` is typically implemented as an array of fixed-size blocks. Because of that, if you insert 30,000 elements one at a time, you will get ~3,000 allocations (depending on the block size). If you insert 30,000 elements all at once, you will get... ~3,000 allocations. So you're not really saving much.
Since bulk insert isn't much different from single insert for `deque`, what happens is a fight between micro-optimization issues. Every `insert` call has to do an iterator comparison to see how to perform that insertion. As such, the smaller the insertion, the less efficient `insert` will be. `push_back` doesn't have that overhead, but it is a function call for each element, so it has that overhead instead.

Therefore, `insert` will likely win when the number of elements added per call is high.