why is boost::circular_buffer so slow in my benchmark?

Question

Reading the rationale, boost::circular_buffer looked promising:

Suitability for real-time and performance critical applications.

Fast constant-time insertion and removal of elements from the front and back.

When I run a simple benchmark that simulates my use case, using it as a byte buffer:

write a bigger chunk
read smaller chunks until empty
repeat

the performance is absolutely abysmal, more than 4000x slower than my own hack and spsc_queue.

lin : 101  // 10240x
lock: 109  // 10240x
circ: 427  // 10x

Note that the loopcount for circular is 10 and the loopcount for the others is 10*1024. See working example here.

Am I using it completely wrong or was it just not designed with fundamental / POD types in mind?

Edit:

Adopting the benchmark with the provided changes does not fully resolve the issues on MSVC2015. There a factor of 100x remains.

lin : 69   // 10240x
lock: 79   // 10240x
circ: 9688 // 10240x

Inserting several items at once being so slow is problematic. Assigning would work in this special scenario because the buffer is depleted before the insert, but this is not a general solution. In resume spsc_queue wins on all fronts, its fast, can be used without depletion, and can be used in a mulithreaded environment (in a single producer single consumer scenario).

sehe · Accepted Answer

Firstly, make sure the benchmarks are sound. If you don't use the results of the computations, compilers will eliminate it as dead code when you least expect it.

You circular deletion looks suboptimal. Use this instead:

buffer.erase_begin(1024); // or indeed, use checked size see below

UPDATE

The second thing that was hurting performance - badly - was the insert call. In your use-case you can use assign which, like in the contenders, gets compiled down to a mempcy or memmove.
Make sure debugging is disabled (define NDEBUG and or BOOST_CB_DISABLE_DEBUG)

Here's my refactored benchmark using Nonius: http://paste.ubuntu.com/15222217/

clock resolution: mean is 18.6412 ns (40960002 iterations)

benchmarking linear
collecting 100 samples, 1 iterations each, in estimated 3.93727 s
mean: 39.0804 ms, lb 39.0567 ms, ub 39.1051 ms, ci 0.95
std dev: 124.19 μs, lb 111.153 μs, ub 141.079 μs, ci 0.95
found 0 outliers among 100 samples (0%)
variance is unaffected by outliers

benchmarking lockfree
collecting 100 samples, 1 iterations each, in estimated 4.78513 s
mean: 37.0188 ms, lb 37.0106 ms, ub 37.0277 ms, ci 0.95
std dev: 43.5788 μs, lb 37.3685 μs, ub 52.8458 μs, ci 0.95
found 3 outliers among 100 samples (3%)
variance is unaffected by outliers

benchmarking circular
collecting 100 samples, 1 iterations each, in estimated 9.78763 s
mean: 62.884 ms, lb 62.8657 ms, ub 62.9041 ms, ci 0.95
std dev: 98.0325 μs, lb 85.6543 μs, ub 119.395 μs, ci 0.95
found 1 outliers among 100 samples (1%)
variance is unaffected by outliers

Interactive results: http://stackoverflow-sehe.s3.amazonaws.com/57c2bfea-3e9d-4503-8d23-3b88209fc3ce/stats.html

enter image description here

Without nonius: Live On Coliru

Output

lin : 101 (checksum: -1741910392)
lock: 89 (checksum: -1741910392)
circ: 102 (checksum: -1741910392)

why is boost::circular_buffer so slow in my benchmark?

Tags:

c++

boost

Thomas

1 Answers

sehe

Recent Activity

Donate For Us

why is boost::circular_buffer so slow in my benchmark?

Tags:

c++

boost

Thomas

1 Answers

sehe

Related questions

Recent Activity

Donate For Us