Reading the rationale, boost::circular_buffer looked promising:
Suitability for real-time and performance critical applications.
Fast constant-time insertion and removal of elements from the front and back.
When I run a simple benchmark that simulates my use case, using it as a byte buffer:
the performance is absolutely abysmal, more than 4000x slower than my own hack and spsc_queue.
lin : 101 // 10240x
lock: 109 // 10240x
circ: 427 // 10x
Note that the loopcount for circular
is 10
and the loopcount for the others is 10*1024
. See working example here.
Am I using it completely wrong or was it just not designed with fundamental / POD types in mind?
Edit:
Adopting the benchmark with the provided changes does not fully resolve the issues on MSVC2015. There a factor of 100x remains.
lin : 69 // 10240x
lock: 79 // 10240x
circ: 9688 // 10240x
Inserting several items at once being so slow is problematic. Assigning would work in this special scenario because the buffer is depleted before the insert, but this is not a general solution. In resume spsc_queue wins on all fronts, its fast, can be used without depletion, and can be used in a mulithreaded environment (in a single producer single consumer scenario).
Firstly, make sure the benchmarks are sound. If you don't use the results of the computations, compilers will eliminate it as dead code when you least expect it.
You circular deletion looks suboptimal. Use this instead:
buffer.erase_begin(1024); // or indeed, use checked size see below
UPDATE
The second thing that was hurting performance - badly - was the insert
call. In your use-case you can use assign
which, like in the contenders, gets compiled down to a mempcy or memmove.
Make sure debugging is disabled (define NDEBUG
and or BOOST_CB_DISABLE_DEBUG
)
Here's my refactored benchmark using Nonius: http://paste.ubuntu.com/15222217/
clock resolution: mean is 18.6412 ns (40960002 iterations)
benchmarking linear
collecting 100 samples, 1 iterations each, in estimated 3.93727 s
mean: 39.0804 ms, lb 39.0567 ms, ub 39.1051 ms, ci 0.95
std dev: 124.19 μs, lb 111.153 μs, ub 141.079 μs, ci 0.95
found 0 outliers among 100 samples (0%)
variance is unaffected by outliers
benchmarking lockfree
collecting 100 samples, 1 iterations each, in estimated 4.78513 s
mean: 37.0188 ms, lb 37.0106 ms, ub 37.0277 ms, ci 0.95
std dev: 43.5788 μs, lb 37.3685 μs, ub 52.8458 μs, ci 0.95
found 3 outliers among 100 samples (3%)
variance is unaffected by outliers
benchmarking circular
collecting 100 samples, 1 iterations each, in estimated 9.78763 s
mean: 62.884 ms, lb 62.8657 ms, ub 62.9041 ms, ci 0.95
std dev: 98.0325 μs, lb 85.6543 μs, ub 119.395 μs, ci 0.95
found 1 outliers among 100 samples (1%)
variance is unaffected by outliers
Interactive results: http://stackoverflow-sehe.s3.amazonaws.com/57c2bfea-3e9d-4503-8d23-3b88209fc3ce/stats.html
Without nonius: Live On Coliru
Output
lin : 101 (checksum: -1741910392)
lock: 89 (checksum: -1741910392)
circ: 102 (checksum: -1741910392)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With