 

Multiple Producer Multiple Consumer Lockfree Non Blocking Ring Buffer With Variable Length Write

I want to pass variable-length messages from multiple producers to multiple consumers through a low-latency queue on multi-socket Xeon E5 systems. (For example, 400 bytes with a latency of 300 ns would be nice.)

I've looked for existing implementations of lockless multiple-producer, multiple-consumer (MPMC) queues built on a non-blocking ring buffer. But most implementations/algorithms online are node-based (i.e. each node has a fixed length), such as boost::lockfree::queue, midishare, etc.

Of course, one can argue that the node type can be set to uint8_t or the like, but then writes will be clumsy and the performance will be horrible.

I'd also like the algorithm to offer overwrite detection on the readers' side, so that a reader can tell when the data it is reading has been overwritten.

How can I implement a queue (or something else) that does this?

HCSF asked Feb 13 '26 08:02



1 Answer

Sorry for a bit of a late answer, but have a look at DPDK's Ring library. It is free (BSD license), blazingly fast (I doubt you will find a faster solution for free) and supports all major architectures. There are lots of examples as well.

to pass variable-length messages

The solution is to pass a pointer to a message, not the whole message. DPDK also offers a memory pool library to allocate/deallocate buffers between multiple threads or processes. The memory pool is also fast, lock-free and supports many architectures.

So the overall solution would be:

  1. Create mempool(s) to share buffers among threads/processes. Each mempool supports just one fixed buffer size, so you might want to create a few mempools with different buffer sizes to match your needs.

  2. Create one MPMC ring or a set of SPSC ring pairs between your threads/processes. The SPSC solution might be faster, but it might not fit your design.

  3. Producer allocates a buffer, fills it and passes a pointer to that buffer via the ring.

  4. Consumer receives the pointer, reads the message and deallocates the buffer.
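The four steps above can be sketched without DPDK. What follows is a minimal, self-contained C++ illustration of the pattern only, not DPDK's implementation: `Msg`, `PtrRing` and `run_demo` are hypothetical names, the ring is a generic bounded MPMC queue with per-slot sequence numbers (a different algorithm from rte_ring), and the "mempool" is simply a second ring pre-filled with pointers to fixed-size buffers. In real DPDK code the corresponding calls would be along the lines of rte_mempool_get / rte_mempool_put and rte_ring_enqueue / rte_ring_dequeue.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <thread>
#include <vector>

// Fixed-size pool buffer carrying a variable-length payload (hypothetical layout).
struct Msg {
    std::uint32_t len;        // number of valid bytes in data
    unsigned char data[508];  // capacity picked for this sketch
};

// Bounded MPMC ring of pointers with per-slot sequence numbers.
// Stand-in for rte_ring; capacity must be a power of two.
class PtrRing {
    struct Slot { std::atomic<std::size_t> seq; Msg* ptr; };
    std::unique_ptr<Slot[]> slots_;
    const std::size_t mask_;
    alignas(64) std::atomic<std::size_t> head_{0};  // next enqueue position
    alignas(64) std::atomic<std::size_t> tail_{0};  // next dequeue position
public:
    explicit PtrRing(std::size_t cap) : slots_(new Slot[cap]), mask_(cap - 1) {
        for (std::size_t i = 0; i < cap; ++i) slots_[i].seq.store(i);
    }
    bool enqueue(Msg* m) {  // false when the target slot is not free yet (ring full)
        std::size_t pos = head_.load(std::memory_order_relaxed);
        for (;;) {
            Slot& s = slots_[pos & mask_];
            auto diff = static_cast<std::intptr_t>(s.seq.load(std::memory_order_acquire) - pos);
            if (diff < 0) return false;
            if (diff > 0) { pos = head_.load(std::memory_order_relaxed); continue; }
            if (head_.compare_exchange_weak(pos, pos + 1, std::memory_order_relaxed)) {
                s.ptr = m;
                s.seq.store(pos + 1, std::memory_order_release);  // publish to consumers
                return true;
            }
        }
    }
    bool dequeue(Msg*& m) {  // false when no published slot is available (ring empty)
        std::size_t pos = tail_.load(std::memory_order_relaxed);
        for (;;) {
            Slot& s = slots_[pos & mask_];
            auto diff = static_cast<std::intptr_t>(s.seq.load(std::memory_order_acquire) - (pos + 1));
            if (diff < 0) return false;
            if (diff > 0) { pos = tail_.load(std::memory_order_relaxed); continue; }
            if (tail_.compare_exchange_weak(pos, pos + 1, std::memory_order_relaxed)) {
                m = s.ptr;
                s.seq.store(pos + mask_ + 1, std::memory_order_release);  // free the slot
                return true;
            }
        }
    }
};

// Runs the four steps with the given thread counts; returns total payload bytes received.
std::uint64_t run_demo(int producers, int consumers, int msgs_per_producer) {
    constexpr std::size_t kBufs = 64;
    std::vector<Msg> storage(kBufs);
    PtrRing pool(kBufs), ring(kBufs);  // steps 1 and 2: buffer pool and MPMC ring
    for (Msg& m : storage) { bool ok = pool.enqueue(&m); assert(ok); (void)ok; }

    const int total = producers * msgs_per_producer;
    std::atomic<int> received{0};
    std::atomic<std::uint64_t> bytes{0};
    std::vector<std::thread> threads;
    for (int p = 0; p < producers; ++p)
        threads.emplace_back([&] {
            for (int i = 0; i < msgs_per_producer; ++i) {
                Msg* m = nullptr;
                while (!pool.dequeue(m)) std::this_thread::yield();  // step 3: allocate
                m->len = 1 + (i % 400);                              // variable length
                std::memset(m->data, 0xAB, m->len);                  // fill the buffer
                while (!ring.enqueue(m)) std::this_thread::yield();  // pass the pointer
            }
        });
    for (int c = 0; c < consumers; ++c)
        threads.emplace_back([&] {
            while (received.load() < total) {
                Msg* m = nullptr;
                if (!ring.dequeue(m)) { std::this_thread::yield(); continue; }  // step 4
                bytes += m->len;                                     // "read" the message
                received.fetch_add(1);
                while (!pool.enqueue(m)) std::this_thread::yield();  // deallocate
            }
        });
    for (auto& t : threads) t.join();
    return bytes.load();
}
```

Calling `run_demo(3, 2, 1000)` runs three producers and two consumers over a shared pool of 64 buffers. Only an 8-byte pointer crosses the ring per message, so the ring slots stay fixed-size while the payload length varies. Every enqueue is retried in a loop, because in this kind of ring a thread that has claimed a slot but not yet published it can make the ring look temporarily full or empty to the others.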

Sounds like a lot of work, but DPDK mempools and rings are heavily optimized internally. But will it fit in 300 ns?

Have a look at the official DPDK performance reports. While there is no official report for ring performance, there are vhost/virtio test results. Basically, packets travel like this:

Traffic gen. -- Host -- Virtual Machine -- Host -- Traffic gen.

Host runs as one process, virtual machine as another.

The test result is ~4M packets per second for 512-byte packets. It does not fit your budget, but you need to do much, much less work than that test does...

Andriy Berestovskyy answered Feb 15 '26 20:02



