What is the fastest technology to send messages between C++ application processes on Linux? I am vaguely aware that the following techniques are on the table:

- TCP
- UDP
- Sockets
- Pipes
- Named pipes
- Memory-mapped files

Are there any more ways, and what is the fastest?

Fastest technique to pass messages between processes on Linux?

4 Answers

Whilst all the above answers are very good, I think we have to discuss what "fastest" means [and whether it has to be "fastest" or just "fast enough" for the purpose at hand].

For LARGE messages, there is no doubt that shared memory is a very good technique, and very useful in many ways.

However, if the messages are small, shared memory has a drawback: you have to come up with your own message-passing protocol and your own method of informing the other process that there is a message.

Pipes and named pipes are much easier to use in this case. They behave pretty much like a file: you just write data at the sending side and read the data at the receiving side. If the sender writes something, the receiving side automatically wakes up. If the pipe is full, the sending side gets blocked. If there is no more data from the sender, the receiving side is automatically blocked. This means that it can be implemented in fairly few lines of code with a pretty good guarantee that it will work at all times, every time.
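To illustrate how little code the pipe approach needs, here is a minimal sketch. The helper name `send_through_pipe` is made up for this example, and error handling is reduced to the bare minimum. A forked child writes a message into an anonymous pipe, and the parent reads until the child closes its end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: a forked child sends `msg` through an anonymous pipe
// and the parent returns whatever it received. Returns "" on setup failure.
std::string send_through_pipe(const std::string& msg) {
    int fds[2];                                // fds[0] = read end, fds[1] = write end
    if (pipe(fds) != 0) return {};

    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return {}; }

    if (pid == 0) {                            // child: the sender
        close(fds[0]);
        const char* p = msg.data();
        std::size_t left = msg.size();
        while (left > 0) {
            ssize_t n = write(fds[1], p, left);   // blocks while the pipe is full
            if (n <= 0) _exit(1);
            p += n;
            left -= static_cast<std::size_t>(n);
        }
        close(fds[1]);                         // the receiver will now see end-of-file
        _exit(0);
    }

    close(fds[1]);                             // parent: the receiver; drop our write end
    std::string received;
    char buf[4096];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)   // blocks until data or EOF
        received.append(buf, static_cast<std::size_t>(n));
    close(fds[0]);
    waitpid(pid, nullptr, 0);                  // reap the child
    return received;
}
```

Calling `send_through_pipe("hello")` from `main` should hand back the same string. A message larger than the pipe buffer also works, because the writer simply blocks until the reader has drained some data.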

Shared memory, on the other hand, relies on some other mechanism to inform the other process that "you have a packet of data to process". Yes, it's very fast if you have LARGE packets of data to copy, but I would be surprised if there is a huge difference compared to a pipe, really. The main benefit would be that the other side doesn't have to copy the data out of the shared memory, but it also relies on there being enough memory to hold all "in flight" messages, or on the sender having the ability to hold things back.
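As a rough sketch of that extra work, the example below shares one message slot between a parent and a forked child and uses an atomic flag as the "you have data" signal. The `Mailbox` layout and the function name are invented for illustration. The busy-polling loop is just the simplest possible notification scheme, and the sketch assumes that `std::atomic<unsigned>` is lock-free on the target platform so that it works across processes.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <sched.h>
#include <string>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical layout of the shared region: a ready flag plus one message slot.
struct Mailbox {
    std::atomic<unsigned> ready;   // 0 = empty, 1 = message available
    std::size_t length;
    char data[256];
};

// Hypothetical helper: the child writes `msg` into shared memory and raises
// the flag; the parent polls the flag and then reads the message in place.
std::string send_through_shared_memory(const std::string& msg) {
    assert(msg.size() <= sizeof(Mailbox::data));   // one fixed-size slot only

    void* mem = mmap(nullptr, sizeof(Mailbox), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return {};
    Mailbox* box = new (mem) Mailbox{};            // flag starts at 0

    pid_t pid = fork();
    if (pid < 0) { munmap(mem, sizeof(Mailbox)); return {}; }

    if (pid == 0) {                                // child: the sender
        std::memcpy(box->data, msg.data(), msg.size());
        box->length = msg.size();
        box->ready.store(1, std::memory_order_release);   // publish the message
        _exit(0);
    }

    // Parent: the receiver. Nothing wakes us up automatically, so we poll.
    while (box->ready.load(std::memory_order_acquire) == 0)
        sched_yield();

    std::string received(box->data, box->length);
    waitpid(pid, nullptr, 0);
    munmap(mem, sizeof(Mailbox));
    return received;
}
```

Even this toy version needs a protocol decision (one slot, fixed maximum size) and a signalling scheme, which is exactly the overhead a pipe spares you. A real implementation would more likely block on a semaphore or similar primitive instead of spinning.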

I'm not saying "don't use shared memory", I'm just saying that there is no such thing as "one solution that solves all problems 'best'".

To clarify: I would start by implementing a simple method using a pipe or named pipe [depending on which suits the purpose], and measure the performance of that. If a significant amount of time is spent actually copying the data, then I would consider using other methods.
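One way to do that measurement is sketched below. The names `PipeRun` and `measure_pipe` are invented for this example. It pushes a given number of bytes from a forked child through an anonymous pipe and times how long the parent takes to drain them. Treat the result as a rough indication only, since it includes process start-up effects and depends heavily on the chunk size.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical result type: how much arrived and how long the reader took.
struct PipeRun {
    std::size_t bytes_received;
    double seconds;
};

// Hypothetical helper: a child writes `total_bytes` in pieces of `chunk_size`,
// the parent reads until end-of-file and times the whole transfer.
PipeRun measure_pipe(std::size_t total_bytes, std::size_t chunk_size) {
    assert(chunk_size > 0);
    PipeRun run{0, 0.0};
    std::vector<char> buffer(chunk_size, 'x');     // allocated before fork()

    int fds[2];
    if (pipe(fds) != 0) return run;
    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return run; }

    if (pid == 0) {                                // child: the sender
        close(fds[0]);
        std::size_t sent = 0;
        while (sent < total_bytes) {
            std::size_t want = std::min(chunk_size, total_bytes - sent);
            ssize_t n = write(fds[1], buffer.data(), want);
            if (n <= 0) _exit(1);
            sent += static_cast<std::size_t>(n);
        }
        _exit(0);                                  // exiting closes the write end
    }

    close(fds[1]);                                 // parent: the receiver
    auto start = std::chrono::steady_clock::now();
    ssize_t n;
    while ((n = read(fds[0], buffer.data(), buffer.size())) > 0)
        run.bytes_received += static_cast<std::size_t>(n);
    auto stop = std::chrono::steady_clock::now();
    close(fds[0]);
    waitpid(pid, nullptr, 0);

    run.seconds = std::chrono::duration<double>(stop - start).count();
    return run;
}
```

Dividing `bytes_received` by `seconds` gives the throughput in bytes per second. Running it with message sizes typical for your application shows whether copying through the pipe is actually the bottleneck.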

Of course, another consideration should be "are we ever going to use two separate machines [or two virtual machines on the same system] to solve this problem?" In that case, a network solution is a better choice, even if it's not THE fastest. I've run a local TCP stack on my machines at work for benchmark purposes and got some 20-30 Gbit/s (roughly 2-3 GB/s) with sustained traffic. A raw memcpy within the same process gets around 50-100 Gbit/s (roughly 5-10 GB/s) (unless the block size is REALLY tiny and fits in the L1 cache). I haven't measured a standard pipe, but I expect that's somewhere roughly in the middle of those two numbers. [These numbers are about right for a number of different medium-sized, fairly modern PCs. Obviously, on an ARM, MIPS or other embedded-style controller, expect lower numbers for all of these methods.]
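For anyone who wants to check the memcpy figure on their own hardware, a minimal sketch could look like the following. The names `CopyRun` and `measure_memcpy` are made up here. The first byte is changed and read back on every pass so that the copies stay observable, although an aggressive optimiser may still distort the result.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical result type for one benchmark run.
struct CopyRun {
    unsigned checksum;          // sum of the first byte after each copy
    double gbytes_per_second;   // 1 GB/s corresponds to 8 Gbit/s
};

// Hypothetical helper: copy a block of `block_size` bytes `repetitions`
// times within the same process and report the achieved bandwidth.
CopyRun measure_memcpy(std::size_t block_size, unsigned repetitions) {
    assert(block_size > 0);
    std::vector<unsigned char> src(block_size, 0), dst(block_size, 0);
    CopyRun run{0, 0.0};

    auto start = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < repetitions; ++i) {
        src[0] = static_cast<unsigned char>(i);    // make every pass different
        std::memcpy(dst.data(), src.data(), block_size);
        run.checksum += dst[0];                    // read back part of the result
    }
    auto stop = std::chrono::steady_clock::now();

    double seconds = std::chrono::duration<double>(stop - start).count();
    if (seconds > 0.0) {
        double bytes = static_cast<double>(block_size) * repetitions;
        run.gbytes_per_second = bytes / seconds / 1e9;
    }
    return run;
}
```

Comparing a block of a few kilobytes against one of several megabytes should show the cache effect mentioned above.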

189

answered Oct 17 '22 21:10

Mats Petersson

I would suggest looking at this also: How to use shared memory with Linux in C.

Basically, I'd drop network protocols such as TCP and UDP when doing IPC on a single machine. They carry packetization overhead and tie up additional resources (e.g. ports, the loopback interface).

answered Oct 17 '22 20:10

Sam

NetOS Systems Research Group from Cambridge University, UK has done some (open-source) IPC benchmarks.

Source code is located at https://github.com/avsm/ipc-bench .

Project page: http://www.cl.cam.ac.uk/research/srg/netos/projects/ipc-bench/ .

Results: http://www.cl.cam.ac.uk/research/srg/netos/projects/ipc-bench/results.html

A paper based on the results above has been published: http://anil.recoil.org/papers/drafts/2012-usenix-ipc-draft1.pdf

answered Oct 17 '22 21:10

DejanLekic

Check CMA and kdbus: https://lwn.net/Articles/466304/

I think the fastest stuff these days is based on AIO. http://www.kegel.com/c10k.html

answered Oct 17 '22 20:10

Alex


Tags: c++, performance, linux, latency, ipc

user997112
