
Fastest (low latency) method for Inter Process Communication between Java and C/C++

I have a Java app, connecting through TCP socket to a "server" developed in C/C++.

Both app and server run on the same machine, a Solaris box (though we're considering migrating to Linux eventually). The data exchanged consists of simple messages (login, login ACK, then the client asks for something and the server replies). Each message is around 300 bytes long.

Currently we're using Sockets, and all is OK, however I'm looking for a faster way to exchange data (lower latency), using IPC methods.

I've been researching the net and came up with references to the following technologies:

  • shared memory
  • pipes
  • queues
  • as well as what's referred to as DMA (Direct Memory Access)

But I couldn't find a proper analysis of their respective performance, nor how to implement them in both Java and C/C++ (so that they can talk to each other), except maybe pipes, which I can imagine how to do.

Can anyone comment on the performance and feasibility of each method in this context? Any pointers or links to useful implementation information?


EDIT / UPDATE

Following the comments and answers I got here, I found info about Unix Domain Sockets, which seem to be built just over pipes and would save me the whole TCP stack. They are platform specific, so I plan on testing them with JNI, or with either juds or junixsocket.

The next possible steps would be a direct implementation of pipes, then shared memory, although I've been warned of the extra level of complexity...
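For readers trying this today, here is a minimal sketch of a Unix domain socket round trip from Java. It assumes JDK 16 or later, where `UnixDomainSocketAddress` and `StandardProtocolFamily.UNIX` are built in, so neither JNI nor juds/junixsocket is needed; that is an assumption not present in the original post. The class name `UdsEcho`, the socket file name, and the in-process echo thread (standing in for the C/C++ server) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class UdsEcho {
    // Sends one small message over a Unix domain socket to an echo thread
    // (a stand-in for the C/C++ server) and returns whatever comes back.
    static String roundTrip(String message) throws Exception {
        Path dir = Files.createTempDirectory("uds");
        Path socketPath = dir.resolve("echo.sock"); // hypothetical socket file name
        UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketPath);
        try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            server.bind(address);
            Thread echo = new Thread(() -> {
                try (SocketChannel peer = server.accept()) {
                    ByteBuffer buf = ByteBuffer.allocate(512);
                    peer.read(buf); // a single read is assumed to be enough for a small message
                    buf.flip();
                    while (buf.hasRemaining()) peer.write(buf);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            echo.start();
            try (SocketChannel client = SocketChannel.open(StandardProtocolFamily.UNIX)) {
                client.connect(address);
                ByteBuffer request = ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8));
                while (request.hasRemaining()) client.write(request);
                ByteBuffer reply = ByteBuffer.allocate(request.capacity());
                // Read until the full reply has arrived or the peer closes the connection.
                while (reply.hasRemaining() && client.read(reply) >= 0) { }
                echo.join();
                return new String(reply.array(), 0, reply.position(), StandardCharsets.UTF_8);
            }
        } finally {
            // The JDK does not remove the socket file on close.
            Files.deleteIfExists(socketPath);
            Files.deleteIfExists(dir);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("login"));
    }
}
```

A C/C++ server would bind an `AF_UNIX` socket to the same path; the Java client side would stay as sketched.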


thanks for your help

asked Apr 14 '10 06:04 by Bastien




1 Answer

Just tested latency from Java on my Core i5 2.8GHz: only a single byte sent/received, 2 Java processes just spawned, without assigning specific CPU cores with taskset:

    TCP         - 25 microseconds
    Named pipes - 15 microseconds

Now explicitly specifying core masks, like taskset 1 java Srv or taskset 2 java Cli:

    TCP, same cores:                      30 microseconds
    TCP, explicit different cores:        22 microseconds
    Named pipes, same core:               4-5 microseconds !!!!
    Named pipes, taskset different cores: 7-8 microseconds !!!!

so

  • TCP overhead is visible
  • scheduling overhead (or core caches?) is also the culprit

At the same time Thread.sleep(0) (which as strace shows causes a single sched_yield() Linux kernel call to be executed) takes 0.3 microsecond - so named pipes scheduled to single core still have much overhead
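The answer doesn't show how the named pipes were opened from Java, so the following is only a sketch of one way to do it, not the code that was measured. It assumes a Unix-like system with the `mkfifo` command on the PATH, and it runs the "server" as a thread in the same JVM rather than as a separate process. The class name `FifoPingPong` and the FIFO names are hypothetical.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FifoPingPong {
    // Creates a named pipe by shelling out to mkfifo (assumed to be installed).
    static Path mkfifo(Path dir, String name) throws Exception {
        Path fifo = dir.resolve(name);
        int exit = new ProcessBuilder("mkfifo", fifo.toString()).inheritIO().start().waitFor();
        if (exit != 0) throw new IOException("mkfifo failed for " + fifo);
        return fifo;
    }

    // Sends one byte through a request FIFO and reads one byte back from a reply FIFO.
    static int roundTrip(int request) throws Exception {
        Path dir = Files.createTempDirectory("fifo");
        Path requestFifo = mkfifo(dir, "request");
        Path replyFifo = mkfifo(dir, "reply");
        Thread server = new Thread(() -> {
            // Opening a FIFO blocks until the other end is opened too, so both
            // sides open the request pipe first and the reply pipe second.
            try (InputStream in = new FileInputStream(requestFifo.toFile());
                 OutputStream out = new FileOutputStream(replyFifo.toFile())) {
                int received = in.read();
                out.write(received + 1); // reply with request + 1
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        server.start();
        try (OutputStream out = new FileOutputStream(requestFifo.toFile());
             InputStream in = new FileInputStream(replyFifo.toFile())) {
            out.write(request);
            out.flush();
            return in.read();
        } finally {
            server.join();
            Files.deleteIfExists(requestFifo);
            Files.deleteIfExists(replyFifo);
            Files.deleteIfExists(dir);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(5));
    }
}
```

To time it the way the answer does, the round trip would need to be repeated many times on already-open streams, ideally across two real processes pinned with taskset.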

Some shared memory measurement: September 14, 2009 – Solace Systems announced today that its Unified Messaging Platform API can achieve an average latency of less than 700 nanoseconds using a shared memory transport. http://solacesystems.com/news/fastest-ipc-messaging/

P.S. - tried shared memory next day in the form of memory mapped files, if busy waiting is acceptable, we can reduce latency to 0.3 microsecond for passing a single byte with code like this:

    MappedByteBuffer mem =
      new RandomAccessFile("/tmp/mapped.txt", "rw").getChannel()
      .map(FileChannel.MapMode.READ_WRITE, 0, 1);

    while(true){
      while(mem.get(0)!=5) Thread.sleep(0); // waiting for client request
      mem.put(0, (byte)10); // sending the reply
    }

Notes: Thread.sleep(0) is needed so the 2 processes can see each other's changes (I don't know of another way yet). If the 2 processes are forced onto the same core with taskset, the latency becomes 1.5 microseconds - that's a context switch delay

P.P.S - and 0.3 microsecond is a good number! The following code takes exactly 0.1 microsecond, while doing a primitive string concatenation only:

    int j=123456789;
    String ret = "my-record-key-" + j  + "-in-db";

P.P.P.S - hope this is not too much off-topic, but finally I tried replacing Thread.sleep(0) with incrementing a static volatile int variable (JVM happens to flush CPU caches when doing so) and obtained - record! - 72 nanoseconds latency java-to-java process communication!

When forced to same CPU Core, however, volatile-incrementing JVMs never yield control to each other, thus producing exactly 10 millisecond latency - Linux time quantum seems to be 5ms... So this should be used only if there is a spare core - otherwise sleep(0) is safer.
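A rough sketch of the busy-waiting variant, under several assumptions of mine: it uses `VarHandle` volatile reads and writes on the mapped buffer (JDK 9+) for visibility instead of the static volatile increment described above, it widens the slot from one byte to a 4-byte int because byte-buffer view VarHandles don't cover single bytes, and it runs both sides as threads of one JVM with separate mappings of the same file rather than as two processes. The class name `MappedPingPong` and the temp file are hypothetical; the request/reply values 5 and 10 are taken from the code above.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MappedPingPong {
    // Gives volatile int access to a (direct) byte buffer at an aligned index.
    private static final VarHandle INT =
        MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());
    static final int REQUEST = 5;
    static final int REPLY = 10;

    // Maps the first 4 bytes of the file; the mapping outlives the closed channel.
    static ByteBuffer map(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel channel = raf.getChannel()) {
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, 4);
        }
    }

    // Runs `rounds` request/reply exchanges and returns the mean round-trip time in ns.
    static long averageRoundTripNanos(int rounds) throws Exception {
        Path file = Files.createTempFile("mapped", ".bin");
        ByteBuffer serverMem = map(file); // each side gets its own mapping,
        ByteBuffer clientMem = map(file); // as two separate processes would
        Thread server = new Thread(() -> {
            for (int i = 0; i < rounds; i++) {
                // Spin until the client has posted a request, then answer it.
                while ((int) INT.getVolatile(serverMem, 0) != REQUEST) Thread.onSpinWait();
                INT.setVolatile(serverMem, 0, REPLY);
            }
        });
        server.start();
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            INT.setVolatile(clientMem, 0, REQUEST);
            while ((int) INT.getVolatile(clientMem, 0) != REPLY) Thread.onSpinWait();
        }
        long elapsed = System.nanoTime() - start;
        server.join();
        Files.deleteIfExists(file);
        return elapsed / rounds;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(averageRoundTripNanos(1_000_000) + " ns per round trip");
    }
}
```

As with the measurements above, both sides spin at full speed, so this only makes sense when a spare core is available.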

answered Sep 30 '22 23:09 by Andriy