 

Java NIO selector minimum possible latency

I am doing some benchmarks with an optimized Java NIO selector on Linux over loopback (127.0.0.1).

My test is very simple:

  • One program sends a UDP packet to another program, which echoes it back to the sender, and the round-trip time is computed. The next packet is only sent when the previous one is acknowledged (when it returns). A proper warm-up of a couple of million messages is performed before the benchmark. The message is 13 bytes (not counting UDP headers). A sketch of the sending side is shown below.
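
For reference, this is a minimal sketch of what the sending side of such a test can look like with a blocking selector. The port, payload contents and iteration count are assumptions for illustration, not the exact benchmark code:

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.DatagramChannel;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;

    public class UdpRttClient {
        public static void main(String[] args) throws Exception {
            DatagramChannel ch = DatagramChannel.open();
            ch.configureBlocking(false);
            ch.connect(new InetSocketAddress("127.0.0.1", 9999)); // the echo program listens here

            Selector selector = Selector.open();
            ch.register(selector, SelectionKey.OP_READ);

            ByteBuffer msg = ByteBuffer.allocateDirect(13);       // 13-byte payload, reused for every packet
            long[] rtts = new long[1_000_000];

            for (int i = 0; i < rtts.length; i++) {
                msg.clear();
                long start = System.nanoTime();
                ch.write(msg);                    // send the packet

                selector.select();                // block until the echo is readable
                selector.selectedKeys().clear();

                msg.clear();
                ch.read(msg);                     // drain the echoed packet
                rtts[i] = System.nanoTime() - start;
            }
            // warm-up and percentile computation over rtts[] omitted
        }
    }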

For the round trip time I get the following results:

  • Min time: 13 micros
  • Avg time: 19 micros
  • 75th percentile: 18,567 nanos
  • 90th percentile: 18,789 nanos
  • 99th percentile: 19,184 nanos
  • 99.9th percentile: 19,264 nanos
  • 99.99th percentile: 19,310 nanos
  • 99.999th percentile: 19,322 nanos

But the catch here is that I am spinning 1 million messages.

If I spin only 10 messages I get very different results:

  • Min time: 41 micros
  • Avg time: 160 micros
  • 75th percentile: 150,701 nanos
  • 90th percentile: 155,274 nanos
  • 99th percentile: 159,995 nanos
  • 99.9th percentile: 159,995 nanos
  • 99.99th percentile: 159,995 nanos
  • 99.999th percentile: 159,995 nanos

Correct me if I am wrong, but I suspect that once the NIO selector is kept spinning, the response times become optimal. However, if messages are sent with a large enough interval between them, we pay the price of waking up the selector.

If I play around with sending just a single message I get various times between 150 and 250 micros.

So my questions for the community are:

1 - Is my minimum time of 13 micros, with an average of 19 micros, optimal for this round-trip packet test? It looks like I am beating ZeroMQ by far, so I may be missing something here. From this benchmark it looks like ZeroMQ has a 49-micro average time (99th percentile) on a standard kernel => http://www.zeromq.org/results:rt-tests-v031

2 - Is there anything I can do to improve the selector reaction time when I spin a single message or very few messages? 150 micros does not look good. Or should I assume that in a prod environment the selector will never be quiet?


Update: By busy spinning around selectNow() I am able to get better results. Sending a few packets is still worse than sending many packets, but I think I am now hitting the selector's performance limit. A sketch of the spin loop follows the results. My results:

  • Sending a single packet I get a consistent 65 micros round trip time.
  • Sending two packets I get around 39 micros round trip time on average.
  • Sending 10 packets I get around 17 micros round trip time on average.
  • Sending 10,000 packets I get around 10,098 nanos round trip time on average.
  • Sending 1 million packets I get 9,977 nanos round trip time on average.
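
The only change relative to the blocking version is the wait step: instead of parking in select(), the thread spins on selectNow() so it never has to be woken up. A rough sketch of that inner loop, reusing the (assumed) names from the earlier sketch:

    long start = System.nanoTime();
    msg.clear();
    ch.write(msg);                       // send the packet

    // Busy spin: selectNow() returns immediately, ready or not, so the
    // thread stays on-CPU and pays no wake-up penalty (but burns a core).
    while (selector.selectNow() == 0) {
        // spin
    }
    selector.selectedKeys().clear();

    msg.clear();
    ch.read(msg);                        // drain the echoed packet
    long rtt = System.nanoTime() - start;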

Conclusions

  • So it looks like the physical barrier for the UDP packet round trip is an average of 10 microseconds, although I did see some packets making the trip in 8 micros (min time).

  • With busy spinning (thanks Peter) I was able to go from 200 micros on average to a consistent 65 micros on average for a single packet.

  • Not sure why ZeroMQ is 5 times slower than that. (Edit: Maybe because I am testing this on the same machine through loopback and ZeroMQ is using two different machines?)

asked Aug 23 '12 by Julie


1 Answer

You often see cases where waking a thread can be very expensive, not just because it takes time for the thread to wake up, but because the thread runs 2-5x slower for tens of microseconds afterwards while the caches and the rest of the CPU state warm up again.

The way I have avoided this in the past is to busy wait. Unfortunately, selectNow creates a new collection every time you call it, even if it is an empty collection. This generates so much garbage it's not worth using.

One way around this is to busy wait on non-blocking sockets. This doesn't scale particularly well, but it can give you the lowest latency, as the thread doesn't need to wake up and the code you run afterwards is more likely to be in cache. If you use thread affinity as well, it can reduce disturbance of your thread.
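
As an illustration, a busy wait on a non-blocking DatagramChannel needs no Selector at all; read() simply returns 0 until a datagram arrives. A minimal sketch under the same assumptions as above (loopback address, 13-byte payload):

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.DatagramChannel;

    public class BusyWaitClient {
        public static void main(String[] args) throws Exception {
            DatagramChannel ch = DatagramChannel.open();
            ch.configureBlocking(false);                    // reads return immediately instead of blocking
            ch.connect(new InetSocketAddress("127.0.0.1", 9999));

            ByteBuffer buf = ByteBuffer.allocateDirect(13); // allocated once, reused for every packet

            buf.clear();
            long start = System.nanoTime();
            ch.write(buf);                                  // send the request

            buf.clear();
            while (ch.read(buf) <= 0) {
                // busy wait: the thread never parks, so its code and data stay hot in cache;
                // pinning it to a core (e.g. taskset or an affinity library) reduces disturbance further
            }
            long rtt = System.nanoTime() - start;
            System.out.println("RTT: " + rtt + " ns");
        }
    }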

What I would also suggest is trying to make your code lock-less and garbage-less. If you do this, you can have a process in Java which sends a response to an incoming packet in under 100 microseconds 90% of the time. This would allow you to process each packet at 100 Mb/s as it arrives (packets up to 145 microseconds apart due to bandwidth limitations). For a 1 Gb connection you can get pretty close.
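
For example, the echoing side can be written so that its steady-state loop allocates nothing: one reused direct buffer, and read()/write() on a connected channel instead of receive()/send(), so no address objects are created per packet. A minimal sketch (the port number is an assumption):

    import java.net.InetSocketAddress;
    import java.net.SocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.DatagramChannel;

    public class GarbageFreeEcho {
        public static void main(String[] args) throws Exception {
            DatagramChannel ch = DatagramChannel.open().bind(new InetSocketAddress(9999));
            ch.configureBlocking(false);

            ByteBuffer buf = ByteBuffer.allocateDirect(64); // one buffer, allocated once

            // Take the first packet via receive() to learn the sender, then connect to it
            // so the hot loop can use read()/write() without creating address objects.
            SocketAddress sender;
            buf.clear();
            while ((sender = ch.receive(buf)) == null) {
                // spin until the first packet arrives
            }
            ch.connect(sender);
            buf.flip();
            ch.write(buf);                                  // echo the first packet

            while (true) {
                buf.clear();
                while (ch.read(buf) <= 0) {
                    // busy wait; nothing is allocated here, so the GC never interferes
                }
                buf.flip();
                ch.write(buf);                              // echo the payload straight back
            }
        }
    }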


If you want fast interprocess communication on the same box in Java, you could consider something like https://github.com/peter-lawrey/Java-Chronicle. This uses shared memory to pass messages with round-trip latencies of less than 200 nanoseconds (which is harder to do efficiently with sockets). It also persists the data, and it is useful if you just want a fast way to produce a journal file.

answered by Peter Lawrey