 

UDP sendto performance over loopback

Background

I have a very high throughput / low latency network app (the goal is << 5 usec per packet) and I wanted to add some monitoring/metrics to it. I have heard about the statsd craze, and it seems like a simple way to collect metrics and feed them into our time series database. Sending a metric is just a small UDP packet written to a daemon (typically running on the same server).

I wanted to characterize the effect of sending ~5-10 UDP packets in my data path to understand how much latency it would add, and was surprised at how bad it is. I know this is a very obscure micro-benchmark, but I just wanted to get a rough idea of where it lands.

The question I have

I am trying to understand why it takes so long (relatively speaking) to send a UDP packet to localhost versus a remote host. Are there any tweaks I can make to reduce the latency of sending a UDP packet? I am thinking the solution for me is to push metric collection onto an auxiliary core, or to actually run the statsd daemon on a separate host.


My setup/benchmarks

CentOS 6.5 with some beefy server hardware.
The client test program I have been using is available here: https://gist.github.com/rishid/9178261
Compiled with gcc 4.7.3: gcc -O3 -std=gnu99 -mtune=native udp_send_bm.c -lrt -o udp_send_bm
The receiver side is running nc -ulk 127.0.0.1 12000 > /dev/null (IP changed per interface)
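
For reference, a benchmark like this boils down to timing a tight sendto() loop. Below is a minimal, self-contained sketch of that shape (the actual code is in the gist linked above; the iteration count, port, and destination address here are assumptions chosen to match the output format of the results below):

```c
/* Hypothetical sketch of a sendto() timing loop; not the exact gist code. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

#define PACKET_SIZE 500
#define ITERATIONS  1000000      /* assumed iteration count */

int main(void)
{
    char buf[PACKET_SIZE];
    memset(buf, 'x', sizeof(buf));

    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port   = htons(12000);
    inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr); /* or the NIC's address */

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (int i = 0; i < ITERATIONS; i++)
        sendto(fd, buf, sizeof(buf), 0,
               (struct sockaddr *)&dst, sizeof(dst));

    clock_gettime(CLOCK_MONOTONIC, &end);

    double total = (end.tv_sec - start.tv_sec)
                 + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("Packet Size %d // Time per sendto() %.0f nanosec // Total time %f\n",
           PACKET_SIZE, total * 1e9 / ITERATIONS, total);
    return 0;
}
```

Compiled the same way as above (-lrt is needed for clock_gettime on older glibc), this measures only the cost of the sendto() syscall on the chosen interface, which is what the numbers below report.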

I have run this micro-benchmark with the following devices.
Some benchmark results:

  • loopback
    • Packet Size 500 // Time per sendto() 2159 nanosec // Total time 2.159518
  • integrated 1 Gb mobo controller
    • Packet Size 500 // Time per sendto() 397 nanosec // Total time 0.397234
  • intel ixgbe 10 Gb
    • Packet Size 500 // Time per sendto() 449 nanosec // Total time 0.449355
  • solarflare 10 Gb with userspace stack (onload)
    • Packet Size 500 // Time per sendto() 317 nanosec // Total time 0.317229
asked Feb 23 '14 by RishiD


1 Answer

Writing to loopback is not an efficient way to communicate between processes for profiling. Generally the buffer will be copied multiple times before it's processed, and you run the risk of dropping packets since you're using UDP. You're also making additional calls into the operating system, so you add the risk of context switching (~2 usec).
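
That syscall/context-switch cost is also one reason the questioner's own idea of pushing metric collection onto an auxiliary core can help: the data path only enqueues a message, and a dedicated thread pays for the sendto(). Below is a rough, illustrative sketch (not from the question or the answer) using a single-producer/single-consumer ring buffer; it needs C11 atomics (<stdatomic.h>, i.e. a newer compiler than the gcc 4.7.3 mentioned above), and the names, sizes, and back-off are assumptions:

```c
/* Hedged sketch: keep sendto() out of the data path by handing metrics
 * to a dedicated sender thread (which could be pinned to a spare core). */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define RING_SIZE  4096          /* power of two */
#define METRIC_LEN 64

static char ring[RING_SIZE][METRIC_LEN];
static atomic_size_t head, tail; /* single producer, single consumer */

/* Hot path: a small copy plus two atomic operations, no syscall. */
static int metric_enqueue(const char *msg)
{
    size_t h = atomic_load_explicit(&head, memory_order_relaxed);
    size_t t = atomic_load_explicit(&tail, memory_order_acquire);
    if (h - t == RING_SIZE)
        return -1;               /* full: drop the metric, never block */
    strncpy(ring[h & (RING_SIZE - 1)], msg, METRIC_LEN - 1);
    atomic_store_explicit(&head, h + 1, memory_order_release);
    return 0;
}

/* Auxiliary thread: drains the ring and pays the sendto() cost. */
static void *metric_sender(void *arg)
{
    (void)arg;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst = { .sin_family = AF_INET,
                               .sin_port   = htons(12000) };
    inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);

    for (;;) {
        size_t t = atomic_load_explicit(&tail, memory_order_relaxed);
        if (t == atomic_load_explicit(&head, memory_order_acquire)) {
            usleep(50);          /* nothing queued; back off briefly */
            continue;
        }
        const char *msg = ring[t & (RING_SIZE - 1)];
        sendto(fd, msg, strlen(msg), 0,
               (struct sockaddr *)&dst, sizeof(dst));
        atomic_store_explicit(&tail, t + 1, memory_order_release);
    }
    return NULL;
}

/* Call once at startup; the data path then only calls metric_enqueue(). */
static pthread_t sender_tid;
static void metric_init(void)
{
    pthread_create(&sender_tid, NULL, metric_sender, NULL);
}
```

This trades delivery guarantees for latency: if the ring fills up, metrics are dropped rather than stalling the data path, which is usually the right trade-off for statsd-style counters.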

goal is << 5 usec per packet

Is this a hard real-time requirement, or a soft requirement? Generally when you're handling things in microseconds, profiling should be zero overhead. You're using Solarflare, so I think you're serious. The best way I know to do this is to tap into the physical line and sniff the traffic for metrics. A number of products do this.

answered Oct 20 '22 by Jason