
Causes of Linux UDP packet drops

I have a Linux C++ application which receives sequenced UDP packets. Because of the sequencing, I can easily determine when a packet is lost or re-ordered, i.e. when a "gap" is encountered. The system has a recovery mechanism to handle gaps; however, it is best to avoid gaps in the first place. Using a simple libpcap-based packet sniffer, I have determined that there are no gaps in the data at the hardware level. However, I am seeing a lot of gaps in my application. This suggests the kernel is dropping packets, which is confirmed by the /proc/net/snmp file: whenever my application encounters a gap, the Udp InErrors counter increases.
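For anyone who wants to watch that counter from code rather than by eye, here is a minimal sketch of reading the Udp line pair from /proc/net/snmp. The function names `parse_udp_counters` and `read_udp_counters` are hypothetical, not from the application above. It assumes the usual layout of one "Udp:" line of column names followed by one "Udp:" line of values. The set of columns depends on the kernel (later kernels add columns such as RcvbufErrors), so the sketch keys the values by name instead of by position.

```cpp
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse the "Udp:" header/value line pair from
// /proc/net/snmp-style text into a name -> value map.
// Returns an empty map if no complete pair is found.
std::map<std::string, unsigned long long> parse_udp_counters(std::istream& in) {
    std::map<std::string, unsigned long long> counters;
    std::vector<std::string> names;
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, 4, "Udp:") != 0) continue;  // skips "UdpLite:" too
        std::istringstream fields(line.substr(4));
        if (names.empty()) {
            // The first "Udp:" line holds the column names.
            std::string name;
            while (fields >> name) names.push_back(name);
        } else {
            // The second "Udp:" line holds the values, in the same order.
            unsigned long long value = 0;
            for (std::size_t i = 0; i < names.size() && (fields >> value); ++i)
                counters[names[i]] = value;
            break;
        }
    }
    return counters;
}

// Hypothetical helper: read the live counters (Linux only).
std::map<std::string, unsigned long long> read_udp_counters() {
    std::ifstream snmp("/proc/net/snmp");
    return parse_udp_counters(snmp);
}
```

Calling `read_udp_counters()` before and after a detected gap and comparing the `"InErrors"` entries would show whether the kernel counted a drop in that interval.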

At the system level, we have increased the max receive buffer:

# sysctl net.core.rmem_max
net.core.rmem_max = 33554432

At the application level, we have increased the receive buffer size:

#include <sys/socket.h>

int sockbufsize = 33554432;
int ret = setsockopt(my_socket_fd, SOL_SOCKET, SO_RCVBUF,
        (char *)&sockbufsize, sizeof(sockbufsize));
// check return code
sockbufsize = 0;
socklen_t size = sizeof(sockbufsize);  // in/out length argument for getsockopt()
ret = getsockopt(my_socket_fd, SOL_SOCKET, SO_RCVBUF,
        (char *)&sockbufsize, &size);
// print sockbufsize

After the call to getsockopt(), the printed value is always 2x what it is set to (67108864 in the example above), but I believe that is to be expected.

I know that failure to consume data quickly enough can result in packet loss. However, all this application does is check the sequencing, then push the data into a queue; the actual processing is done in another thread. Furthermore, the machine is modern (dual Xeon X5560, 8 GB RAM) and very lightly loaded. We have literally dozens of identical applications receiving data at a much higher rate that do not experience this problem.

Besides a too-slow consuming application, are there other reasons why the Linux kernel might drop UDP packets?

FWIW, this is on CentOS 4, with kernel 2.6.9-89.0.25.ELlargesmp.

Matt asked May 06 '11 15:05


1 Answer

If you have more threads than cores and all threads run at equal priority, it is likely that the receiving thread is starved of the CPU time it needs to drain the incoming buffer. Consider running that thread at a higher priority level than the others.

Similarly, although it is often less productive, you can bind the receiving thread to one core so that you do not suffer the overhead of switching between cores and the associated cache flushes.
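Binding a thread to a core can be sketched as follows, assuming Linux with a glibc that provides the three-argument `pthread_setaffinity_np` (a non-portable GNU extension). The helper name `pin_thread_to_cpu` is hypothetical, and which core to choose is left to the caller.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for pthread_setaffinity_np and the CPU_* macros
#endif
#include <pthread.h>
#include <sched.h>

// Hypothetical helper: restrict `thread` to a single CPU.
// Returns 0 on success or an errno value (for example EINVAL if the
// requested CPU is not available to the process).
int pin_thread_to_cpu(pthread_t thread, int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}
```

For example, the receiving thread could call `pin_thread_to_cpu(pthread_self(), 2)` at startup, picking a core that the other threads are kept away from; build with `-pthread`.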

Steve-o answered Oct 11 '22 15:10