I have a server which sends UDP packets via multicast and a number of clients which are listening to those multicast packets. Each packet has a fixed size of 1040 bytes, and the total amount of data sent by the server is 3 GB.
My environment is as follows:
1 Gbit Ethernet Network
40 nodes: 1 sender node and 39 receiver nodes. All nodes have the same hardware configuration: 2 AMD CPUs, each with 2 cores @ 2.6 GHz
On the client side, one thread reads the socket and puts the data into a queue. One additional thread pops the data from the queue and does some lightweight processing.
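The reader/processor split described above can be sketched roughly as follows. This is not the code from the question: `PacketQueue`, `Packet`, `push` and `pop_all` are hypothetical names, and the only value taken from the post is the 1040-byte packet size. The idea it illustrates is keeping the socket-reading thread's time under the lock as short as possible, with the processing thread taking a whole batch in one swap.

```cpp
#include <array>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>

// Packet size from the question; all names below are illustrative assumptions.
constexpr std::size_t kPacketSize = 1040;
using Packet = std::array<std::uint8_t, kPacketSize>;

class PacketQueue {
public:
    // Called by the socket-reading thread: one push_back under the lock.
    void push(const Packet& p) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(p);
        }
        ready_.notify_one();
    }

    // Called by the processing thread: blocks until at least one packet is
    // queued, then takes everything queued so far with a single swap, so the
    // lock is not held while the packets are processed.
    std::deque<Packet> pop_all() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        std::deque<Packet> batch;
        batch.swap(items_);
        return batch;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Packet> items_;
};
```

Draining in batches means the reader and the processor contend for the mutex once per batch rather than once per packet, which may matter at the packet rates involved here.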
During the multicast transmission I observe a packet drop rate of 30% on the receiver nodes. Looking at the netstat -su statistics, I can say that the number of packets missed by the client application equals the RcvbufErrors value in the netstat output.
That means that all missing packets are dropped by the OS because the socket buffer was full, but I do not understand why the capturing thread is not able to read the buffer in time. During the transmission, 2 of the 4 cores are utilized at 75%; the rest are idle. I’m the only one using these nodes, and I would assume that this kind of machine has no problem handling 1 Gbit of bandwidth. I have already done some optimization by adding g++ compiler flags for AMD CPUs. This decreased the packet drop rate to 10%, but that is still too high in my opinion.
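On Linux, the UDP counters that netstat -su prints are also available in /proc/net/snmp as a pair of "Udp:" lines, one with the counter names and one with the values. A minimal sketch for reading RcvbufErrors from inside the application, for example before and after a transfer, could look like this. The function names `udp_counter` and `read_udp_rcvbuf_errors` are made up for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Splits a line into whitespace-separated fields.
static std::vector<std::string> split_fields(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> out;
    std::string word;
    while (in >> word) out.push_back(word);
    return out;
}

// Looks up `counter` in text formatted like /proc/net/snmp, where the first
// "Udp:" line holds the counter names and the second holds the values.
// Returns -1 if the counter is not present.
long long udp_counter(const std::string& snmp_text, const std::string& counter) {
    std::istringstream in(snmp_text);
    std::string line;
    std::vector<std::string> header;
    while (std::getline(in, line)) {
        std::vector<std::string> fields = split_fields(line);
        if (fields.empty() || fields[0] != "Udp:") continue;
        if (header.empty()) {  // first "Udp:" line: names
            header = fields;
            continue;
        }
        for (std::size_t i = 1; i < header.size() && i < fields.size(); ++i) {
            if (header[i] == counter) return std::stoll(fields[i]);
        }
        return -1;
    }
    return -1;
}

// Reads the live counter; returns -1 if the file cannot be opened.
long long read_udp_rcvbuf_errors() {
    std::ifstream file("/proc/net/snmp");
    if (!file) return -1;
    std::stringstream buffer;
    buffer << file.rdbuf();
    return udp_counter(buffer.str(), "RcvbufErrors");
}
```

Note that this counter is system-wide, so the difference between two readings only matches one application's losses if nothing else on the node is receiving UDP traffic.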
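For a sense of scale, here is a rough back-of-the-envelope calculation of the per-packet time budget at line rate. It assumes that the 1040 bytes are UDP payload, that the packets are IPv4 without options, and that the Ethernet frames are untagged; the post does not state any of this, so treat the result as an estimate.

```cpp
#include <cstdint>

// Per-datagram overhead on the wire, in bytes:
// UDP header 8 + IPv4 header 20 + Ethernet header 14 + FCS 4
// + preamble/start delimiter 8 + inter-frame gap 12.
constexpr std::uint64_t kOverheadBytes = 8 + 20 + 14 + 4 + 8 + 12;  // 66

// Bits occupied on the wire by one datagram with the given payload size.
constexpr std::uint64_t wire_bits(std::uint64_t payload_bytes) {
    return (payload_bytes + kOverheadBytes) * 8;
}

// Upper bound on packets per second for a given link speed.
constexpr std::uint64_t max_packets_per_second(std::uint64_t link_bits_per_second,
                                               std::uint64_t payload_bytes) {
    return link_bits_per_second / wire_bits(payload_bytes);
}

// 1040-byte payload on a 1 Gbit/s link:
//   wire_bits(1040)                          == 8848
//   max_packets_per_second(1000000000, 1040) == 113019
static_assert(wire_bits(1040) == 8848, "8848 bits per packet on the wire");
static_assert(max_packets_per_second(1000000000ULL, 1040) == 113019,
              "about 113,000 packets per second");
```

At 1 Gbit/s one bit takes one nanosecond, so 8848 bits per packet means that, if the sender really saturates the link, the receiving thread has roughly 8.8 microseconds per packet for the recv call, the queue insert, and any locking.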
Of course I know that UDP is not reliable; I have my own correction protocol.
I do not have any administration permissions, so it’s not possible for me to change the system parameters.
Any hints on how I can increase the performance?
EDIT: I solved this issue by using 2 threads to read the socket. The recv socket buffer still fills up sometimes, but the average drop rate is under 1%, so it isn't a problem to handle.
Tracking down network drops on Linux can be a bit difficult as there are many components where packet drops can happen. They can occur at the hardware level, in the network device subsystem, or in the protocol layers.
I wrote a very detailed blog post explaining how to monitor and tune each component. It's a bit hard to summarize as a succinct answer here since there are so many different components that need to be monitored and tuned.
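As one concrete example for the network device subsystem: on Linux, /proc/net/softnet_stat has one line per CPU with hexadecimal columns, where the first column counts processed frames, the second counts frames dropped because the per-CPU backlog queue was full, and the third counts how often the softirq ran out of budget. Below is a minimal sketch of a parser that sums those three columns; `SoftnetTotals` and `parse_softnet_stat` are hypothetical names, and the meaning of further columns varies between kernel versions, so they are ignored here.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Totals over all CPUs for the first three columns of /proc/net/softnet_stat.
struct SoftnetTotals {
    std::uint64_t processed = 0;
    std::uint64_t dropped = 0;
    std::uint64_t time_squeeze = 0;
};

// Parses text formatted like /proc/net/softnet_stat (one line per CPU,
// whitespace-separated hexadecimal fields). Lines with fewer than three
// parsable fields are skipped.
SoftnetTotals parse_softnet_stat(const std::string& text) {
    SoftnetTotals totals;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream row(line);
        row >> std::hex;
        std::uint64_t processed = 0, dropped = 0, squeezed = 0;
        if (row >> processed >> dropped >> squeezed) {
            totals.processed += processed;
            totals.dropped += dropped;
            totals.time_squeeze += squeezed;
        }
    }
    return totals;
}
```

A non-zero and growing dropped total would point at losses below the socket layer. In the question the losses match RcvbufErrors, which suggests the socket buffer rather than this stage, but checking both helps rule things out.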