
Java IOException: No buffer space available while sending UDP packets on Linux

Tags: java, linux, udp

I have a third-party component which, in a certain situation, tries to send too many UDP messages to too many separate addresses. This is a burst which happens when the software is started, and the situation is temporary. I'm actually not sure whether the problem is the sheer number of messages or the fact that each of them goes to a separate IP address.

Anyway, changing the underlying protocol or the problematic component is not an option, so I'm looking for a workaround. The stack trace looks like this:

java.io.IOException: No buffer space available
    at java.net.PlainDatagramSocketImpl.send(Native Method)
    at java.net.DatagramSocket.send(DatagramSocket.java:612)

This issue occurs (at least) with Java versions 1.6.0_13 and 1.6.0_10 and Linux versions Ubuntu 9.04 and RHEL 4.6.

Are there any Java system properties or Linux configuration tweaks which might help?

auramo asked Jun 25 '09 12:06



3 Answers

I've finally determined what the issue is. The Java IOException is misleading: it says "No buffer space available", but the root issue is that the local ARP table has filled up. On Linux, the ARP table is limited to 1024 entries by default. The thresholds are set in the files /proc/sys/net/ipv4/neigh/default/gc_thresh1, /proc/sys/net/ipv4/neigh/default/gc_thresh2 and /proc/sys/net/ipv4/neigh/default/gc_thresh3, where gc_thresh3 is the hard maximum (1024 by default).

What was happening in my case (and, I assume, in yours) is that the Java code sends UDP packets from an IP address that is in the same subnet as the destination addresses. When this is the case, the Linux machine performs an ARP lookup to translate each destination IP address into a hardware MAC address. Since you are blasting out packets to many different IPs, the local ARP table fills up quickly, hits 1024 entries, and that is when the Java exception is thrown.
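To see whether a given destination would trigger a local ARP lookup at all, you can compare it with the sender's address and prefix length. This is only a minimal sketch: the class name SubnetCheck, the sample addresses and the /24 prefix are made up for illustration, and it only does plain prefix matching.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of any library.
class SubnetCheck {
    /** Returns true if both raw addresses share the first prefixLength bits. */
    static boolean sameSubnet(byte[] a, byte[] b, int prefixLength) {
        if (a.length != b.length) {
            return false; // IPv4 vs IPv6, never the same subnet
        }
        if (prefixLength < 0 || prefixLength > a.length * 8) {
            throw new IllegalArgumentException("bad prefix length: " + prefixLength);
        }
        int fullBytes = prefixLength / 8;
        for (int i = 0; i < fullBytes; i++) {
            if (a[i] != b[i]) {
                return false;
            }
        }
        int remainingBits = prefixLength % 8;
        if (remainingBits == 0) {
            return true;
        }
        // Compare only the leading bits of the partially covered byte.
        int mask = (0xFF << (8 - remainingBits)) & 0xFF;
        return (a[fullBytes] & mask) == (b[fullBytes] & mask);
    }

    public static void main(String[] args) throws UnknownHostException {
        // Literal IPs are parsed directly, no DNS lookup happens here.
        InetAddress local = InetAddress.getByName("192.168.1.10");   // assumed sender
        InetAddress target = InetAddress.getByName("192.168.1.200"); // assumed destination
        boolean direct = sameSubnet(local.getAddress(), target.getAddress(), 24);
        System.out.println(target.getHostAddress()
                + (direct ? " is on-link (ARP lookup expected)" : " is reached via a router"));
    }
}
```

On a real machine the prefix length would come from the interface configuration, for example via NetworkInterface.getInterfaceAddresses() and InterfaceAddress.getNetworkPrefixLength().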

The solution is simple: either increase the limit by editing the files I mentioned earlier, or move your server into a different subnet than your destination addresses, so that the Linux box no longer performs neighbor ARP lookups for them (the traffic is instead handled by a router on the network).
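If you want to confirm this diagnosis before changing anything, a small check can compare the current number of ARP entries with the hard limit. This is a rough sketch, assuming a Linux box with /proc/net/arp readable and Java 8 or later; the class name ArpTableCheck is made up, and I have not verified exactly which neighbor states /proc/net/arp lists, so treat the count as an approximation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical diagnostic, not part of any library.
class ArpTableCheck {
    /** Counts entries in the contents of /proc/net/arp (the first line is a header). */
    static int countEntries(List<String> procNetArpLines) {
        int count = 0;
        for (int i = 1; i < procNetArpLines.size(); i++) {
            if (!procNetArpLines.get(i).trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path arp = Paths.get("/proc/net/arp");
        Path thresh3 = Paths.get("/proc/sys/net/ipv4/neigh/default/gc_thresh3");
        if (!Files.isReadable(arp) || !Files.isReadable(thresh3)) {
            System.out.println("ARP information not available on this system");
            return;
        }
        int entries = countEntries(Files.readAllLines(arp));
        int limit = Integer.parseInt(new String(Files.readAllBytes(thresh3)).trim());
        System.out.printf("ARP entries: %d, gc_thresh3: %d%n", entries, limit);
    }
}
```

Running it in a loop while the burst happens should show the entry count climbing towards gc_thresh3 if the ARP table really is the bottleneck.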

Matthew B. Jones answered Nov 12 '22 01:11



When sending lots of messages, especially over gigabit ethernet in Linux, the stock parameters for your kernel are usually not optimal. You can increase the Linux kernel buffer size for networking through:

echo 1048576 > /proc/sys/net/core/wmem_max
echo 1048576 > /proc/sys/net/core/wmem_default
echo 1048576 > /proc/sys/net/core/rmem_max
echo 1048576 > /proc/sys/net/core/rmem_default

Run the commands above as root.

Or use sysctl:

sysctl -w net.core.rmem_max=8388608

There are tons of network options; see Linux Network Tuning by IBM and More tuning information.
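From the Java side you can check what send buffer a socket actually ends up with. A small sketch, assuming you are able to create a test socket of your own (the class name SendBufferProbe is made up). On Linux the requested SO_SNDBUF size is capped by net.core.wmem_max, and the kernel may report a larger value than requested because it adds bookkeeping overhead, so do not expect the numbers to match exactly.

```java
import java.io.UncheckedIOException;
import java.net.DatagramSocket;
import java.net.SocketException;

// Hypothetical probe, not part of any library.
class SendBufferProbe {
    /** Requests a send buffer size and returns what the OS reports afterwards. */
    static int effectiveSendBuffer(int requestedBytes) {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSendBufferSize(requestedBytes); // only a hint to the OS
            return socket.getSendBufferSize();
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int requested = 1048576; // same value as in the echo commands
        System.out.println("requested " + requested
                + " bytes, socket reports " + effectiveSendBuffer(requested));
    }
}
```

If the reported value stays well below what you asked for, the kernel limit is probably still in effect.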

Aiden Bell answered Nov 12 '22 02:11



This might be a bit complicated, but as far as I know, Java uses the SPI (Service Provider Interface) pattern [1] for its network sub-library. This allows you to change the implementation used for various network operations. If you use OpenJDK, its source can give you some hints about how and what to wrap with your own implementation. Then, in your implementation, you slow down the I/O, for example with some sleeps.

Or, just for fun, you could override the default DatagramSocket with your own modified implementation. Give it the same package name and, as far as I know, it will take precedence over the default JRE class (for java.* classes this generally means putting it on the boot class path, e.g. with -Xbootclasspath/p on Java 8 and earlier). At least this method worked for me with a buggy 3rd party library.

Edit:

[1] A Service Provider Interface is a way to separate client and service code within an API. This separation allows different client and provider implementations. Provider classes can usually be recognized by a name ending in Impl: in your stack trace, java.net.PlainDatagramSocketImpl is the provider implementation, while DatagramSocket is the client-side API.

You commented that you don't want to slow down the communication the whole time. There are several hacks to avoid that: for example, measure the time in your code and slow the communication down only within the first 1-2 minutes, starting at the first incoming method call. After that you can skip the sleep.
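The time-limited slowdown could look roughly like this. It is only a sketch of the idea: the class name StartupThrottledSocket and both constructor parameters are made up, and it assumes you can get the third-party code to use your socket class in the first place (which is the hard part).

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketException;

// Hypothetical wrapper: sleeps before each send during a startup window only.
class StartupThrottledSocket extends DatagramSocket {
    private final long windowNanos;
    private final long delayMillis;
    private boolean started = false;
    private long firstSendNanos;

    StartupThrottledSocket(long windowMillis, long delayMillis) throws SocketException {
        super();
        this.windowNanos = windowMillis * 1_000_000L;
        this.delayMillis = delayMillis;
    }

    /** True while fewer than windowNanos have passed since the first send. */
    static boolean inStartupWindow(long firstSendNanos, long nowNanos, long windowNanos) {
        return nowNanos - firstSendNanos < windowNanos;
    }

    @Override
    public synchronized void send(DatagramPacket packet) throws IOException {
        long now = System.nanoTime();
        if (!started) {
            started = true;
            firstSendNanos = now; // the window starts at the first send call
        }
        if (inStartupWindow(firstSendNanos, now, windowNanos)) {
            try {
                Thread.sleep(delayMillis); // spread the burst out over time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while throttling send");
            }
        }
        super.send(packet);
    }
}
```

For example, new StartupThrottledSocket(120000, 5) would add a 5 ms pause before every send during the first two minutes and behave like a plain DatagramSocket afterwards. The numbers are placeholders and would need tuning against the real burst.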

Another option would be to identify the misbehaving class in the library, decompile it (for example with JAD), fix it, and then replace the original class file in the library.

akarnokd answered Nov 12 '22 00:11
