We have an application that runs on Tomcat 6 (6.0.35.0 to be precise), and most of our engineers on Mac OS are having problems starting Tomcat due to the socketAccept call in the Catalina.await method throwing a SocketException:
SEVERE: StandardServer.await: accept:
java.net.SocketException: Invalid argument
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
    at java.net.ServerSocket.implAccept(ServerSocket.java:522)
    at java.net.ServerSocket.accept(ServerSocket.java:490)
    at org.apache.catalina.core.StandardServer.await(StandardServer.java:431)
    at org.apache.catalina.startup.Catalina.await(Catalina.java:676)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:628)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
    at mycompany.tomcat.startup.ThreadDumpWrapper.main(ThreadDumpWrapper.java:260)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.tanukisoftware.wrapper.WrapperStartStopApp.run(WrapperStartStopApp.java:238)
    at java.lang.Thread.run(Thread.java:722)
This causes Tomcat to shut down immediately after startup (and no small amount of rage). We think this has been with us for as long as we've run Mac OS with Java 1.7; in the last several months a lot of us have switched to MacBook Pros. Up until now, the only symptom was occasional zero-byte responses from Tomcat, due to this exception also being thrown on a socketRead. Those errors never hit the logs, and we'd each shrugged them off as an isolated problem; we only found the cause when the startup failures began and I set a breakpoint on SocketException:
Daemon Thread [http-8080-1] (Suspended (breakpoint at line 47 in SocketException))
    SocketException.<init>(String) line: 47
    SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) line: not available [native method]
    SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) line: not available
    SocketInputStream.read(byte[], int, int, int) line: 150
    SocketInputStream.read(byte[], int, int) line: 121
    InternalInputBuffer.fill() line: 735
    InternalInputBuffer.parseRequestLine() line: 366
    Http11Processor.process(Socket) line: 814
    Http11Protocol$Http11ConnectionHandler.process(Socket) line: 602
    JIoEndpoint$Worker.run() line: 489
    Thread.run() line: 722
For arguments:
arg0    FileDescriptor (id=499)
    fd          1097
    useCount    AtomicInteger (id=503)
        value   2
arg1    (id=502)
arg2    0
arg3    8192
arg4    20000
The problem is time-sensitive. Increased startup time due to application changes (lots more Spring introspection/singleton overhead) seems to be what tips this over into breaking Tomcat startup; the tipping point is about 160 seconds. We can mitigate the problem by disabling some of the non-mandatory contexts we don't need during development to reduce startup time, but I'd prefer to find the root cause.
The application is far too complex to describe in much detail, but I have a hunch that this might relate to an earlier bind, so I'll at least list the listening ports on my machine:
localhost:32000 - Java service wrapper port
*:10001 - RMI registry
*:2322 - Java debug
*:56566 - RMI
*:8180 - Tomcat HTTP connector
*:8543 - Tomcat HTTPS connector
*:2223 - Tomcat Internal HTTP connector (used for cross-server requests)
*:14131 - 'Locking' port to determine if an internal service is running
*:56571 - EhCache RMI
*:56573 - RMI
*:62616 - ActiveMQ broker
*:5001 - SOAPMonitorService
*:8109 - Tomcat shutdown port
Things we've tried so far include setting -Djava.net.preferIPv4Stack=true (I've always had that option configured), increasing a value from 5000 to 10000, modifying /etc/hosts so the machine hostname resolves to localhost, and configuring JVM options to prefer IPv4 and to not prefer IPv6 addresses (this answer: https://stackoverflow.com/a/16318860/364206); a quick check that those flags actually reach the JVM is sketched after the hosts file below. For those interested in the hosts configuration, it's the same as the default, and I can reproduce this on a Fusion VM with a clean install of 10.8:
##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting. Do not change this entry.
##
127.0.0.1        localhost
255.255.255.255  broadcasthost
::1              localhost
fe80::1%lo0      localhost
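As a sanity check on those flags, the following minimal sketch (IpStackCheck is just an illustrative name) prints the corresponding system properties; you'd need to run it inside the affected JVM (for example from a scratch JSP) to see Tomcat's own values rather than those of a separate process:

public class IpStackCheck {
    public static void main(String[] args) {
        // -D options surface as system properties, so a null here means the
        // flag never reached this JVM (e.g. it was lost in the wrapper config).
        System.out.println("java.net.preferIPv4Stack = "
                + System.getProperty("java.net.preferIPv4Stack"));
        System.out.println("java.net.preferIPv6Addresses = "
                + System.getProperty("java.net.preferIPv6Addresses"));
    }
}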
Due to the apparent time-sensitive nature of the issue, setting breakpoints to troubleshoot it causes it not to occur. As requested in the comments, I also captured arg0 for SocksSocketImpl(PlainSocketImpl).socketAccept(SocketImpl); nothing seems out of the ordinary:
arg0    SocksSocketImpl (id=460)
    address     InetAddress (id=465)
        canonicalHostName   null
        holder              InetAddress$InetAddressHolder (id=475)
            address     0
            family      0
            hostName    null
    applicationSetProxy         false
    closePending                false
    cmdIn                       null
    cmdOut                      null
    cmdsock                     null
    CONNECTION_NOT_RESET        0
    CONNECTION_RESET            2
    CONNECTION_RESET_PENDING    1
    external_address            null
    fd          FileDescriptor (id=713)
        fd          -1
        useCount    AtomicInteger (id=771)
            value   0
    fdLock      Object (id=714)
    fdUseCount  0
    localport   0
    port        0
    resetLock   Object (id=716)
    resetState  0
    server      null
    serverPort  1080
    serverSocket null
    shut_rd     false
    shut_wr     false
    socket      Socket (id=718)
        bound       false
        closed      false
        closeLock   Object (id=848)
        connected   false
        created     false
        impl        null
        oldImpl     false
        shutIn      false
        shutOut     false
    socketInputStream   null
    stream      false
    timeout     0
    trafficClass 0
    useV4       false
I think all of the threads where the exceptions are thrown are victims of an earlier call, one that doesn't result in a SocketException, so I haven't been able to catch it. Being able to start Tomcat by reducing startup time convinces me that the trigger is probably some scheduled task that performs a socket-based operation, which then affects other socket operations.
That still doesn't explain how and why this could affect several threads. Whatever we're doing to cause this condition, mysterious SocketExceptions shouldn't bubble up from native code simultaneously on multiple threads - two threads making outgoing web service calls, the Tomcat await call, and several TP processor threads, repeatedly.
Given the generic message, I assumed that an EINVAL error must be returned from one of the system calls in the socketAccept JNI code, so I traced the system calls leading up to the exception; there's no EINVAL returned from any system call. So I went to the OpenJDK sources looking for conditions in the socketAccept code that would set and then throw an EINVAL, but I also couldn't find any code that sets errno to EINVAL, or that calls NET_ThrowByNameWithLastError, NET_ThrowCurrent or NET_ThrowNew in a way that would throw a SocketException with this default error message.
Judging by the system calls, we never even reach the accept system call:
PID/THRD        RELATIVE   ELAPSD  CPU  SYSCALL(args)                                         = return
6606/0x2c750d:  221538243     5     0   sigprocmask(0x1, 0x0, 0x14D8BE100)                    = 0x0 0
6606/0x2c750d:  221538244     3     0   sigaltstack(0x0, 0x14D8BE0F0, 0x0)                    = 0 0
6606/0x2c750d:  221538836    14    10   socket(0x2, 0x1, 0x0)                                 = 1170 0
6606/0x2c750d:  221538837     3     0   fcntl(0x492, 0x3, 0x4)                                = 2 0
6606/0x2c750d:  221538839     3     1   fcntl(0x492, 0x4, 0x6)                                = 0 0
6606/0x2c750d:  221538842     5     2   setsockopt(0x492, 0xFFFF, 0x4)                        = 0 0
6606/0x2c750d:  221538852     7     4   bind(0x492, 0x14D8BE5D8, 0x10)                        = 0 0
6606/0x2c750d:  221538857     5     2   listen(0x492, 0x1, 0x4)                               = 0 0
6606/0x2c750d:  221539625     6     2   psynch_cvsignal(0x7FEFBFE00868, 0x10000000200, 0x100) = 257 0
6606/0x2c750d:  221539633     4     1   write(0x2, "Apr 18, 2013 11:05:35 AM org.apache.catalina.core.StandardServer await\nSEVERE: StandardServer.await: accept: \njava.net.SocketException: Invalid argument\n\tat java.net.PlainSocketImpl.socketAccept(Native Method)\n\tat java.net.PlainSocketImpl.socketAcce", 0x644)  = 1604 0
So I think the problem occurs in the timeout handling code at the top of the accept loop in socketAccept, but I couldn't find any case where NET_Timeout would set errno to EINVAL and result in this SocketException being thrown. I'm referring to the socketAccept code in the jdk7u sources; I assume the jdk7u branch is for the most part what ships in the Oracle JDK.
I can't find anyone in the outside world affected by this particular problem on Mac OS, but almost everyone here is affected. There must be some application configuration that contributes, but I've exhausted every avenue I can think of to find the root cause.
Pointers on troubleshooting or insight on a possible cause would be much appreciated.
Have you tried turning on JNI debugging with -Xcheck:jni? Interestingly, the Oracle documentation uses a PlainSocketImpl.socketAccept error as an example of when to use this.
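Since the JVM here is started through the Tanuki service wrapper, it may also be worth confirming that a flag like -Xcheck:jni actually reaches the Tomcat process. A minimal sketch (hypothetical class name) that prints the running JVM's input arguments:

import java.lang.management.ManagementFactory;

public class JvmArgsCheck {
    public static void main(String[] args) {
        // RuntimeMXBean#getInputArguments lists the JVM options of the current
        // process, so -Xcheck:jni should appear here if it took effect.
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            System.out.println(arg);
        }
    }
}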
Note also that the implication of Bug 7131399 is that the JNI code uses poll() on most platforms but select() on Mac OS, due to a problem with poll() on the Mac. So maybe select() is broken too. Digging in a bit further, select() will return EINVAL if "ndfs is greater than FD_SETSIZE and _DARWIN_UNLIMITED_SELECT is not defined." FD_SETSIZE is 1024, and it sounds like you have a ton of applications loading up, so perhaps it all filters down to waiting on more than 1024 FDs at one time.
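A quick way to test that theory is to count the open descriptors in the affected JVM: once the process holds more than FD_SETSIZE (1024) descriptors, new sockets get numbers above the limit, and select()-based waits on them can fail with EINVAL. A minimal sketch, assuming /dev/fd is available (as it is on Mac OS and Linux):

import java.io.File;

public class FdCount {
    public static void main(String[] args) {
        // Listing /dev/fd shows the file descriptors open in the process doing
        // the listing; compare the count against FD_SETSIZE (1024).
        String[] fds = new File("/dev/fd").list();
        System.out.println("open file descriptors: "
                + (fds == null ? "unknown" : fds.length));
    }
}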
For extra credit, see if the related (supposedly fixed) Java bug is in fact fixed on your machine. The bug report has pointers to test cases.
Thanks to Old Pro's answer, I confirmed that the select() FD_SETSIZE limitation is the cause. I located an existing bug for this limitation:
https://bugs.openjdk.java.net/browse/JDK-8021820
The problem can be reproduced with the following code:
import java.io.*;
import java.net.*;

public class SelectTest {
    public static void main(String[] args) throws Exception {
        // Use up 1024 file descriptors. There'll already be some in use,
        // obviously, but this guarantees the problem will occur.
        for (int i = 0; i < 1024; i++) {
            new FileInputStream("/dev/null");
        }
        ServerSocket socket = new ServerSocket(8080);
        socket.accept();
    }
}
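The same limitation also explains the read-side symptom (the zero-byte responses): with more than FD_SETSIZE descriptors open, a socket read with SO_TIMEOUT set goes through the same select()-based timeout path and fails the same way. A rough sketch along those lines, assuming an affected JDK 7 on Mac OS and using example.com purely as a placeholder host:

import java.io.FileInputStream;
import java.io.InputStream;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class SelectReadTest {
    public static void main(String[] args) throws Exception {
        // Hold references so the streams aren't finalized and closed early.
        List<FileInputStream> hold = new ArrayList<FileInputStream>();
        for (int i = 0; i < 1024; i++) {
            hold.add(new FileInputStream("/dev/null"));
        }
        Socket s = new Socket("example.com", 80);
        // A read timeout forces the timed-read path (NET_Timeout, i.e. select() on Mac OS).
        s.setSoTimeout(20000);
        s.getOutputStream().write("GET / HTTP/1.0\r\nHost: example.com\r\n\r\n".getBytes("US-ASCII"));
        InputStream in = s.getInputStream();
        // On an affected JDK this read throws SocketException: Invalid argument.
        System.out.println("first byte: " + in.read());
    }
}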
Almost a year later, Java 7u60 has a fix for this problem:
http://www.oracle.com/technetwork/java/javase/2col/7u60-bugfixes-2202029.html
I also discovered that Tomcat's WebappClassLoader closes file handles after 90 seconds, which explains why setting breakpoints prevented the issue from occurring.
I had exactly the same issue (with Tomcat7), and what seems to work for me is to tick the "Publish module contexts to separate XML files" option when I'm running tomcat inside Eclipse. Have you tried that already?