
Java socketRead0 Issue

Tags: java, sockets

I'm developing a web crawler with HtmlUnit and I have set all the required timeouts, but I notice that the app hangs when the server of a website being crawled is not responding. This is what I see when I use Java VisualVM to take a thread dump:

java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.net.SocksSocketImpl.readSocksReply(SocksSocketImpl.java:88)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:429)
at java.net.Socket.connect(Socket.java:525)
at com.gargoylesoftware.htmlunit.SocksSocketFactory.connectSocket(SocksSocketFactory.java:89)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:149)
at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:573)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:776)
at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:152)
at app.plugin.core.net.QHttpWebConnection.getResponse(QHttpWebConnection.java:30)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1439)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1358)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:307)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:373)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:358)

This is really frustrating since I have no control over those servers. This issue is seriously affecting the performance of my application.

Question:

  1. How can I solve this issue?
  2. Is there a way to get a list of the socket connections opened by a Java app and use it to terminate a socket, i.e. simulate the server closing the connection?
John asked Sep 22 '12 13:09


3 Answers

I believe that when you are in a Java native method, the stack trace will say RUNNABLE even if the call is actually blocked waiting for some event. In essence, I don't believe Java has any way of knowing what a native method is actually doing, so it flags these calls as RUNNABLE. I have seen this with socketRead0() and socketAccept() -- both of which typically block.
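A small way to check this yourself, as a sketch only: the class and method names below are made up for illustration, and the 500 ms sleep is a crude way to let the reader thread reach the blocking call. It starts a thread that reads from a local socket whose peer never writes, then asks the JVM for that thread's state. On HotSpot with ordinary (platform) threads this typically reports RUNNABLE even though the thread is doing nothing but waiting.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class BlockedReadState {

    /** Returns the state the JVM reports for a thread blocked in a socket read. */
    static Thread.State stateWhileBlockedInRead() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {

            Thread reader = new Thread(() -> {
                try {
                    client.getInputStream().read(); // blocks: the peer never writes
                } catch (IOException expected) {
                    // thrown once the sockets are closed at the end of the demo
                }
            }, "blocked-reader");
            reader.setDaemon(true);
            reader.start();

            Thread.sleep(500); // crude: give the reader time to enter the read
            return reader.getState();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("state while blocked: " + stateWhileBlockedInRead());
    }
}
```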

You need to set your timeout to a reasonable length of time such that your request will time out if the server is not responding but not too short in case the server is simply busy. Your application should be written to use multiple threads. I would try running a dozen or more threads and have each thread wait up to five or ten seconds for a response. There is virtually no overhead in having a handful of threads waiting. You should also be mindful of not bombarding a server with lots of requests when writing a web spider.
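Roughly what I mean, as a sketch under assumptions rather than crawler code: `fetchStatus`, `crawl` and `crawlSilentServer` are hypothetical names, the "fetch" only waits for the first byte over a plain `java.net.Socket`, and the pool size (12) and timeout (300 ms in the demo) are placeholders for the dozen threads and five to ten seconds suggested above. Each worker is bounded by its own connect and read timeout, so one dead server costs one thread a few seconds at most.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BoundedCrawlPool {

    /** Hypothetical fetch: waits at most timeoutMs to connect and again for the first byte. */
    static String fetchStatus(InetSocketAddress target, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(target, timeoutMs);   // bounded connect
            socket.setSoTimeout(timeoutMs);      // bounded read
            int first = socket.getInputStream().read();
            return first < 0 ? "closed" : "ok";
        } catch (SocketTimeoutException e) {
            return "timeout";
        } catch (IOException e) {
            return "error";
        }
    }

    /** Runs one fetch per target on a fixed pool and collects the outcomes in order. */
    static List<String> crawl(List<InetSocketAddress> targets, int threads, int timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (InetSocketAddress target : targets) {
                futures.add(pool.submit(() -> fetchStatus(target, timeoutMs)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    results.add(future.get());
                } catch (ExecutionException e) {
                    results.add("error");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    results.add("interrupted");
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    /** Demo: twelve workers against a local server that accepts connections but never replies. */
    static List<String> crawlSilentServer() {
        try (ServerSocket silent = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            InetSocketAddress target =
                    new InetSocketAddress(silent.getInetAddress(), silent.getLocalPort());
            List<InetSocketAddress> targets = new ArrayList<>();
            for (int i = 0; i < 12; i++) {
                targets.add(target);
            }
            return crawl(targets, 12, 300);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(crawlSilentServer());
    }
}
```

Because the twelve waits overlap, the whole demo takes about one timeout period rather than twelve.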

Geoff answered Sep 29 '22 04:09


Here's a blog post which is possibly related: http://javaeesupportpatterns.blogspot.fi/2011/04/javanetsocketinputstreamsocketread0.html

In short, the solution is to make sure that a socket timeout is defined. The default is 0, meaning no timeout. Exactly how to set it depends on the library, in this case apparently com.gargoylesoftware.htmlunit. At a quick glance, the correct method might be com.gargoylesoftware.htmlunit.WebClient.setTimeout.
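To show what the timeout does at the socket level, here is a minimal sketch using only `java.net` (it does not exercise HtmlUnit, and `ReadTimeoutDemo` / `readTimesOut` are names invented for this example). It connects to a local server that never sends anything and reads with `SO_TIMEOUT` set, so the read gives up with a `SocketTimeoutException` instead of sitting in `socketRead0` indefinitely.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ReadTimeoutDemo {

    /**
     * Connects to a local server that never replies and reads with the given
     * SO_TIMEOUT. Returns true if the read ended with a SocketTimeoutException.
     */
    static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) {
        try (ServerSocket silent = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = new Socket()) {
            // Bounded connect: fails if no connection is made in time.
            socket.connect(
                    new InetSocketAddress(silent.getInetAddress(), silent.getLocalPort()),
                    connectTimeoutMs);
            // Bounded read: 0 (the default) would mean "wait forever".
            socket.setSoTimeout(readTimeoutMs);
            try {
                socket.getInputStream().read();
                return false; // the server sent data or closed the connection
            } catch (SocketTimeoutException e) {
                return true;  // the read gave up after readTimeoutMs
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(1000, 300));
    }
}
```

Note that a connect timeout and a read timeout are separate settings; a library-level setter may or may not apply to both, so it is worth checking which one is in effect for the call that hangs.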

hyde answered Sep 29 '22 04:09


If your Java server is on Windows, your last resort is SysInternals TCPView.

http://technet.microsoft.com/en-us/sysinternals/bb897437.aspx

From it you will see the list of all processes and all local and remote ports, which will include your Java app. You will have to pick the correct connection to close, and after that, the Java Thread will throw an exception and end.

Of course there's a risk of closing the wrong connection. After all, this method is the last resort.
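If your own code holds a reference to the socket, the same effect can be had from inside the JVM. The sketch below is an illustration under that assumption (with HtmlUnit the socket is owned by the HTTP client, so this may not be directly reachable), and `WatchdogClose` / `closeUnblocksRead` are hypothetical names. A second thread closes the socket after a delay, and the thread blocked in the read gets a `SocketException`, which is the behaviour documented for `Socket.close()`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class WatchdogClose {

    /**
     * Blocks in a read with no SO_TIMEOUT while a watchdog thread closes the
     * socket after closeAfterMs. Returns true if the read was unblocked by a
     * SocketException.
     */
    static boolean closeUnblocksRead(long closeAfterMs) {
        try (ServerSocket silent = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = new Socket(silent.getInetAddress(), silent.getLocalPort())) {

            ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
            watchdog.schedule(() -> {
                try {
                    socket.close(); // closing from another thread unblocks the reader
                } catch (IOException ignored) {
                    // nothing useful to do if close itself fails
                }
            }, closeAfterMs, TimeUnit.MILLISECONDS);

            try {
                socket.getInputStream().read(); // no timeout set: would wait forever
                return false;
            } catch (SocketException e) {
                return true; // raised because the socket was closed under the read
            } finally {
                watchdog.shutdownNow();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unblocked by close: " + closeUnblocksRead(300));
    }
}
```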

Update on 23 Aug 2019:

TCPView is slow when there are a large number of connections.

The much faster alternative is CurrPorts (from NirSoft): https://www.nirsoft.net/utils/cports.html

sken130 answered Sep 29 '22 06:09