I get a SocketTimeoutException
when I try to parse a lot of HTML documents using Jsoup.
For example, I got a list of links :
<a href="www.domain.com/url1.html">link1</a> <a href="www.domain.com/url2.html">link2</a> <a href="www.domain.com/url3.html">link3</a> <a href="www.domain.com/url4.html">link4</a>
For each link, I parse the document linked to the URL (from the href attribute) to get other pieces of information in those pages.
So I can imagine that it takes lot of time, but how to shut off this exception Here is the whole stack trace:
java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(Unknown Source) at java.io.BufferedInputStream.fill(Unknown Source) at java.io.BufferedInputStream.read1(Unknown Source) at java.io.BufferedInputStream.read(Unknown Source) at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source) at sun.net.www.http.HttpClient.parseHTTP(Unknown Source) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source) at java.net.HttpURLConnection.getResponseCode(Unknown Source) at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:381) at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364) at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143) at org.jsoup.helper.HttpConnection.get(HttpConnection.java:132) at app.ForumCrawler.crawl(ForumCrawler.java:50) at Main.main(Main.java:15)
Root Cause This problem is caused by an environment issue, such as: Server is trying to read data from the request, but its taking longer than the timeout value for the data to arrive from the client. Timeout here would typically be tomcat connector -> connectionTimeout attribute.
Using try/catch/finally If you are a developer, so you can surround the socket connection part of your code in a try/catch/finally and handle the error in the catch. You might try connecting a second time, or try connecting to another possible socket, or simply exit the program cleanly.
Read Timed Out From the client side, the “read timed out” error happens if the server is taking longer to respond and send information. This could be due to a slow internet connection, or the host could be offline.
I think you can do
Jsoup.connect("...").timeout(10 * 1000).get();
which sets timeout to 10s.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With