
Jsoup SocketTimeoutException: Read timed out

Tags:

java

jsoup

I get a SocketTimeoutException when I try to parse a lot of HTML documents using Jsoup.

For example, I have a list of links:

<a href="www.domain.com/url1.html">link1</a>
<a href="www.domain.com/url2.html">link2</a>
<a href="www.domain.com/url3.html">link3</a>
<a href="www.domain.com/url4.html">link4</a>

For each link, I parse the document linked to the URL (from the href attribute) to get other pieces of information in those pages.
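The first step, collecting the URLs from the link list, involves no network access and can be done with Jsoup's parser alone. A minimal sketch, assuming the links arrive as an HTML string; the class `LinkExtractor` and method `extractHrefs` are hypothetical names for illustration. Note that the hrefs in the list above have no scheme, so a full URL such as `http://www.domain.com/url1.html` is assumed when each page is later fetched with `Jsoup.connect(...)`.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkExtractor {

    // Hypothetical helper: returns the raw href value of every <a> that has an href attribute.
    static List<String> extractHrefs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> hrefs = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<a href=\"www.domain.com/url1.html\">link1</a> "
                + "<a href=\"www.domain.com/url2.html\">link2</a>";
        // Each extracted href would then be fetched (and parsed) in a separate request.
        System.out.println(extractHrefs(html));
    }
}
```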

I understand that this takes a lot of time, but how can I prevent this exception? Here is the whole stack trace:

java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(Unknown Source)
    at java.io.BufferedInputStream.fill(Unknown Source)
    at java.io.BufferedInputStream.read1(Unknown Source)
    at java.io.BufferedInputStream.read(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
    at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
    at java.net.HttpURLConnection.getResponseCode(Unknown Source)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:381)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:132)
    at app.ForumCrawler.crawl(ForumCrawler.java:50)
    at Main.main(Main.java:15)
asked Jul 04 '11 12:07 by C. Maillard


People also ask

Why does java.net.SocketTimeoutException: Read timed out occur?

Root cause: this problem is caused by an environment issue. For example, a server is trying to read data from a request, but the data takes longer than the timeout value to arrive from the client. On a Tomcat server, that timeout would typically be the connector's connectionTimeout attribute.

How do I resolve SocketTimeoutException?

Using try/catch/finally: if you are a developer, you can surround the socket-connection part of your code in a try/catch/finally block and handle the error in the catch block. You might try connecting a second time, try connecting to another possible socket, or simply exit the program cleanly.
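The "try connecting a second time" option can be wrapped in a small helper that retries only on SocketTimeoutException and lets every other I/O error through. This is a sketch under stated assumptions: `Retry`, `IoAction` and `withRetries` are hypothetical names, the retry count is arbitrary, and there is no back-off delay between attempts.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class Retry {

    // Hypothetical functional type for any I/O action that may time out.
    @FunctionalInterface
    interface IoAction<T> {
        T run() throws IOException;
    }

    // Runs the action up to maxAttempts times, retrying only on SocketTimeoutException.
    // Other IOExceptions propagate immediately; after the last timeout it is rethrown.
    static <T> T withRetries(IoAction<T> action, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SocketTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.run();
            } catch (SocketTimeoutException e) {
                last = e; // remember the timeout and try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        // Simulated action: times out twice, then succeeds on the third attempt.
        String result = withRetries(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new SocketTimeoutException("Read timed out");
            }
            return "ok";
        }, 3);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Because Jsoup's `get()` is declared to throw `IOException`, a fetch could be wrapped as `Retry.withRetries(() -> Jsoup.connect(url).get(), 3)`, where `url` is assumed to be a full URL with a scheme.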

What is read timed out?

Read timed out: on the client side, the “read timed out” error happens when the server takes longer than the configured read timeout to respond and send data. This could be due to a slow internet connection, or the remote host could be slow or unresponsive.
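The read timeout is a separate setting from the connect timeout. The stack trace in the question passes through java.net.HttpURLConnection, which exposes both. The sketch below uses plain HttpURLConnection rather than jsoup's API, only to show the distinction; `TimeoutConfig` and `open` are hypothetical names and the millisecond values are arbitrary.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class TimeoutConfig {

    // Creates (but does not connect) an HttpURLConnection with separate timeouts:
    // connectMillis bounds establishing the TCP connection; readMillis bounds each wait
    // for data once connected, and exceeding it throws SocketTimeoutException: Read timed out.
    static HttpURLConnection open(String url, int connectMillis, int readMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(connectMillis);
        conn.setReadTimeout(readMillis);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = open("http://www.domain.com/url1.html", 5_000, 10_000);
        // openConnection() sends nothing over the network; only the settings are printed.
        System.out.println(conn.getConnectTimeout() + " / " + conn.getReadTimeout());
    }
}
```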


1 Answer

I think you can do

Jsoup.connect("...").timeout(10 * 1000).get();  

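which sets the timeout to 10 seconds (the value is in milliseconds). Passing 0 disables the timeout entirely, since jsoup treats a timeout of zero as infinite, although a request to an unresponsive server could then hang indefinitely.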
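For a crawl over many links, the timeout can be combined with a per-link catch so that one slow page is skipped instead of aborting the whole loop. A minimal sketch, assuming full URLs with a scheme; `TimeoutCrawler` and `fetchAll` are hypothetical names, and logging and skipping is just one possible policy.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TimeoutCrawler {

    // Fetches each URL with a 10-second timeout. A URL that times out or fails
    // is reported on stderr and skipped; the rest of the crawl continues.
    static Map<String, Document> fetchAll(List<String> urls) {
        Map<String, Document> pages = new LinkedHashMap<>();
        for (String url : urls) {
            try {
                Document doc = Jsoup.connect(url)
                        .timeout(10 * 1000) // milliseconds
                        .get();
                pages.put(url, doc);
            } catch (SocketTimeoutException e) {
                System.err.println("Timed out, skipping: " + url);
            } catch (IOException e) {
                System.err.println("Failed (" + e.getMessage() + "): " + url);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        // Hypothetical URLs built from the question's hrefs with an assumed http:// scheme.
        List<String> urls = List.of(
                "http://www.domain.com/url1.html",
                "http://www.domain.com/url2.html");
        Map<String, Document> pages = fetchAll(urls);
        System.out.println("Fetched " + pages.size() + " of " + urls.size() + " pages");
    }
}
```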

answered Sep 17 '22 13:09 by MarcoS