 

scrapy User timeout caused connection failure

Tags: python, scrapy

I am using Scrapy to download images, but I get a timeout error:

Retrying <GET http://www/***.jpg> (failed 1 times): User timeout caused connection failure

However, I can download the same image with wget instantly. DOWNLOAD_TIMEOUT (a Scrapy setting) is at its default of 180 seconds, so that should not be the root cause of the error. I have tried Scrapy both with and without a proxy; both give me the above error.

asked Sep 08 '13 by Harry


1 Answer

If you are scraping multiple images (especially from multiple domains), downloads happen concurrently, so each one can take longer than downloading a single image from the command line. Try decreasing the CONCURRENT_REQUESTS setting and increasing DOWNLOAD_TIMEOUT.
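For example, a minimal sketch of the relevant settings in settings.py; the exact values are illustrative, not prescriptive, so tune them for your target site:

    # settings.py -- illustrative values, not prescriptive
    CONCURRENT_REQUESTS = 4              # default is 16; fewer parallel downloads
    CONCURRENT_REQUESTS_PER_DOMAIN = 2   # default is 8
    DOWNLOAD_TIMEOUT = 360               # default is 180 seconds

Fewer concurrent downloads means each one gets more bandwidth, and a longer timeout gives slow responses a chance to complete.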

Check with scrapy fetch that you can retrieve the image at all, to rule out a Scrapy-specific issue.
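For example, using the masked URL from the question (substitute your real image URL; --nolog keeps log output off stdout):

    scrapy fetch --nolog 'http://www/***.jpg' > image.jpg

If this succeeds, the problem is more likely in your spider's settings or request headers than in Scrapy itself.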

Finally, check for differences in request headers (User-Agent, cookies, Referer, etc.); a difference there could account for the server responding differently. If you can find the header that makes the difference, it is easy to change in Scrapy.
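For instance, a minimal sketch of overriding headers globally in settings.py; the header values shown here are assumptions for illustration, so copy whatever your working wget request actually sends:

    # settings.py -- example header overrides (values are assumptions)
    USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64)'   # mimic a browser/wget UA
    DEFAULT_REQUEST_HEADERS = {
        'Accept': '*/*',
        'Referer': 'http://www/',  # masked placeholder domain from the question
    }

Headers can also be set on a single request via scrapy.Request(url, headers={...}) if only some requests need them.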

answered Oct 05 '22 by Shane Evans