I am using Scrapy to download images but get a timeout error:
Retrying <GET http://www/***.jpg> (failed 1 times): User timeout caused connection failure
However, I am able to download the image with wget instantly. DOWNLOAD_TIMEOUT (the Scrapy setting) is left at its default of 180 seconds, so that should not be the root cause of the error. I have tried running Scrapy both with and without a proxy, and both give me the above error.
If you are scraping multiple images (especially from multiple domains), the downloads happen concurrently, and each one may take longer than downloading a single image from the command line. Try decreasing the CONCURRENT_REQUESTS setting and increasing DOWNLOAD_TIMEOUT, as sketched below.
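For example, a minimal sketch of the relevant settings in settings.py; the specific values are just illustrative starting points, not recommendations:

```python
# settings.py -- example values only; tune these for your own crawl
CONCURRENT_REQUESTS = 4             # default is 16; fewer downloads in parallel
CONCURRENT_REQUESTS_PER_DOMAIN = 2  # default is 8
DOWNLOAD_TIMEOUT = 300              # default is 180 seconds
```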
Check with scrapy fetch URL that you can retrieve the image at all, to rule out a general Scrapy issue.
Finally, check for differences in the request headers (User-Agent, cookies, referrer, etc.); a difference there could account for a difference in the server's response. If you can find a header that makes the difference, it is easy to change in Scrapy, for example as sketched below.
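Here is one way to override headers for a spider via custom_settings; the spider name, URL, and header values below are purely illustrative (you would copy whatever headers your working wget request sends):

```python
import scrapy


class ImageCheckSpider(scrapy.Spider):
    # Hypothetical spider used only to test whether different headers help
    name = "image_check"
    start_urls = ["http://www.example.com/some.jpg"]  # replace with your image URL

    custom_settings = {
        # Mimic the client that works (e.g. wget); values are assumptions
        "USER_AGENT": "Wget/1.21.3",
        "DEFAULT_REQUEST_HEADERS": {
            "Accept": "*/*",
        },
    }

    def parse(self, response):
        # Write the raw body to disk to confirm the image downloads through Scrapy
        with open("downloaded.jpg", "wb") as f:
            f.write(response.body)
```

If the download succeeds with the borrowed headers but not with Scrapy's defaults, you have found the header the server cares about.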