While running a crawler on a site, I'm getting the following error message a large number of times:
<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>
I don't get this error when running the crawler on other sites, and I can reach the pages it's trying to access via both a browser and curl. So I'm wondering: what situations might cause this error to arise?
To clarify, the full error is something along the lines of:
2016-11-17 20:59:38 [scrapy] ERROR: Error downloading <GET http://www.peets.com/gifts/featured-gifts/holiday-gifts/sheng-puer-tea-50.html>: [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]
Many different URLs produce a similar error, and the same URL doesn't always fail if I run the crawler multiple times. So I'm unclear what "ConnectionDone: Connection was closed cleanly" implies about the underlying problem.
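One way to narrow this down is to attach an errback to each request so you can log exactly which URLs fail and with which exception type. The sketch below is illustrative (the function name is mine, not from Scrapy); it only assumes the Failure object Scrapy passes to errbacks, which carries the failed request on its `request` attribute:

```python
def describe_failure(failure):
    """Return a short log line for a Twisted Failure passed to a Scrapy errback.

    Scrapy attaches the originating Request to the Failure, so we can
    report both the exception class and the URL that triggered it.
    """
    exc_name = failure.type.__name__ if failure.type else "UnknownError"
    request = getattr(failure, "request", None)
    url = getattr(request, "url", "<no url>")
    return f"{exc_name} while fetching {url}"
```

You would wire this up when building requests, e.g. `scrapy.Request(url, callback=self.parse, errback=lambda f: self.logger.error(describe_failure(f)))`, and then look for a pattern in which URLs or exception types recur.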
Today I had the same error. I think those websites have anti-crawler protection that drops connections from clients with Scrapy's default user agent. Adding:
USER_AGENT = 'Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0'
to settings.py solved the error for me.
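If you'd rather not change the user agent project-wide, Scrapy also lets you scope the override to a single spider via the `custom_settings` class attribute. A minimal sketch (the spider name is hypothetical; any common browser user-agent string works):

```python
import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"  # hypothetical spider name
    # Per-spider override; equivalent to USER_AGENT in settings.py,
    # but applied only to this spider.
    custom_settings = {
        "USER_AGENT": "Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0",
    }
```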