 

Under what circumstances does Scrapy throw a "Connection was closed cleanly" error?

Tags: python, scrapy

While running a crawler on a site, I'm getting the following error message a large number of times:

<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>

I don't get this error when running the crawler on other sites, and I can reach the pages it's trying to access in a browser or with curl. So I'm wondering: what situations might cause this error to arise?

To clarify, the full error is something along the lines of:

2016-11-17 20:59:38 [scrapy] ERROR: Error downloading <GET http://www.peets.com/gifts/featured-gifts/holiday-gifts/sheng-puer-tea-50.html>: [<twisted.python.failure.Failure twisted.internet.error.ConnectionDone: Connection was closed cleanly.>]

Many different URLs produce a similar error, and the errors don't always recur when I run the crawler multiple times. So I'm unclear what ConnectionDone: Connection was closed cleanly implies about what the underlying problem is.

asked Nov 17 '16 21:11 by Nathaniel Ford


1 Answer

I had the same error today. I think those websites have anti-crawler protection. If I add:

USER_AGENT = 'Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0'  

in settings.py, it resolves the error.
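
For reference, here is a minimal sketch of a per-spider alternative, which sets the same user agent through Scrapy's custom_settings attribute instead of editing settings.py. The spider name, start URL, and parse logic are placeholders for illustration only; the user-agent string is the one from above. Scrapy's default user agent identifies the client as a Scrapy bot, which some sites appear to reject by simply closing the connection, and that could surface as this ConnectionDone error.

import scrapy


class ExampleSpider(scrapy.Spider):
    # placeholder spider name and start URL for illustration only
    name = "example"
    start_urls = ["http://www.peets.com/"]

    # per-spider alternative to setting USER_AGENT in settings.py
    custom_settings = {
        "USER_AGENT": "Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0",
    }

    def parse(self, response):
        # placeholder callback: just record which pages were fetched
        yield {"url": response.url, "status": response.status}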

answered Sep 20 '22 13:09 by theqwang