 

Scrapy: connection refused

I'm receiving an error when trying to test scrapy installation:

$ scrapy shell http://www.google.es
2011-02-16 10:54:46+0100 [scrapy] INFO: Scrapy 0.12.0.2536 started (bot: scrapybot)
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled extensions: TelnetConsole, SpiderContext, WebService, CoreStats, MemoryUsage, CloseSpider
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled scheduler middlewares: DuplicatesFilterMiddleware
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpProxyMiddleware, HttpCompressionMiddleware, DownloaderStats
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Enabled item pipelines: 
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2011-02-16 10:54:46+0100 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-02-16 10:54:46+0100 [default] INFO: Spider opened
2011-02-16 10:54:47+0100 [default] DEBUG: Retrying <GET http://www.google.es> (failed 1 times): Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] DEBUG: Retrying <GET http://www.google.es> (failed 2 times): Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] DEBUG: Discarding <GET http://www.google.es> (failed 3 times): Connection was refused by other side: 111: Connection refused.
2011-02-16 10:54:47+0100 [default] ERROR: Error downloading <http://www.google.es>: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionRefusedError'>: Connection was refused by other side: 111: Connection refused.
    ]
2011-02-16 10:54:47+0100 [scrapy] ERROR: Shell error
    Traceback (most recent call last):
    Failure: scrapy.exceptions.IgnoreRequest: Connection was refused by other side: 111: Connection refused.

2011-02-16 10:54:47+0100 [default] INFO: Closing spider (shutdown)
2011-02-16 10:54:47+0100 [default] INFO: Spider closed (shutdown)

Versions:

  • Scrapy 0.12.0.2536
  • Python 2.6.6
  • OS: Ubuntu 10.10

EDIT: I can reach the site with my browser, with wget and with telnet google.es 80, and the error happens with every site.

asked Feb 16 '11 by anders


2 Answers

Mission 1: Scrapy sends a default user agent with 'bot' in it, and some sites block requests based on the user agent.

Try overriding USER_AGENT in settings.py.

E.g.: USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.7'

Mission 2: Try adding a delay between requests, so the traffic looks more like a human browsing.

DOWNLOAD_DELAY = 0.25 
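
Both settings go in the project's settings.py. A minimal sketch, assuming otherwise default settings (the user-agent string is only an example of a regular browser UA):

# settings.py -- pretend to be a regular browser and slow the crawl down
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1'
DOWNLOAD_DELAY = 0.25   # seconds to wait between consecutive requests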

Mission 3: If nothing works, install Wireshark and compare the request headers (and any POST data) that Scrapy sends with what your browser sends.
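
If Wireshark feels heavyweight, a quick alternative is a tiny local server that prints whatever headers a client sends; point both your browser and scrapy shell http://127.0.0.1:8000 at it and compare the output. A minimal sketch in Python 2 (to match the versions above; a throwaway helper, not part of Scrapy):

# headers_echo.py -- print the request headers of every client that connects
from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler

class EchoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        print self.headers                    # raw headers as sent by the client
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.end_headers()
        self.wfile.write('ok\n')

if __name__ == '__main__':
    HTTPServer(('127.0.0.1', 8000), EchoHandler).serve_forever()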

answered by nizam.sp


There is probably an issue with your network connection.

First of all, check your internet connection.

If you access the net through a proxy server, you need to configure Scrapy's HttpProxyMiddleware in your project (http://doc.scrapy.org/en/latest/topics/downloader-middleware.html#scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware).
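
For example, the built-in HttpProxyMiddleware (already enabled in the log above) picks up the standard http_proxy / https_proxy environment variables, so exporting http_proxy=http://proxy.example.com:3128 (a placeholder address) before running Scrapy is usually enough. Newer Scrapy versions also accept a per-request proxy via the request meta, roughly like this:

# sketch only -- 'http://proxy.example.com:3128' is a placeholder for your real proxy
from scrapy.http import Request

req = Request('http://www.google.es',
              meta={'proxy': 'http://proxy.example.com:3128'})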

In any case, try upgrading your Scrapy version.

answered by Panos Angelopoulos