I have designed a crawler with two spiders, both built with Scrapy.
These spiders run independently, fetching their data from the database.
We run the spiders using a reactor. Since the reactor cannot be restarted once it has stopped,
we give some 500+ links to the second spider to crawl in a single run.
When we do this, we get a port error, i.e. Scrapy appears to be using only a single port:
Error caught on signal handler: <bound method ?.start_listening of <scrapy.telnet.TelnetConsole instance at 0x0467B440>>
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 1070, in _inlineCallbacks
result = g.send(result)
File "C:\Python27\lib\site-packages\scrapy-0.16.5-py2.7.egg\scrapy\core\engine.py", line 75, in start
yield self.signals.send_catch_log_deferred(signal=signals.engine_started)
File "C:\Python27\lib\site-packages\scrapy-0.16.5-py2.7.egg\scrapy\signalmanager.py", line 23, in send_catch_log_deferred
return signal.send_catch_log_deferred(*a, **kw)
File "C:\Python27\lib\site-packages\scrapy-0.16.5-py2.7.egg\scrapy\utils\signal.py", line 53, in send_catch_log_deferred
*arguments, **named)
--- <exception caught here> ---
File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 137, in maybeDeferred
result = f(*args, **kw)
File "C:\Python27\lib\site-packages\scrapy-0.16.5-py2.7.egg\scrapy\xlib\pydispatch\robustapply.py", line 47, in robustApply
return receiver(*arguments, **named)
File "C:\Python27\lib\site-packages\scrapy-0.16.5-py2.7.egg\scrapy\telnet.py", line 47, in start_listening
self.port = listen_tcp(self.portrange, self.host, self)
File "C:\Python27\lib\site-packages\scrapy-0.16.5-py2.7.egg\scrapy\utils\reactor.py", line 14, in listen_tcp
return reactor.listenTCP(x, factory, interface=host)
File "C:\Python27\lib\site-packages\twisted\internet\posixbase.py", line 489, in listenTCP
p.startListening()
File "C:\Python27\lib\site-packages\twisted\internet\tcp.py", line 980, in startListening
raise CannotListenError(self.interface, self.port, le)
twisted.internet.error.CannotListenError: Couldn't listen on 0.0.0.0:6073: [Errno 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted.
So what is the problem occurring here? And what is the best way to handle this scenario? Please help.
P.S. I have increased the port range in the settings, but it still always takes 6073 as the default.
The easiest way would be to disable the Telnet Console by adding this to your settings.py:
EXTENSIONS = {
'scrapy.telnet.TelnetConsole': None
}
See also http://doc.scrapy.org/en/latest/topics/settings.html#extensions for a list of by-default enabled extensions.
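As for why the error names port 6073: the traceback shows the Telnet Console calling listen_tcp(self.portrange, ...), which suggests that every crawler started in the same process tries to open its own console on a port from a fixed range. The following is a rough simulation of that idea, not Scrapy's actual code. It assumes the range is 6023 to 6073 (which I believe is the default for TELNETCONSOLE_PORT) and that only a failure on the last port of the range is reported. The names CannotListen, listen_in_range and start_crawlers are made up for illustration.

```python
class CannotListen(Exception):
    """Hypothetical stand-in for twisted.internet.error.CannotListenError."""


def listen_in_range(first_port, last_port, ports_in_use):
    """Claim the first free port in [first_port, last_port].

    Ports that are already taken are skipped silently; an error is raised
    only when the whole range is exhausted, and it names the last port.
    """
    for port in range(first_port, last_port + 1):
        if port not in ports_in_use:
            ports_in_use.add(port)
            return port
    raise CannotListen("Couldn't listen on port %d" % last_port)


def start_crawlers(count, first_port=6023, last_port=6073):
    """Simulate starting `count` crawlers in one process, one console each.

    Returns (index of the first crawler that failed, error message),
    or (None, None) if every crawler found a free port.
    The default range is an assumption about Scrapy's defaults.
    """
    ports_in_use = set()
    for index in range(1, count + 1):
        try:
            listen_in_range(first_port, last_port, ports_in_use)
        except CannotListen as exc:
            return index, str(exc)
    return None, None


if __name__ == "__main__":
    # The assumed range holds 51 ports, so the 52nd crawler is the first to fail.
    print(start_crawlers(500))
```

Under these assumptions, a range of 51 ports is used up long before 500+ crawlers have started, and the failure is always reported for the last port in the range. Disabling the console avoids the listening socket entirely; setting TELNETCONSOLE_ENABLED = False in settings.py should have the same effect as the EXTENSIONS entry above.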