I'm trying to test out some XPaths using the Scrapy shell, but it seems to be calling on my incomplete spider module to do the scraping, which is not what I want. Is there a way to define which spider Scrapy uses with its shell? More importantly, why is Scrapy doing this at all; shouldn't it know the spider is not ready for use? That's why I'm using the shell, right? Otherwise I'd be using
scrapy crawl spider_name
if I wanted to use a specific spider.
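For reference, my shell workflow is roughly the following (example.com is just a placeholder URL; on this Scrapy version the shell exposes an hxs selector, while newer versions use response.xpath instead):
scrapy shell "http://example.com"
>>> hxs.select('//title/text()').extract()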
Edit: After looking at the Spider docs forever, I found the following description for the spider instance used in the shell.
spider - the Spider which is known to handle the URL, or a BaseSpider object if there is no spider found for the current URL
This means Scrapy has matched the URL to my spider and is using it instead of a BaseSpider. Unfortunately, my spider is not ready for testing, so is there a way to force the shell to use a BaseSpider instead?
Scrapy automatically selects the spider based on the allowed_domains attribute. If there is more than one spider for a given domain, Scrapy will use BaseSpider.
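For example, a spider like this (module path and names are hypothetical) would be picked by the shell for any example.com URL, because its allowed_domains covers that domain:
# myproject/spiders/myspider.py (hypothetical)
from scrapy.spider import BaseSpider

class MySpider(BaseSpider):
    name = "myspider"
    # the shell matches the fetched URL against this list
    allowed_domains = ["example.com"]
    start_urls = ["http://example.com/"]

    def parse_item(self, response):
        # not implemented yet
        pass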
But it's just a Python shell; you can instantiate any spider you want.
>>> from myproject.spiders.myspider import MySpider
>>> spider = MySpider()
>>> spider.parse_item(response)
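(response here is the response object the shell has already fetched for the URL you passed to scrapy shell, so you can feed it straight into your callback.)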
Edit: as a workaround, to keep the shell from picking up your spider you can set allowed_domains = [] in its definition.
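A sketch of that workaround, using the same hypothetical spider as above:
class MySpider(BaseSpider):
    name = "myspider"
    # empty list, so the shell no longer associates this spider with the URL
    allowed_domains = []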