I have the following directory structure:
my_project/
    __init__.py
    spiders/
        __init__.py
        my_spider.py
        other_spider.py
    pipelines.py
    # other files
Right now I can be in the my_project directory and start my crawl using scrapy crawl my_spider.
What I'd like to achieve is to be able to run scrapy crawl my_spider with this updated structure:
my_project/
    __init__.py
    spiders/
        __init__.py
        subtopic1/
            __init__.py    # <-- I get the same error whether this is present or not
            my_spider.py
        subtopicx/
            other_spider.py
    pipelines.py
    # other files
But right now I get this error:
KeyError: 'Spider not found: my_spider'
What is the appropriate way to organize Scrapy spiders into directories?
start_urls contains the links from which the spider starts crawling. If you want to crawl recursively, you should use CrawlSpider and define rules for it.
Spiders are classes which define how a certain site (or a group of sites) will be scraped, including how to perform the crawl (i.e. follow links) and how to extract structured data from their pages (i.e. scraping items).
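For illustration, here is a minimal CrawlSpider sketch; the domain, start URL, link pattern and extracted fields are assumptions for the example, not taken from the project above:

# Minimal CrawlSpider sketch (illustrative names and URLs)
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class ExampleCrawlSpider(CrawlSpider):
    name = "example_crawl"
    allowed_domains = ["example.com"]
    start_urls = ["https://example.com/"]  # links the spider starts crawling from

    # Rules tell a CrawlSpider which links to follow and which callback handles them.
    rules = (
        Rule(LinkExtractor(allow=r"/category/"), callback="parse_item", follow=True),
    )

    def parse_item(self, response):
        # Extract structured data (items) from each followed page.
        yield {"url": response.url, "title": response.css("title::text").get()}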
Scrapy is a web scraping framework used to scrape, parse and collect web data. Scraped data is handled in the pipelines.py file, which passes each item through a series of components (pipeline classes) that are executed sequentially.
Run the crawl with scrapy crawl my_spider -o next_page.json and check the result.
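As a sketch of such a pipeline component (the class name, the checked field and the priority value are assumptions, not from the project above):

# my_project/pipelines.py (illustrative)
from scrapy.exceptions import DropItem

class ValidateItemPipeline:
    def process_item(self, item, spider):
        # Every enabled pipeline component receives each scraped item in turn.
        if not item.get("title"):
            raise DropItem("missing title")
        return item

It would be enabled in settings.py with ITEM_PIPELINES = {"my_project.pipelines.ValidateItemPipeline": 300}.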
I know this is long overdue, but this is the right way to organize your spiders in nested directories: list every package that contains spiders in the SPIDER_MODULES setting of your project settings.
Example:
SPIDER_MODULES = ['my_project.spiders', 'my_project.spiders.subtopic1', 'my_project.spiders.subtopicx']
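To show how the pieces fit together, here is a minimal sketch; the spider body and start URL are assumptions, only the SPIDER_MODULES entries and the name attribute come from the question. scrapy crawl my_spider resolves the spider by its name attribute, not by the file path, so the file can live in any package listed in SPIDER_MODULES:

# my_project/settings.py
SPIDER_MODULES = [
    "my_project.spiders",
    "my_project.spiders.subtopic1",
    "my_project.spiders.subtopicx",
]

# my_project/spiders/subtopic1/my_spider.py (sketch)
import scrapy

class MySpider(scrapy.Spider):
    # scrapy crawl looks spiders up by this name, not by the file path
    name = "my_spider"
    start_urls = ["https://example.com/"]  # hypothetical start URL

    def parse(self, response):
        yield {"url": response.url}

Each subdirectory listed in SPIDER_MODULES should also contain an __init__.py so it can be imported as a regular Python package.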