Python Scrapy - Direct spider to specific Pipeline

I have a Scrapy project with multiple spiders and multiple pipelines. Is there a way to tell spider A to use pipeline A, spider B to use pipeline B, and so on?

My pipelines.py has multiple pipeline classes each doing something different and I want to be able to tell a spider to use a specific pipeline.

I don't see an obvious way to do this among the available Scrapy commands.

xXPhenom22Xx asked Aug 03 '13 17:08


2 Answers

The ITEM_PIPELINES setting is defined globally for all spiders in the project when the engine starts. It cannot be changed per spider on the fly.

Here's what you can do: decide inside the pipeline itself which spiders it should handle. In the pipeline's process_item method, return the item unchanged for spiders you want to skip and continue processing for the rest, e.g.:

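def process_item(self, item, spider):
    # Pass items from any other spider through untouched
    if spider.name not in ['spider1', 'spider2']:
        return item

    # process item here, then hand it on to the next pipeline
    return item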
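
A slightly fuller sketch of the same idea, in case it helps to see it as a whole class. The names SpiderFilteredPipeline, allowed_spiders and FakeSpider are made up for illustration; FakeSpider only stands in for a real spider so the snippet runs without Scrapy installed, and it assumes items are plain dicts.

```python
class SpiderFilteredPipeline:
    # Hypothetical whitelist: only items from these spiders are processed here.
    allowed_spiders = {"spider1", "spider2"}

    def process_item(self, item, spider):
        if spider.name not in self.allowed_spiders:
            # Not one of ours: return the item unchanged so later pipelines still see it.
            return item

        # Placeholder for the real work; here we just tag the item.
        item["processed_by"] = type(self).__name__
        return item


class FakeSpider:
    """Stand-in for a spider; the pipeline only reads its name attribute."""

    def __init__(self, name):
        self.name = name


pipeline = SpiderFilteredPipeline()
handled = pipeline.process_item({"x": 1}, FakeSpider("spider1"))
skipped = pipeline.process_item({"x": 1}, FakeSpider("other_spider"))
```

Every pipeline listed in ITEM_PIPELINES still gets called for every spider with this approach; the check just makes it a no-op for the spiders it does not care about.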

Also see:

  • Is there any method to using seperate scrapy pipeline for each spider?

Hope that helps.

alecxe answered Sep 24 '22 07:09


It is possible to specify the pipeline to use in the custom_settings class attribute of your spider:

import scrapy

class BookSpider(scrapy.Spider):
    name = "book_spider"

    # Overrides the project-wide ITEM_PIPELINES for this spider only
    custom_settings = {
        'ITEM_PIPELINES': {
            'my_app.pipelines.BookPipeline': 300,
        }
    }

    def parse(self, response):
        return
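
For completeness, here is a rough guess at what the BookPipeline class referenced by that path could look like. The answer does not show it, so the body below (trimming a "title" field and flagging duplicates) is purely hypothetical, and it assumes items are plain dicts. open_spider and process_item are the standard pipeline hook names.

```python
class BookPipeline:
    """Hypothetical contents of my_app/pipelines.py for the spider above."""

    def open_spider(self, spider):
        # Called once when the spider starts; a good place for per-run state.
        self.seen_titles = set()

    def process_item(self, item, spider):
        # No spider.name check is needed here: only spiders that list this
        # class in their ITEM_PIPELINES ever send items to it.
        title = item.get("title", "").strip()
        item["title"] = title
        item["duplicate"] = title in self.seen_titles
        self.seen_titles.add(title)
        return item


# Quick check outside Scrapy; None stands in for the spider argument.
pipeline = BookPipeline()
pipeline.open_spider(None)
first = pipeline.process_item({"title": "  Dune "}, None)
second = pipeline.process_item({"title": "Dune"}, None)
```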

sfenske answered Sep 21 '22 07:09