Python Scrapy: how to define a pipeline for an item?

Tags: python, scrapy, screen-scraping

I am using Scrapy to crawl different sites. For each site I have a separate Item, because different information is extracted.

I have a generic pipeline, since most of the information is the same, but now I am crawling Google search results and the pipeline for those must be different.

For example:

GenericItem uses GenericPipeline

GoogleItem should use GoogleItemPipeline, but when the spider is crawling it tries to use GenericPipeline instead of GoogleItemPipeline. How can I specify which pipeline the Google spider must use?


asked Jun 29 '09 05:06

llazzaro

1 Answer

Currently there is only one way: check the Item type in the pipeline and either process the item or return it as is.

pipelines.py:

from grabbers.items import FeedItem

class StoreFeedPost(object):

    def process_item(self, domain, item):
        # Only handle FeedItem; every other item type passes through untouched.
        if isinstance(item, FeedItem):
            pass  # process it...

        return item

items.py:

from scrapy.item import ScrapedItem

class FeedItem(ScrapedItem):
    pass
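
For a GenericItem / GoogleItem setup like the one in the question, the same type check could look roughly like the sketch below. It is only an illustration under a few assumptions: it uses the newer process_item(self, item, spider) signature, the item fields (url, title, query, results) are made up, and plain dataclasses stand in for Scrapy items so the snippet runs without Scrapy installed. The run_pipelines helper is hypothetical and only mimics items being handed to each enabled pipeline in order.

```python
from dataclasses import dataclass, field


@dataclass
class GenericItem:
    # Hypothetical fields, not taken from the question.
    url: str
    title: str


@dataclass
class GoogleItem:
    # Hypothetical fields, not taken from the question.
    query: str
    results: list = field(default_factory=list)


class GenericPipeline:
    def process_item(self, item, spider):
        if not isinstance(item, GenericItem):
            return item  # not ours: pass it on unchanged
        item.title = item.title.strip()
        return item


class GoogleItemPipeline:
    def process_item(self, item, spider):
        if not isinstance(item, GoogleItem):
            return item  # not ours: pass it on unchanged
        item.results = [r for r in item.results if r]  # drop empty results
        return item


def run_pipelines(item, pipelines, spider=None):
    """Hypothetical stand-in for the framework calling each pipeline in order."""
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item


# Both pipelines are enabled; each one touches only its own item type.
PIPELINES = [GenericPipeline(), GoogleItemPipeline()]
generic = run_pipelines(GenericItem(url="http://example.com", title="  Home  "), PIPELINES)
google = run_pipelines(GoogleItem(query="scrapy", results=["a", "", "b"]), PIPELINES)
```

With this layout every pipeline stays enabled for every spider, and the routing decision lives in the isinstance check at the top of each process_item.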


answered Oct 26 '22 20:10

slav0nic

