 

Raise close spider from Scrapy pipeline

Tags:

python

scrapy

I need to raise CloseSpider from a Scrapy Pipeline. Either that or return some parameter from the Pipeline back to the Spider to do the raise.

For example, if the date already exists, raise CloseSpider:

raise CloseSpider('Already been scraped:' + response.url)

Is there a way to do this?

asked May 20 '18 by MoreScratch


People also ask

How do you close a Scrapy spider?

To force a spider to close, you can raise the CloseSpider exception, as described in the Scrapy docs. Just be sure to return/yield your items before you raise the exception.
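
A minimal sketch of what that looks like inside a spider callback (the selectors and the stop condition are made up for illustration):

import scrapy
from scrapy.exceptions import CloseSpider

class ExampleSpider(scrapy.Spider):
    name = 'example'
    start_urls = ['https://example.com']

    def parse(self, response):
        # Yield items first so they still reach the item pipelines...
        for title in response.css('h2::text').getall():
            yield {'title': title}
        # ...then stop the whole spider once the stop condition is met.
        if response.css('.already-seen'):
            raise CloseSpider('Nothing new on this page')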

How do you run multiple spiders in a Scrapy?

Use the CrawlerProcess class to run multiple Scrapy spiders in one process simultaneously. Create an instance of CrawlerProcess with the project settings; if a spider needs its own custom settings, create a Crawler instance for that spider.
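
A rough sketch under those assumptions (SpiderOne and SpiderTwo are placeholder spider classes from your project):

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())  # project settings apply to all spiders
process.crawl(SpiderOne)   # placeholder spider class
process.crawl(SpiderTwo)   # placeholder spider class
process.start()            # blocks until every spider has finished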

How does a Scrapy pipeline work?

Scrapy is a web scraping library used to scrape, parse and collect web data. Scraped data is handled in the pipelines.py file through various components (classes) that are executed sequentially.
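
For illustration, a bare-bones component in pipelines.py might look like this (the 'title' field and the drop condition are invented for the example):

from scrapy.exceptions import DropItem

class ExamplePipeline:
    def process_item(self, item, spider):
        # Called once for every item the spider yields.
        if not item.get('title'):
            raise DropItem('Missing title')   # discard incomplete items
        item['title'] = item['title'].strip()
        return item                           # hand the item to the next pipeline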

How do you activate the pipeline in Scrapy?

You can activate an Item Pipeline component by adding its class to the ITEM_PIPELINES setting, as shown below. The integer value assigned to each class determines the order in which they run: items pass from lower-valued to higher-valued classes, and values are conventionally in the 0-1000 range.
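
For example, in settings.py (the class path is a placeholder for your own pipeline):

ITEM_PIPELINES = {
    'myproject.pipelines.ExamplePipeline': 300,  # lower numbers run earlier
}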


2 Answers

According to the Scrapy docs, the CloseSpider exception can only be raised from a spider callback (by default, the parse function). Raising it in a pipeline will crash the spider. To achieve a similar result from a pipeline, you can initiate a shutdown signal, which will close Scrapy gracefully.

from scrapy.project import crawler  # note: scrapy.project is a legacy module removed in newer Scrapy versions
crawler._signal_shutdown(9, 0)      # ask the running crawler to shut down gracefully

Do remember that Scrapy might still process requests that were already fired or scheduled, even after the shutdown signal has been initiated.

To do it from the spider instead, set a flag on the spider from the pipeline, like this:

def process_item(self, item, spider):
    if some_condition_is_met:
        spider.close_manually = True
    return item  # pipelines should always return the item (or raise DropItem)

After this, in the callback function of your spider, you can raise the CloseSpider exception:

from scrapy.exceptions import CloseSpider

def parse(self, response):
    # getattr avoids an AttributeError if the pipeline never set the flag
    if getattr(self, 'close_manually', False):
        raise CloseSpider('Already been scraped.')
answered Oct 22 '22 by Ahsan Roy


I prefer the following solution.

class MongoDBPipeline(object):

    def process_item(self, item, spider):
        # pass the spider (not the pipeline) to close_spider
        spider.crawler.engine.close_spider(spider, reason='duplicate')
        return item

Source: Force spider to stop in scrapy

answered Oct 22 '22 by Macbric