I have written a scraper using Scrapy in Python. It contains 100 start_urls.
I want to terminate the scraping process once a condition is met, i.e., stop scraping if a particular div is found. By terminate I mean it should stop scraping all the URLs.
Is this possible?
What you're looking for is the CloseSpider exception.
Add the following line somewhere at the top of your source file:
from scrapy.exceptions import CloseSpider
And when you detect that your termination condition is met, simply do something like
raise CloseSpider('termination condition met')
in your callback method (instead of returning or yielding an Item or Request).
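Here is a minimal sketch of how that might look in a spider. The spider name, the start URLs, and the div.stop-marker selector are hypothetical placeholders for your own termination condition:

```python
import scrapy
from scrapy.exceptions import CloseSpider


class MySpider(scrapy.Spider):
    name = "my_spider"
    # Placeholder URLs; replace with your 100 start URLs.
    start_urls = [
        "https://example.com/page1",
        "https://example.com/page2",
    ]

    def parse(self, response):
        # Hypothetical condition: stop the whole crawl as soon as
        # a particular div is found on any page.
        if response.css("div.stop-marker"):
            raise CloseSpider("termination condition met")

        # Otherwise, keep scraping as usual.
        for title in response.css("h2::text").getall():
            yield {"title": title}
```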
Note that requests that are still in progress (HTTP request sent, response not yet received) will still be parsed. No new requests will be processed, though.