How can I collect stats from within a spider callback?
Example
from scrapy import Spider

class MySpider(Spider):
    name = "myspider"
    start_urls = ["http://example.com"]

    def parse(self, response):
        stats.set_value('foo', 'bar')  # NameError: 'stats' is not defined here
Not sure what to import, or how to make stats available in general.
The stats can be accessed through the spider_stats attribute, which is a dict keyed by spider name. This is the behaviour of MemoryStatsCollector, the default Stats Collector used in Scrapy.
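A minimal sketch of reading those stats after a run, assuming the default MemoryStatsCollector and the MySpider class from the question:

from scrapy.crawler import CrawlerProcess

process = CrawlerProcess()
crawler = process.create_crawler(MySpider)
process.crawl(crawler)
process.start()  # blocks until the crawl finishes

# spider_stats is specific to MemoryStatsCollector: the stats of every
# closed spider, keyed by spider name
print(crawler.stats.spider_stats["myspider"])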
Using FormRequest. You can use the FormRequest.from_response() method for this job. Here's the start of an example spider which uses it:

import scrapy

def authentication_failed(response):
    # TODO: Check the contents of the response and return True if it failed
    # or False if it succeeded.
    ...
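A minimal sketch of how such a spider typically continues, in the style of the Scrapy docs' login example; the URL, the username/password form field names, and the after_login callback are all placeholders:

class LoginSpider(scrapy.Spider):
    name = "login_example"
    start_urls = ["http://www.example.com/users/login.php"]

    def parse(self, response):
        # from_response() pre-fills the form found on the login page
        return scrapy.FormRequest.from_response(
            response,
            formdata={"username": "john", "password": "secret"},
            callback=self.after_login,
        )

    def after_login(self, response):
        if authentication_failed(response):
            self.logger.error("Login failed")
            return
        # continue scraping with the authenticated session ...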
The callback of a request is a function that will be called when the response of that request is downloaded. The callback function will be called with the downloaded Response object as its first argument. Example:

def parse_page1(self, response):
    return scrapy.Request("http://www.example.com/some_page.html",
                          callback=self.parse_page2)
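For completeness, a minimal sketch of the matching callback, defined on the same spider; the log message is illustrative:

def parse_page2(self, response):
    # this runs once the request above has been downloaded
    self.logger.info("Visited %s", response.url)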
start_urls contains the URLs from which the spider starts crawling. If you want to crawl recursively, you should use CrawlSpider and define rules for it, as in the sketch below.
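A minimal sketch of a recursive crawl, assuming a hypothetical example.com site and an /articles/ URL pattern; parse_item is a placeholder callback:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class RecursiveSpider(CrawlSpider):
    name = "recursive"
    allowed_domains = ["example.com"]
    start_urls = ["http://example.com"]

    rules = (
        # follow every link matching the pattern and pass each page to parse_item
        Rule(LinkExtractor(allow=r"/articles/"), callback="parse_item", follow=True),
    )

    def parse_item(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}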
Check out the stats page from the Scrapy documentation. The documentation states that the Stats Collector is always available, but it may be necessary to add

from scrapy.stats import stats

to your spider code to be able to do stuff with it.
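If that import works in your (older) Scrapy version, usage would look like this; note this is the legacy module-level API, superseded by the approach in EDIT2 below:

from scrapy.stats import stats

stats.set_value('foo', 'bar')
stats.inc_value('failed_url_count')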
EDIT: At the risk of blowing my own trumpet, if you were after a concrete example, I posted an answer about how to collect failed URLs.
EDIT2: After a lot of googling, apparently no imports are necessary. Just use self.crawler.stats.set_value(), as in the sketch below.
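A minimal sketch of the question's spider using that approach; the stat names are illustrative:

import scrapy

class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = ["http://example.com"]

    def parse(self, response):
        # self.crawler.stats is the running Stats Collector; nothing to import
        self.crawler.stats.set_value('foo', 'bar')
        self.crawler.stats.inc_value('pages_crawled')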