
How to collect stats from within a Scrapy spider callback?

How can I collect stats from within a spider callback?

Example

from scrapy import Spider

class MySpider(Spider):
    name = "myspider"
    start_urls = ["http://example.com"]

    def parse(self, response):
        stats.set_value('foo', 'bar')  # NameError: stats is not defined here

Not sure what to import or how to make stats available in general.

asked Apr 09 '14 by mattes

People also ask

How do you get Scrapy stats?

The stats can be accessed through the spider_stats attribute, which is a dict keyed by spider domain name. This is the default Stats Collector used in Scrapy.
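For example, from inside a spider you can read the running collector's contents once the crawl finishes; a minimal sketch, where the spider name and the logging line are illustrative:

from scrapy import Spider

class StatsReadingSpider(Spider):
    name = "stats_reading_sketch"  # illustrative name
    start_urls = ["http://example.com"]

    def parse(self, response):
        pass

    def closed(self, reason):
        # closed() runs when the spider finishes; get_stats() returns the
        # full dict the Stats Collector has accumulated for this crawl
        self.logger.info("Crawl stats: %s", self.crawler.stats.get_stats())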

How do you get a response from a Scrapy request?

You can use the FormRequest.from_response() method for this job. Here's the start of an example spider which uses it:

import scrapy

def authentication_failed(response):
    # TODO: Check the contents of the response and return True if it failed
    # or False if it succeeded.
    pass
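The rest of that example follows the pattern below; a sketch close to the login example in the Scrapy docs, where the URL, form field names, and credentials are placeholders:

class LoginSpider(scrapy.Spider):
    name = "login_sketch"  # illustrative name
    start_urls = ["http://www.example.com/users/login.php"]  # placeholder URL

    def parse(self, response):
        # from_response() pre-fills the form found in the login page and
        # submits it along with the extra fields given in formdata
        return scrapy.FormRequest.from_response(
            response,
            formdata={"username": "john", "password": "secret"},
            callback=self.after_login,
        )

    def after_login(self, response):
        if authentication_failed(response):
            self.logger.error("Login failed")
            return
        # continue scraping with an authenticated session here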

What is callback in Scrapy?

The callback of a request is a function that will be called when the response of that request is downloaded. The callback function will be called with the downloaded Response object as its first argument. Example:

def parse_page1(self, response):
    return scrapy.Request("http://www.example.com/some_page.html",
                          callback=self.parse_page2)
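For completeness, a minimal sketch of the receiving callback; the logging line is illustrative:

def parse_page2(self, response):
    # Called with the Response downloaded for some_page.html
    self.logger.info("Visited %s", response.url)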

What is Start_urls in Scrapy?

start_urls contains the links from which the spider starts crawling. If you want to crawl recursively, you should use CrawlSpider and define rules for it.
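A minimal CrawlSpider sketch, assuming an illustrative domain and a single rule that follows every link:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class RecursiveSpider(CrawlSpider):
    name = "recursive_sketch"  # illustrative name
    allowed_domains = ["example.com"]
    start_urls = ["http://example.com"]

    # Follow every link on the allowed domain and hand each page to parse_item
    rules = (Rule(LinkExtractor(), callback="parse_item", follow=True),)

    def parse_item(self, response):
        yield {"url": response.url}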


1 Answer

Check out the stats page from the Scrapy documentation. The documentation states that the Stats Collector is always available, but it may be necessary to add from scrapy.stats import stats to your spider code to be able to do anything with it.

EDIT: At the risk of blowing my own trumpet, if you were after a concrete example, I posted an answer on how to collect failed URLs.
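One way to do that kind of thing is to route failures through an errback and count them in the stats; a sketch under that assumption, not the linked answer's exact code:

import scrapy

class FailureTrackingSpider(scrapy.Spider):
    name = "failures_sketch"  # illustrative name
    start_urls = ["http://example.com"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(url, callback=self.parse, errback=self.on_error)

    def parse(self, response):
        pass

    def on_error(self, failure):
        # Count each failed request in the crawl stats and log the failure
        self.crawler.stats.inc_value("failed_url_count")
        self.logger.warning(repr(failure))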

EDIT2: After a lot of googling, apparently no imports are necessary. Just use self.crawler.stats.set_value()!
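Putting that together with the question's spider, a minimal sketch (the stat key foo comes from the question; pages_seen is an illustrative counter):

from scrapy import Spider

class MySpider(Spider):
    name = "myspider"
    start_urls = ["http://example.com"]

    def parse(self, response):
        # self.crawler is attached to the spider when the crawl starts,
        # so the Stats Collector is reachable without any extra import
        self.crawler.stats.set_value('foo', 'bar')
        self.crawler.stats.inc_value('pages_seen')  # illustrative counter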

answered Oct 07 '22 by Talvalin