I have a problem with Scrapy. If a request fails (e.g. 404 or 500), how can I make an alternative request? For example, two links can both provide the price info; if one fails, I want to request the other automatically.
Making a request is a straightforward process in Scrapy. To generate a request, you need the URL of the webpage from which you want to extract useful data, and a callback function, which is invoked when a response to the request arrives.
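For illustration, a minimal spider issuing such a request could look like this (the URL and spider name are just placeholders):

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = "example"

        def start_requests(self):
            # the callback is invoked once the response for this URL arrives
            yield scrapy.Request("http://example.com", callback=self.parse)

        def parse(self, response):
            # extract the useful data here
            pass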
Use "errback" in the Request like
errback=self.error_handler
where error_handler is a function (just like callback function) in this function check the error code and make the alternative Request.
see errback in the scrapy documentation: http://doc.scrapy.org/en/latest/topics/request-response.html
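A minimal sketch of what that could look like (the spider name, the two example.com URLs, and the parse_price/error_handler names are placeholders; it assumes the default HttpError spider middleware, which raises HttpError for non-2xx responses and thereby routes the request to the errback):

    import scrapy
    from scrapy.spidermiddlewares.httperror import HttpError

    class PriceSpider(scrapy.Spider):
        name = "price_spider"
        # two alternative pages that both expose the price info
        primary_url = "http://example.com/price-source-a"
        fallback_url = "http://example.com/price-source-b"

        def start_requests(self):
            yield scrapy.Request(
                self.primary_url,
                callback=self.parse_price,
                errback=self.error_handler,
            )

        def parse_price(self, response):
            # extract the price from whichever page responded
            pass

        def error_handler(self, failure):
            # with default settings, HttpErrorMiddleware raises HttpError
            # for non-2xx responses, which ends up here
            if failure.check(HttpError):
                status = failure.value.response.status
                if status in (404, 500):
                    # the first link failed; fall back to the alternative link
                    yield scrapy.Request(self.fallback_url,
                                         callback=self.parse_price)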
Just set handle_httpstatus_list = [404, 500] and check the status code in the parse method. Here's an example:
    import scrapy

    class MySpider(scrapy.Spider):
        # tell Scrapy not to filter out these statuses, so the responses
        # reach the parse callback instead of being dropped
        handle_httpstatus_list = [404, 500]
        name = "my_crawler"
        start_urls = ["http://github.com/illegal_username"]

        def parse(self, response):
            if response.status in self.handle_httpstatus_list:
                # the first link failed, so request the alternative one
                return scrapy.Request(url="https://github.com/kennethreitz/",
                                      callback=self.after_404)

        def after_404(self, response):
            print(response.url)
            # parse the page and extract items here
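To try it out, run the spider with the Scrapy command-line tool (assuming it lives inside a Scrapy project; "my_crawler" is the name defined above):

    scrapy crawl my_crawler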
Hope that helps.