Scrapy: if a request fails (e.g. 404, 500), how can I make an alternative request?

I have a problem with Scrapy. If a request fails (e.g. 404, 500), how can I make an alternative request? For example, two links both provide the price info; if one fails, the spider should request the other automatically.

asked Jun 04 '13 by Zhang Jiuzhou



2 Answers

Use "errback" in the Request like errback=self.error_handler where error_handler is a function (just like callback function) in this function check the error code and make the alternative Request.

see errback in the scrapy documentation: http://doc.scrapy.org/en/latest/topics/request-response.html
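
A minimal sketch of that pattern (the price URLs and method names are placeholders, and it uses the newer scrapy.Spider API; by default the HttpError spider middleware turns non-2xx responses into an HttpError failure, which is what triggers the errback):

import scrapy
from scrapy.spidermiddlewares.httperror import HttpError


class PriceSpider(scrapy.Spider):
    name = "price"

    def start_requests(self):
        # primary source; if it fails (404, 500, timeout, ...) on_error runs
        yield scrapy.Request(
            "http://example.com/price-main",        # hypothetical primary URL
            callback=self.parse_price,
            errback=self.on_error,
        )

    def parse_price(self, response):
        # extract the price from whichever page actually responded
        yield {"price": response.css("span.price::text").get()}

    def on_error(self, failure):
        # on an HTTP error, fall back to the alternative link
        if failure.check(HttpError):
            yield scrapy.Request(
                "http://example.com/price-backup",  # hypothetical fallback URL
                callback=self.parse_price,
            )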

answered by Omair Shamshir


Just set handle_httpstatus_list = [404, 500] and check the status code in the parse method. Here's an example:

from scrapy.http import Request
from scrapy.spider import BaseSpider  # in current Scrapy this is scrapy.Spider


class MySpider(BaseSpider):
    # let these status codes through to parse() instead of being filtered out
    handle_httpstatus_list = [404, 500]
    name = "my_crawler"

    start_urls = ["http://github.com/illegal_username"]

    def parse(self, response):
        if response.status in self.handle_httpstatus_list:
            # the first URL failed, so request the alternative one
            return Request(url="https://github.com/kennethreitz/", callback=self.after_404)

    def after_404(self, response):
        print(response.url)

        # parse the page and extract items

Also see:

  • How to get the scrapy failure URLs?
  • Scrapy and response status code: how to check against it?
  • How to retry for 404 link not found in scrapy?

Hope that helps.

answered by alecxe