Scrapy

Question

I cannot find a solution to the following problem. I am using Scrapy (latest version) and am trying to debug a spider. Running scrapy shell https://jigsaw.w3.org/HTTP/300/301.html does not follow the redirect (the shell uses a default spider to fetch the data). When I run my own spider, it follows the 301, but then I cannot debug it.

How can I make the shell follow the 301 so that I can debug the final page?

Granitosaurus · Accepted Answer

Scrapy uses the Redirect Middleware to handle redirects; however, it's not enabled in the shell. A quick fix for this:

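# In a terminal:
scrapy shell "https://jigsaw.w3.org/HTTP/300/301.html"
# At the shell prompt (header values are bytes on Python 3, hence decode()):
fetch(response.headers['Location'].decode())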
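
If the Location header turns out to be a relative path, or comes back as bytes, it may need a little cleanup before it can be passed to fetch(). Below is a minimal sketch of that cleanup using only the standard library. The helper name redirect_target is hypothetical (it is not part of Scrapy), and UTF-8 decoding of the header value is an assumption.

```python
from urllib.parse import urljoin


def redirect_target(base_url, location):
    """Return an absolute str URL for a Location header value.

    Hypothetical helper, not part of Scrapy.
    """
    if isinstance(location, bytes):
        # Assumption: the header value is UTF-8 (plain ASCII in practice).
        location = location.decode("utf-8")
    # urljoin leaves absolute URLs untouched and resolves relative ones
    # against the URL of the page that issued the redirect.
    return urljoin(base_url, location)
```

Inside the shell this could be used as fetch(redirect_target(response.url, response.headers['Location'])).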

Also, to debug your spider, you probably want to inspect the response your spider is actually receiving:

from scrapy.shell import inspect_response

def parse(self, response):
    # the spider will stop here and open up an interactive shell during the run
    inspect_response(response, self)

Scrapy - 301 redirect in shell

Tags:

python

web-scraping

scrapy-shell

Pixelartist
