
Scrapy - 301 redirect in shell

I cannot find a solution to the following problem. I am using Scrapy (latest version) and am trying to debug a spider. When I run scrapy shell https://jigsaw.w3.org/HTTP/300/301.html, it does not follow the redirect (it uses a default spider to fetch the data). When I run my own spider, it follows the 301, but then I cannot debug.

How can I make the shell follow the 301 so that I can debug the final page?

Asked by Pixelartist, Dec 25 '22 03:12


1 Answer

Scrapy handles redirects with its RedirectMiddleware; however, it is not enabled in the shell. A quick fix is to fetch the redirect target by hand:

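scrapy shell "https://jigsaw.w3.org/HTTP/300/301.html"
# then, at the shell prompt, request the redirect target manually
# (header values are bytes; urljoin also resolves a relative Location)
fetch(response.urljoin(response.headers['Location'].decode()))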
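
To make the quick fix above a bit more concrete, here is a minimal standalone sketch of what "following" a 301 by hand amounts to: read the Location header and resolve it against the URL you requested. It uses only the standard library; `redirect_target`, the plain-dict `headers` argument, and the example Location value are hypothetical stand-ins for illustration, not part of Scrapy's API.

```python
from urllib.parse import urljoin

# Status codes that normally carry a Location header to follow.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirect_target(request_url, status, headers):
    """Return the absolute URL a redirect response points to, or None.

    `headers` is a plain dict here (a hypothetical stand-in for
    response.headers); values may be bytes, as they are in Scrapy.
    """
    if status not in REDIRECT_STATUSES:
        return None
    location = headers.get("Location")
    if location is None:
        return None
    if isinstance(location, bytes):
        location = location.decode("latin-1")
    # urljoin resolves a relative Location against the requested URL
    return urljoin(request_url, location)


if __name__ == "__main__":
    url = "https://jigsaw.w3.org/HTTP/300/301.html"
    # The Location value below is made up for the example.
    print(redirect_target(url, 301, {"Location": b"/HTTP/300/Overview.html"}))
```

Inside the Scrapy shell, the value returned by such a helper is what you would pass to fetch().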

Also, to debug your spider, you probably want to inspect the response it actually receives:

from scrapy.shell import inspect_response

# inside your spider class
def parse(self, response):
    # the spider pauses here and opens an interactive shell during the run
    inspect_response(response, self)

Answered by Granitosaurus, Jan 08 '23 13:01