How to disable robots.txt when you launch scrapy shell?

Question

I use the Scrapy shell with several websites without problems, but I run into trouble when a site's robots.txt disallows access. How can I make Scrapy ignore robots.txt entirely? I'm not asking about a project created with Scrapy, but about the shell command itself: scrapy shell 'www.example.com'. Thank you in advance.

daniboy000 · Accepted Answer

In your Scrapy project's settings.py file, find ROBOTSTXT_OBEY and set it to False.
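
For illustration, the relevant line in settings.py would look roughly like this. It is only a fragment, assuming the rest of the project's settings file stays unchanged:

```python
# settings.py (project-level Scrapy settings)

# When False, Scrapy does not consult robots.txt before sending requests.
ROBOTSTXT_OBEY = False
```

This only takes effect when scrapy shell is started from inside that project's directory.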

Granitosaurus · Answer

If you run scrapy shell from inside a project directory, it uses that project's settings.py. If you run it outside a project, Scrapy uses its default settings. In either case, you can override or add settings with the --set flag.
So, to turn off the ROBOTSTXT_OBEY setting, you can simply run:

scrapy shell http://stackoverflow.com --set="ROBOTSTXT_OBEY=False"

Tags: python, scrapy, web-crawler, robots.txt, scrapy-shell

DARDAR SAAD
